An AI System Just Published Research in Nature After Passing Peer Review

Sakana AI's 'AI Scientist' generated a paper that scored higher than 55% of human submissions at ICLR 2025


A paper written entirely by an AI system has been published in Nature, marking the first time machine-generated research passed rigorous human peer review without any human editing. The system, called AI Scientist, was developed by researchers at Sakana AI, the University of British Columbia, the Vector Institute, and the University of Oxford.

The achievement raises uncomfortable questions about the future of scientific publishing and whether automated research accelerates discovery or buries it in machine-generated mediocrity.

What the AI Actually Did

The AI Scientist operates as an end-to-end research pipeline. Given a broad topic like “study something interesting about how AI learns,” the system generates hypotheses, searches scientific literature, designs experiments, runs code, analyzes results, creates figures, and writes complete papers in LaTeX format.
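The stages described above chain together into a single loop. The sketch below is a hypothetical illustration of that flow, with placeholder stub functions standing in for what are, in the real system, calls to a foundation model; none of these names come from Sakana AI's actual codebase.

```python
# Hypothetical sketch of an end-to-end automated research pipeline.
# Each function is a stub standing in for an LLM-driven stage;
# the names and return values are illustrative, not the real API.

def generate_hypothesis(topic):
    return f"hypothesis about {topic}"

def search_literature(hypothesis):
    return ["related work A", "related work B"]  # placeholder citations

def design_and_run_experiment(hypothesis):
    return {"metric": 0.0}  # placeholder results from executed code

def write_paper(hypothesis, prior_work, results):
    # In the real system this stage emits a complete LaTeX manuscript.
    return f"Paper: {hypothesis} | cites {len(prior_work)} works | results {results}"

def run_pipeline(topic):
    hypothesis = generate_hypothesis(hypothesis := generate_hypothesis(topic)) if False else generate_hypothesis(topic)
    literature = search_literature(hypothesis)
    results = design_and_run_experiment(hypothesis)
    return write_paper(hypothesis, literature, results)

print(run_pipeline("how AI learns"))
```

The point of the sketch is the shape, not the stubs: every stage takes the previous stage's output, so improving the underlying model improves every link in the chain at once.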

To test the system, the team submitted three AI-generated papers to the “I Can’t Believe It’s Not Better” (ICBINB) workshop at ICLR 2025. One paper was accepted with an average score of 6.33 from three reviewers (individual scores: 6, 7, and 6). That score exceeded the workshop’s acceptance threshold and ranked higher than 55% of human-authored submissions.

The entire process took about 15 hours and cost approximately $140 in compute.

The team had determined in advance that they would withdraw any accepted papers before publication. They did exactly that, treating the submission as an experiment rather than a genuine contribution to the field.

The Quality Problem

The AI Scientist generates passable work. Not great work.

Experts reviewing the output described the papers as “mediocre” and “okay but not great.” The system produces creative ideas but struggles with execution. Common problems include hallucinated references, duplicated figures, superficial methodology, and claims that lack real novelty.

Jeff Clune, the lead researcher, acknowledges these limitations openly. The current version occasionally produces naive or underdeveloped ideas. It can write code that runs but doesn’t always implement sound research methodology. It hallucinates citations to papers that don’t exist.

One reviewer noted that while some ideas showed promise, the papers lacked “any real novelty” in their actual contributions.

Why This Matters

The workshop that accepted the AI paper has a 70% acceptance rate, far less selective than flagship venues. Passing peer review at a workshop is different from publishing in a selective journal. But the gap is closing.

The researchers found that paper quality scales with model capability. Better foundation models produce better research. As Claude, GPT, and their successors improve, so will automated research output.

This creates a potential flooding problem. If AI can generate plausible papers at scale for $140 each, peer review faces an existential challenge. Reviewers already struggle with submission volume. Adding machine-generated papers, even ones that technically meet quality thresholds, could overwhelm the system.

The team built an automated reviewer into the AI Scientist pipeline. It achieved 69% balanced accuracy in predicting accept/reject decisions, exceeding the consistency rate human reviewers demonstrated in NeurIPS 2021’s consistency experiment. AI reviewing AI papers is no longer hypothetical.
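Balanced accuracy, the metric cited above, averages the hit rate on accepted papers and the hit rate on rejected papers separately, so a reviewer can't score well by always predicting “reject.” A minimal computation, using made-up counts for illustration (the study's actual confusion matrix is not given here):

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Average of sensitivity (accept recall) and specificity (reject recall)."""
    sensitivity = tp / (tp + fn)  # fraction of truly-accepted papers predicted accept
    specificity = tn / (tn + fp)  # fraction of truly-rejected papers predicted reject
    return (sensitivity + specificity) / 2

# Illustrative counts only: 30 accepts caught, 20 missed;
# 160 rejects caught, 40 missed.
print(balanced_accuracy(tp=30, fn=20, tn=160, fp=40))  # 0.7
```

Because most submissions to any venue are rejections, plain accuracy would reward a trivial always-reject reviewer; balanced accuracy is the fairer yardstick for comparing the automated reviewer to humans.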

The Ethics They Actually Addressed

Sakana AI took unusual steps to handle the obvious ethical concerns. They watermarked all AI-generated papers for transparency. They obtained IRB approval before running experiments. They proactively withdrew accepted submissions rather than claiming publication credit.

They’re also advocating for community norms around AI-generated research, pushing for disclosure requirements and guidelines for how such work should be evaluated.

But these are voluntary measures from researchers who chose to act responsibly. The system is open source. Others may not exercise the same restraint.

What Comes Next

The AI Scientist currently works only for computational experiments where code can validate results. It can’t run lab experiments, collect human data, or verify physical claims. That limits its scope but doesn’t make it irrelevant. Computational research spans machine learning, computational biology, materials science, and beyond.

The researchers are explicit about their goal: fully automated scientific discovery. The Nature paper is the first milestone. Future versions will handle more complex research, work across more domains, and presumably produce higher-quality output.

Whether that accelerates scientific progress or generates noise that drowns out real discoveries remains an open question. The AI Scientist can write papers. Whether it can do science is a different matter entirely.