AI Chatbots Analyze Pregnancy Data Faster Than Human Research Teams

UCSF study finds generative AI can build prediction models in minutes that took human teams months, though only half the tested systems worked.

Researchers at UC San Francisco and Wayne State University put eight generative AI systems to the test: analyze pregnancy data from over 1,000 women and build algorithms to predict preterm birth. The result? Four of the eight AI tools produced viable prediction models in minutes that matched or exceeded what human research teams achieved over months.

The findings, published in Cell Reports Medicine on February 17, suggest generative AI could dramatically accelerate medical research - but the study also found that half the AI systems failed entirely.

The Experiment

The researchers gave the AI systems the same tasks human teams had faced in three previous DREAM (Dialogue for Reverse Engineering Assessments and Methods) challenges: predicting preterm birth from vaginal microbiome data and determining pregnancy stage from blood and placental tissue samples.
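For readers unfamiliar with this kind of task, here is a minimal sketch of the sort of pipeline the AI systems were asked to produce: fit a classifier on microbiome relative-abundance features and score it with AUC. Everything here is invented for illustration - the synthetic data, the taxon count, and the plain logistic regression are assumptions, not the study's actual datasets or code.

```python
import math
import random

random.seed(0)

N_TAXA = 5  # pretend we track relative abundances of 5 microbial taxa

def simulate_sample(preterm: bool) -> list[float]:
    """Synthetic relative-abundance vector; preterm cases get one taxon shifted up."""
    raw = [random.random() + (0.8 if preterm and i == 0 else 0.0) for i in range(N_TAXA)]
    total = sum(raw)
    return [v / total for v in raw]  # normalize so abundances sum to 1

# Toy cohort: 200 samples, roughly 30% preterm
labels = [1 if random.random() < 0.3 else 0 for _ in range(200)]
X = [simulate_sample(bool(y)) for y in labels]

# Logistic regression trained by plain gradient descent (no external libraries)
w = [0.0] * N_TAXA
b = 0.0
lr = 0.5
for _ in range(500):
    grad_w = [0.0] * N_TAXA
    grad_b = 0.0
    for x, y in zip(X, labels):
        p = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
        err = p - y
        for i in range(N_TAXA):
            grad_w[i] += err * x[i]
        grad_b += err
    w = [wi - lr * gi / len(X) for wi, gi in zip(w, grad_w)]
    b -= lr * grad_b / len(X)

# AUC: probability that a random positive case outranks a random negative one
scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in X]
pos = [s for s, y in zip(scores, labels) if y == 1]
neg = [s for s, y in zip(scores, labels) if y == 0]
auc = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))
print(f"training AUC: {auc:.2f}")
```

Real entries in the DREAM challenges were far more involved (sequencing pipelines, cross-validation, held-out test sets), but this is the basic shape: features in, fitted model, discrimination metric out.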

The original DREAM challenges took human teams nearly two years to complete and publish. The entire AI project - from inception to journal submission - took six months.

“These AI tools could relieve one of the biggest bottlenecks in data science: building our analysis pipelines,” said Marina Sirota, PhD, the study’s principal investigator at UCSF.

What Actually Worked

The AI systems generated functioning computer code in minutes - work that would normally take experienced programmers hours or days. But the headline numbers require context.

Only four of eight AI tools produced usable results. The others either failed to generate working code or produced models that didn’t perform adequately. The study doesn’t name the specific AI systems tested.

Perhaps more notable: a UCSF master’s student, Reuben Sarwal, and a high school student from Ann Arbor, Victor Tarca, successfully built viable prediction models with AI assistance. They had limited data science backgrounds but still produced research-quality work.

“Researchers with limited data science backgrounds won’t always need wide collaborations or hours spent debugging code,” said Adi Tarca, PhD, a co-author from Wayne State University. “They can focus on answering the right biomedical questions.”

The Fine Print

The researchers were careful to note what AI can’t do. The systems can produce misleading results - a persistent problem in AI-assisted research. Human expertise remains essential.

The study tested a specific use case: having AI write analysis code for well-defined problems with existing datasets. It doesn’t mean AI can replace the scientific judgment needed to design studies, interpret results, or catch errors.

And that 50% failure rate matters. In a field where reproducibility is already a challenge, adding AI tools that work unpredictably introduces new variables that research teams need to account for.

Still, for researchers working with limited resources or facing the chronic bottleneck of data analysis pipelines, even a 50% success rate with dramatically faster turnaround times could change how medical research gets done.