Most AI in healthcare faces a fundamental tension: the models get better with more data, but sharing patient records between hospitals creates privacy nightmares and regulatory headaches. The Cancer AI Alliance thinks it has found a way around this - and it’s now testing the approach with real research projects across four major cancer centers.
The idea is called federated learning. Instead of moving patient data to a central location for training, the AI models travel to the data. Each hospital trains on its own records, shares only what the model learned, and the insights get combined without any individual patient’s information ever leaving their care provider’s servers.
After a year of building the infrastructure, the Alliance launched eight pilot projects in March 2026 using de-identified clinical data from Fred Hutch, Dana-Farber, Memorial Sloan Kettering, and Johns Hopkins.
How It Actually Works
The technical architecture is surprisingly elegant. Each participating cancer center maintains its own secure data environment - what the Alliance calls an “edge node.” Patient records stay behind each institution’s firewall, subject to the same HIPAA protections and IRB approvals as before.
When researchers want to train a model, the process works like this: the AI is sent to each hospital, trains on 70% of that center’s de-identified records, and produces a set of model weights - essentially a summary of what it learned. Those weights, not the underlying data, get sent to a central location where they’re merged with the weights from the other centers. The improved model then returns to each hospital for testing on the remaining 30% of records.
“With each individual cancer center, their patient data never leaves their edge node,” explained one Alliance researcher. “It’s a way to increase security, reduce the likelihood of privacy leaks, while still being able to answer questions across the alliance.”
The result is a model that has learned from over 1 million patients across four institutions - far more than any single hospital could provide - without a single patient record being transmitted.
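For readers who want to see the mechanics, the training round described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the Alliance's actual platform. The synthetic records, the logistic-regression model, the size-weighted averaging rule (commonly called federated averaging) and every function name here are assumptions made for the example. Only the 70/30 split and the "share weights, not records" flow come from the description above.

```python
import math
import random

def split_70_30(records, seed=0):
    """Shuffle one site's records and split them 70% train / 30% test."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def predict(weights, x):
    """Logistic-regression probability for one feature vector."""
    z = sum(w * xi for w, xi in zip(weights, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def local_train(weights, train, lr=0.1, epochs=5):
    """Train on one site's local data; return new weights, never the data."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in train:
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                w[j] -= lr * err * xj
    return w

def federated_average(site_weights, site_sizes):
    """Merge per-site weights, weighting each site by its training-set size."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]

def accuracy(weights, test):
    """Fraction of held-out records classified correctly."""
    correct = sum((predict(weights, x) >= 0.5) == (y == 1) for x, y in test)
    return correct / len(test)

def make_site(n, seed):
    """Hypothetical synthetic records: (features with bias term, label)."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        records.append(([1.0, x1, x2], 1 if x1 + x2 > 0 else 0))
    return records

# Four simulated sites, each holding its own records locally.
sites = [make_site(n, seed) for seed, n in enumerate([200, 300, 250, 150])]
splits = [split_70_30(s, seed=i) for i, s in enumerate(sites)]

global_weights = [0.0, 0.0, 0.0]
for _ in range(10):  # ten communication rounds
    updates = [local_train(global_weights, train) for train, _ in splits]
    sizes = [len(train) for train, _ in splits]
    global_weights = federated_average(updates, sizes)

# Each site evaluates the merged model on its own held-out 30%.
site_accuracy = [accuracy(global_weights, test) for _, test in splits]
```

The only values that cross site boundaries in this sketch are the weight lists and the training-set sizes. The per-patient records stay inside the `splits` entry for each simulated site.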
The First Projects
Fred Hutch is leading two of the eight initial research efforts.
The first tackles bone metastasis prediction. When cancer spreads to bones, patients often need radiation to prevent fractures and other complications. The problem is identifying who needs early intervention before symptoms appear. “We need to triage,” said Clemens Grassberger of Fred Hutch’s Department of Radiation Oncology. “It’s a classic prediction problem.” He aims to have a working model within months.
The second project tests Asta DataVoyager, a tool developed by the Allen Institute for AI (Ai2) that lets researchers query data using plain English. Simone Dekker, a hematology-oncology fellow at Fred Hutch, is using it to analyze records from approximately 30,000 patients with non-small cell lung cancer - the most common form of lung cancer, which is itself the leading cause of cancer deaths.
“It understands what I’m asking,” Dekker said of the tool. “Done in minutes, which is very exciting.”
DataVoyager translates natural-language research questions into executable code, runs the analysis, then returns clearly cited, explainable answers. Researchers can ask follow-up questions and refine their approach without needing to write code themselves. Importantly for privacy, uploaded datasets are automatically deleted after seven days and are never used for model training.
Why This Matters for Rare Cancers
The real promise of federated learning isn’t just about privacy - it’s about what becomes possible when you can effectively combine data across institutions.
Individual cancer centers rarely see enough patients with rare cancer types to draw statistically meaningful conclusions. Even major research hospitals might only encounter a few dozen cases of certain rare cancers per year. But if you can train models across multiple centers without sharing data, those small cohorts suddenly add up.
“The ambition is to grow the platform much beyond just the four founding centers,” one researcher explained, “such that you can start to look at those more rare cases, not as individuals but as smaller cohorts that you just otherwise would not be able to have access to when you are just Fred Hutch or just Johns Hopkins or just Memorial Sloan Kettering.”
The Alliance says its federated approach can accelerate discovery timelines “from years to months” by revealing trends across more diverse populations.
$65 Million in Tech Backing
The Cancer AI Alliance isn’t running on grants alone. AWS, Google, Microsoft, NVIDIA, Deloitte, Allen Institute for AI, and Slalom have collectively committed $65 million in funding and technical resources since the Alliance was founded in October 2024.
AWS alone has contributed $10 million. The tech giants aren’t just writing checks - they’re providing cloud infrastructure, AI expertise, and engineering support to build and maintain the federated learning platform.
For the technology companies, cancer research offers a sympathetic use case for AI that doesn’t generate the backlash of military applications or surveillance tools. For the cancer centers, it means access to resources that would otherwise be out of reach.
The Privacy Alternative
Traditional medical research requiring data from multiple institutions faces substantial hurdles. HIPAA compliance, institutional review boards, data use agreements, de-identification requirements - each step adds months or years to research timelines. Even when data sharing is approved, the process of standardizing records from different electronic health record systems often introduces errors and data loss.
Centralized data processing also creates concentrated risk. A single breach could expose records from multiple institutions. And once data leaves an institution’s control, it’s harder to ensure ongoing compliance with evolving privacy regulations.
Federated learning sidesteps most of these problems. Data stays local, under institutional control, subject to the same protections it always had. Only model updates traverse the network - and those updates, properly designed, shouldn’t reveal information about individual patients.
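What "properly designed" means varies by system, and the Alliance has not been quoted here on its specific safeguards. One widely used option in the federated learning literature is to clip each site's update to a maximum size and add random noise before sending it, so that no single record can dominate what leaves the site. The sketch below illustrates that idea only. The function names, `max_norm` and `sigma` are hypothetical, and the default values shown do not by themselves imply any formal privacy guarantee.

```python
import math
import random

def clip_update(update, max_norm):
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    if norm <= max_norm:
        return list(update)
    scale = max_norm / norm
    return [u * scale for u in update]

def add_gaussian_noise(update, sigma, rng):
    """Add independent Gaussian noise to every coordinate."""
    return [u + rng.gauss(0.0, sigma) for u in update]

def privatize(old_weights, new_weights, max_norm=1.0, sigma=0.1, seed=0):
    """Return a clipped, noised weight delta that is safer to transmit."""
    rng = random.Random(seed)
    delta = [n - o for n, o in zip(new_weights, old_weights)]
    return add_gaussian_noise(clip_update(delta, max_norm), sigma, rng)

# Example: a site's raw update of (3, 4) has norm 5, so it is scaled
# to norm 1 before noise is added.
sent = privatize([0.0, 0.0], [3.0, 4.0])
```

The design trade-off is that more noise gives stronger protection but a less accurate merged model, which is why real deployments tune these values carefully.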
Studies suggest federated approaches can improve model performance by 15-25% compared to training on single-institution data, while maintaining stronger privacy guarantees than centralized alternatives.
What Comes Next
Over the next year, the Cancer AI Alliance plans to launch dozens more research projects and expand beyond its four founding members. More cancer centers means more diverse patient populations, more rare cancer cases reaching statistical significance, and more validation that models work across different healthcare systems.
The initial projects are focusing on predicting treatment response, identifying novel biomarkers, and analyzing rare cancer trends. If successful, the Alliance believes its federated approach could become a template for privacy-preserving medical AI research beyond oncology.
“You have this little research team working in a circle to produce results,” said Steve Salerno, a biostatistics postdoctoral researcher at Fred Hutch. That circle just happens to span multiple states and millions of patient records - without any of those records ever leaving home.
The Bottom Line
The Cancer AI Alliance is demonstrating that the choice between AI capability and patient privacy may be a false one. Federated learning lets researchers benefit from massive datasets without the privacy risks of centralized data collection. If the pilot projects succeed, this approach could reshape how medical AI research is conducted - proving that the most powerful models don’t require the most invasive data practices.