New Protein Structure AI Outperforms AlphaFold on Complex Proteins

Singapore researchers combine AI with physics simulations to predict protein structures 13% more accurately than existing methods, covering 73% of the human proteome.

A team at the National University of Singapore has built a protein structure prediction tool that beats AlphaFold on complex proteins by blending deep learning with physics-based simulation. The system, called D-I-TASSER, predicted structures roughly 13% more accurately than existing methods in benchmark tests and can produce reliable models for 73% of full-length protein sequences in the human proteome.

How It Works

D-I-TASSER takes a different approach from pure end-to-end neural networks. Instead of trying to predict an entire protein structure in one shot, the system breaks complex proteins into smaller domains, predicts each domain separately, then uses physics-based modeling to assemble them into a complete three-dimensional structure.

This hybrid method addresses a persistent weakness in pure-AI approaches: multidomain proteins with long, flexible linker regions that confuse neural networks trained primarily on single-domain examples. By letting physics guide the assembly step, D-I-TASSER handles these challenging cases more reliably.
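To make the divide-and-assemble idea concrete, here is a toy Python sketch. It is not D-I-TASSER's code and makes several loud assumptions: domain boundaries are supplied by hand, `predict_domain` is a hypothetical placeholder that returns an ideal alpha-helix C-alpha trace instead of running a neural network, and the "physics" is reduced to a single steric clash penalty plus a fixed 3.8 Å linker distance. A crude greedy random search stands in for the physics-based simulation the article describes. All function names are illustrative.

```python
import math
import random

Vec = tuple[float, float, float]


def split_into_domains(sequence: str, boundaries: list[int]) -> list[str]:
    """Cut a sequence at hand-supplied (hypothetical) domain boundaries."""
    cuts = [0, *sorted(boundaries), len(sequence)]
    return [sequence[a:b] for a, b in zip(cuts, cuts[1:]) if b > a]


def predict_domain(domain_seq: str) -> list[Vec]:
    """Hypothetical placeholder predictor: an ideal alpha-helix C-alpha trace
    (radius 2.3 A, rise 1.5 A, 100 degrees per residue). A real pipeline would
    run a learned model here."""
    coords = []
    for i in range(len(domain_seq)):
        theta = math.radians(100.0 * i)
        coords.append((2.3 * math.cos(theta), 2.3 * math.sin(theta), 1.5 * i))
    return coords


def clash_energy(a: list[Vec], b: list[Vec], cutoff: float = 3.5) -> float:
    """Toy steric term: quadratic penalty for inter-domain pairs closer than cutoff."""
    energy = 0.0
    for p in a:
        for q in b:
            d = math.dist(p, q)
            if d < cutoff:
                energy += (cutoff - d) ** 2
    return energy


def random_unit_vector(rng: random.Random) -> Vec:
    # Normalising a 3D Gaussian sample gives a uniformly random direction.
    x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
    n = math.sqrt(x * x + y * y + z * z) or 1.0
    return (x / n, y / n, z / n)


def assemble(domains: list[list[Vec]], linker: float = 3.8,
             trials: int = 200, seed: int = 0) -> tuple[list[Vec], float]:
    """Greedy rigid-body assembly (translation only, no rotation): place each
    domain so its first residue sits `linker` A from the previous domain's last
    residue, keeping the random direction with the lowest clash energy."""
    rng = random.Random(seed)
    placed = list(domains[0])
    total = 0.0
    for dom in domains[1:]:
        anchor = placed[-1]
        best, best_e = None, math.inf
        for _ in range(trials):
            u = random_unit_vector(rng)
            start = tuple(anchor[k] + linker * u[k] for k in range(3))
            shift = tuple(start[k] - dom[0][k] for k in range(3))
            moved = [tuple(p[k] + shift[k] for k in range(3)) for p in dom]
            e = clash_energy(placed, moved)
            if e < best_e:
                best, best_e = moved, e
        placed.extend(best)
        total += best_e
    return placed, total


if __name__ == "__main__":
    seq = "M" * 30  # hypothetical 30-residue chain
    domains = split_into_domains(seq, [12, 20])
    model, energy = assemble([predict_domain(d) for d in domains])
    print(len(model), round(energy, 2))
```

Placing domains one at a time by translation keeps the example short; a realistic assembly stage would also rotate domains and score many more energy terms than a single clash penalty.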

Professor Zhang Yang, whose lab spans NUS’s Cancer Science Institute, School of Computing, and Yong Loo Lin School of Medicine, explained the practical benefit: “When we can see a protein’s structure more clearly, we can better understand what goes wrong in disease and how potential drugs might interact with it.”

Performance Against AlphaFold

The research, published in Nature Biotechnology, shows D-I-TASSER outperforming both AlphaFold2 and AlphaFold3 on single-domain and multidomain proteins. Large-scale folding experiments demonstrated the system could produce reliable structural models for 81% of protein domains and 73% of full-chain sequences in the human proteome, including many structures that had previously resisted accurate prediction.
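The article does not name the accuracy measure behind the 13% and "reliable model" figures. A common choice in structure prediction is TM-score, which rates a model between 0 and 1 against the experimental structure, with values above about 0.5 generally taken to indicate the correct overall fold. It is defined as TM = (1/L) · Σ 1 / (1 + (d_i / d0)²), with d0 = 1.24 · (L − 15)^(1/3) − 1.8 Å. The Python sketch below assumes the model is already superposed on the native structure with residue i matched to residue i; the reference TM-score program additionally searches for the superposition that maximizes the score, which is omitted here.

```python
import math

Vec = tuple[float, float, float]


def tm_d0(length: int) -> float:
    """Length-dependent distance scale d0 = 1.24 * (L - 15)**(1/3) - 1.8 (Angstrom),
    floored at 0.5. The early return also avoids a cube root of a non-positive
    number for L <= 15; for L <= 21 the formula is below 0.5 anyway."""
    if length <= 21:
        return 0.5
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(model: list[Vec], native: list[Vec]) -> float:
    """TM-score for a model that is ALREADY superposed on the native structure,
    with residue i of the model aligned to residue i of the native (assumption)."""
    if len(model) != len(native):
        raise ValueError("this sketch assumes a one-to-one residue alignment")
    d0 = tm_d0(len(native))
    total = sum(1.0 / (1.0 + (math.dist(m, n) / d0) ** 2)
                for m, n in zip(model, native))
    return total / len(native)


if __name__ == "__main__":
    native = [(3.8 * i, 0.0, 0.0) for i in range(100)]  # hypothetical straight C-alpha trace
    print(tm_score(native, native))  # 1.0 for a perfect model
    shifted = [(x, y + tm_d0(100), z) for x, y, z in native]
    print(round(tm_score(shifted, native), 3))  # 0.5: every residue is exactly d0 away
```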

That 73% figure matters for drug discovery. Many disease-relevant proteins fall into the “difficult to model” category precisely because they have multiple domains or unusual structural features. Expanding coverage of the human proteome means more potential drug targets become accessible to structure-based drug design.

What This Enables

Accurate protein structures accelerate several stages of pharmaceutical development:

Target identification: Understanding how a disease-linked protein folds reveals potential binding sites for drugs.

Virtual screening: Computational chemists can test millions of potential molecules against a protein structure before synthesizing anything in the lab.

Understanding resistance: When pathogens evolve resistance to drugs, structural analysis shows how mutations alter the binding site.
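Virtual screening can be pictured as a score-and-rank loop over a compound library. The Python sketch below is illustrative only: `toy_score` is a made-up contact and clash count standing in for a real docking scoring function, and each ligand is assumed to be already posed in the binding pocket. What it does show faithfully is the ranking pattern: scoring a stream of candidates and keeping only the best few with `heapq.nsmallest`, which holds just `top_k` results in memory however long the library is.

```python
import heapq
import math
from typing import Iterable

Vec = tuple[float, float, float]


def toy_score(ligand: list[Vec], pocket: list[Vec],
              contact: float = 4.5, clash: float = 2.5) -> float:
    """Hypothetical scoring function: -1 per ligand-pocket atom pair within
    contact range, +10 per pair closer than `clash`. Lower is better, following
    the docking-energy convention."""
    score = 0.0
    for a in ligand:
        for p in pocket:
            d = math.dist(a, p)
            if d < clash:
                score += 10.0
            elif d < contact:
                score -= 1.0
    return score


def screen(library: Iterable[tuple[str, list[Vec]]], pocket: list[Vec],
           top_k: int = 3) -> list[tuple[float, str]]:
    """Score each (name, coordinates) entry lazily and return the top_k
    lowest-scoring (best) candidates as (score, name) pairs."""
    scored = ((toy_score(coords, pocket), name) for name, coords in library)
    return heapq.nsmallest(top_k, scored)


if __name__ == "__main__":
    pocket = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]  # hypothetical pocket atoms
    library = [
        ("far_away", [(20.0, 0.0, 0.0)]),
        ("snug_fit", [(1.5, 3.5, 0.0)]),
        ("clashing", [(0.4, 0.0, 0.0)]),
    ]
    for score, name in screen(library, pocket, top_k=2):
        print(name, score)
```

Because `library` can be any iterable, the same loop works on a generator that reads molecules from disk one at a time.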

The Singapore team is already extending D-I-TASSER’s framework to RNA structure prediction and antibody-antigen complex modeling, both areas where accurate structural data remains scarce.

The Bigger Picture

D-I-TASSER represents a broader trend in computational biology: hybrid approaches that combine the pattern-recognition power of neural networks with established physical principles. Pure deep learning systems learn from data but don’t encode fundamental knowledge about how molecules actually behave. Physics-informed methods are slower but more robust when data is limited or structures deviate from training examples.

AlphaFold transformed the field by solving structure prediction for most single-domain proteins. But the harder problems (large complexes, flexible proteins, rare structural families) require different tools. D-I-TASSER suggests the next generation of protein modeling will be about integration rather than replacement.

The software and models are being made available to researchers, continuing the pattern of open availability that has characterized major advances in structural biology AI.