PSBench: 1.4 Million Verified Protein Models to Fix AI Drug Discovery's Trust Problem

University of Missouri releases the world's largest quality-assessed protein structure database to help researchers know when to trust AI predictions.

AlphaFold changed everything about protein structure prediction. But knowing a structure and knowing whether to trust a prediction are different problems. Researchers at the University of Missouri have released PSBench, a database of 1.4 million protein structure models with verified quality assessments, so scientists can finally train AI to know when its own predictions are reliable.

The Trust Gap

AI systems like AlphaFold can predict protein structures with remarkable accuracy, but they don’t perform equally well across all protein types. Some predictions are nearly perfect. Others contain subtle errors that could send drug development efforts in the wrong direction.

The problem: there was no large-scale dataset to train AI systems to assess their own confidence. Researchers had to either trust predictions blindly or spend enormous time manually verifying structures.

PSBench fills that gap by providing 1.4 million protein models, each independently verified by experts.

What’s in the Database

PSBench draws from CASP (Critical Assessment of protein Structure Prediction), the international gold-standard competition that has evaluated computational protein prediction methods since 1994. The database includes models from CASP15 and CASP16 competitions, covering a wide range of protein types and complexity levels.

Each model comes with quality annotations - not just the predicted structure, but a verified measure of how accurate that prediction actually is. This pairing lets AI systems learn to tell reliable predictions from questionable ones.
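As an illustration of what such paired data enables, a quality-estimation method's self-reported confidence can be checked against verified labels. This is a minimal sketch; the field names (`self_confidence`, `true_tm_score`) and values are hypothetical, not PSBench's actual schema:

```python
# Hypothetical sketch: comparing a predictor's self-confidence against
# verified quality labels of the kind PSBench provides.
# Field names and values are illustrative, not PSBench's real format.

records = [
    {"model_id": "T1104_s1", "self_confidence": 0.92, "true_tm_score": 0.89},
    {"model_id": "T1104_s2", "self_confidence": 0.85, "true_tm_score": 0.41},
    {"model_id": "H1106_s1", "self_confidence": 0.60, "true_tm_score": 0.63},
]

def calibration_error(records):
    """Mean absolute gap between self-reported confidence and verified quality."""
    gaps = [abs(r["self_confidence"] - r["true_tm_score"]) for r in records]
    return sum(gaps) / len(gaps)

print(f"calibration error: {calibration_error(records):.3f}")
```

A large gap, as in the second record above, is exactly the failure mode the article describes: a confident prediction that is actually wrong.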

The team also developed GATE, a graph transformer-based method trained on CASP15 data. When blindly tested in CASP16 (2024), GATE ranked among the top-performing quality assessment methods, demonstrating the database’s utility.
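The paper describes GATE's architecture in detail; as a rough intuition only, graph-based quality assessment treats residues as nodes, connects spatially close pairs, and aggregates node features into a single score. The toy sketch below (plain neighbor averaging, NOT the actual GATE architecture, with made-up coordinates and features) shows that general shape:

```python
import math

# Toy sketch of graph-based quality assessment (NOT the GATE architecture):
# residues become graph nodes, spatially close residues share edges, and
# node features are pooled into one quality score.

# Hypothetical CA-atom coordinates and per-residue confidence features.
coords   = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
features = [0.9, 0.8, 0.7, 0.2]

def build_edges(coords, cutoff=8.0):
    """Connect residue pairs whose CA atoms lie within `cutoff` angstroms."""
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.append((i, j))
    return edges

def message_pass(features, edges):
    """One round of neighbor averaging - a minimal message-passing step."""
    neighbors = {i: [i] for i in range(len(features))}  # self-loops included
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    return [sum(features[k] for k in ns) / len(ns) for ns in neighbors.values()]

smoothed = message_pass(features, build_edges(coords))
score = sum(smoothed) / len(smoothed)  # global pooling -> one quality score
print(f"predicted quality: {score:.3f}")
```

A real method like GATE replaces the naive averaging with learned transformer layers trained on labeled data such as PSBench.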

Why This Matters for Drug Discovery

Proteins perform essential biological functions, and their precise 3D shape determines what they do. Even minor folding errors can cause severe diseases - misfolded proteins drive Alzheimer’s, Parkinson’s, and many cancers.

Drug development increasingly relies on computational screening to identify promising drug candidates before expensive lab work begins. But if the underlying protein structure predictions are wrong, the computational screening is worthless.

By enabling better quality assessment, PSBench helps researchers:

  • Identify which predictions to trust for drug design
  • Prioritize which structures need experimental validation
  • Reduce wasted time pursuing leads based on faulty models
  • Catch subtle errors that could derail binding affinity simulations
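In practice, that triage could be as simple as a thresholding pass over estimated-quality scores. A minimal sketch, where the structure names and the 0.8/0.5 thresholds are arbitrary examples rather than PSBench guidance:

```python
# Illustrative triage of predicted structures by estimated quality.
# Names and thresholds (0.8 / 0.5) are made-up examples, not PSBench values.

estimated_quality = {
    "kinase_complex":   0.93,
    "receptor_dimer":   0.67,
    "antibody_antigen": 0.34,
}

def triage(scores, trust=0.8, review=0.5):
    """Sort structures into trust / validate / discard buckets."""
    buckets = {"use_for_design": [], "validate_experimentally": [], "discard": []}
    for name, q in sorted(scores.items(), key=lambda kv: -kv[1]):
        if q >= trust:
            buckets["use_for_design"].append(name)
        elif q >= review:
            buckets["validate_experimentally"].append(name)
        else:
            buckets["discard"].append(name)
    return buckets

for bucket, names in triage(estimated_quality).items():
    print(f"{bucket}: {names}")
```

The value of a benchmark like PSBench is that the quality estimator feeding such a pipeline can be trained and validated before anyone stakes lab time on its output.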

The Team

Jianlin “Jack” Cheng, Curators’ Distinguished Professor at Mizzou’s College of Engineering, led the project with postdoctoral fellow Jian Liu and graduate student Pawan Neupane. They presented PSBench at NeurIPS 2025 in San Diego.

“With PSBench, scientists can develop AI methods to assess the quality of predicted protein models and decide if they can be trusted,” Cheng said.

The Fine Print

PSBench covers protein complexes - multiple proteins interacting together - which are harder to predict than single proteins. The database is strong in this area, but it does not directly benchmark single-protein (monomer) prediction quality.

The dataset is also static, representing CASP15 and CASP16 submissions. As prediction methods improve, the benchmark will need updates to remain relevant.

Still, PSBench addresses a genuine gap in the field. AI-driven drug discovery moves fast, but only if researchers can distinguish good predictions from bad ones. This database makes that possible at scale.