ByteDance has released Protenix-v1, an open-source protein structure prediction model that outperforms Google DeepMind’s AlphaFold3 on several benchmarks. The model, code, and weights are all available under the Apache 2.0 license, a sharp contrast to AlphaFold3’s restricted access.
Why This Matters
When DeepMind released AlphaFold2 in 2021, it transformed structural biology by making accurate protein structure prediction accessible to researchers worldwide. AlphaFold3, released in 2024, extended these capabilities to predict interactions between proteins, DNA, RNA, and small molecules.
But there was a catch: at launch, AlphaFold3’s code and model weights were not released. Researchers could only access it through DeepMind’s server, with usage limits and no ability to modify the model for their specific needs. DeepMind later published the code and made the weights available on request, but only for non-commercial use. Drug discovery companies still couldn’t train their own versions on proprietary data, and academic labs couldn’t run it locally without restrictions.
Protenix-v1 changes that.
The Benchmarks
The ByteDance research team designed Protenix-v1 to match AlphaFold3’s training conditions as closely as possible: the same September 2021 data cutoff, a comparable model size (368 million parameters), and an equivalent inference budget. This allows for a fair comparison.
On protein-protein interactions and antibody-antigen interface prediction, Protenix-v1 actually surpasses AlphaFold3. For protein-ligand and protein-DNA docking tasks, AlphaFold3 maintains an edge.
The team also released PXMeter, an evaluation toolkit with over 6,000 test complexes, so researchers can verify these claims themselves.
What This Means
Protenix-v1 shows inference-time scaling, which the team documented in detail. On challenging targets like antibody-antigen complexes, increasing the number of sampled candidates from a handful to hundreds produces consistent log-linear accuracy improvements: each multiplicative increase in the number of samples yields a roughly constant gain in accuracy. This means researchers can trade compute for accuracy when they need higher-confidence predictions.
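To make "log-linear" concrete, here is a minimal Python sketch that fits accuracy = intercept + slope * ln(N) to a set of (sample count, accuracy) points and reports the gain per doubling of samples. The function names are hypothetical, and the numbers are placeholders chosen to illustrate the shape of the curve. They are not results reported for Protenix-v1.

```python
import math


def fit_log_linear(sample_counts, accuracies):
    """Least-squares fit of accuracy = intercept + slope * ln(N)."""
    xs = [math.log(n) for n in sample_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(accuracies) / len(accuracies)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def gain_per_doubling(slope):
    """Accuracy gained each time the number of samples doubles."""
    return slope * math.log(2)


# Placeholder data for illustration only (NOT measured Protenix-v1 results).
sample_counts = [5, 10, 20, 40, 80]
accuracies = [0.40, 0.43, 0.46, 0.49, 0.52]

slope, intercept = fit_log_linear(sample_counts, accuracies)
print(f"gain per doubling of samples: {gain_per_doubling(slope):.3f}")
```

Under a curve like this, going from 5 to 10 samples buys as much accuracy as going from 40 to 80, so each further gain costs twice the compute of the previous one.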
For drug discovery, the implications are significant. Companies can now fine-tune an AlphaFold3-level model on their proprietary compound libraries and target proteins. Academic labs can run unlimited predictions without server quotas. The entire pipeline is auditable and modifiable.
The team also released Protenix-v1-20250630, a variant trained on data through June 2025, which they report delivers further improvements for real-world drug discovery applications.
The Fine Print
Open source doesn’t mean simple. Running Protenix-v1 requires substantial compute resources, though far less than training from scratch. The model works best when you can sample multiple candidates and select the best, which adds to resource requirements.
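The sample-and-select pattern can be sketched in a few lines of Python. This is an illustration under assumptions, not Protenix's actual interface: the `Candidate` class, its `confidence` field, and `select_best` are hypothetical names, and the scores stand in for whatever per-prediction confidence metric the model reports.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One sampled structure prediction (hypothetical container)."""
    seed: int
    sample_index: int
    confidence: float  # assumed: higher means the model is more confident


def select_best(candidates, top_k=1):
    """Return the top_k candidates ranked by confidence, best first."""
    if not candidates:
        raise ValueError("no candidates to select from")
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
    return ranked[:top_k]


# Made-up scores for illustration; a real run would produce one per sample.
candidates = [
    Candidate(seed=0, sample_index=0, confidence=0.61),
    Candidate(seed=0, sample_index=1, confidence=0.74),
    Candidate(seed=1, sample_index=0, confidence=0.58),
]

best = select_best(candidates)[0]
print(f"selected seed={best.seed} sample={best.sample_index}")
```

Selection itself is cheap. The cost lies in generating the candidates, since every additional sample is another full inference pass.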
AlphaFold3 retains advantages on certain prediction types, particularly small molecule binding. For production drug discovery workflows, teams will likely need to evaluate both tools on their specific targets.
The training data cutoff matters too. Structures released after September 2021 aren’t represented in the base model’s training data, though the variant trained through June 2025 addresses some of this gap.
Still, having a fully open, AlphaFold3-level model available under a permissive license represents a genuine step forward for computational biology. Researchers can now build on this foundation without waiting for access or accepting usage restrictions.