We are living in a golden age for computational structural biology. In the past month alone, announcements from DeepMind/EMBL-EBI, the OpenFold Consortium, and NVIDIA have dramatically expanded what is possible in protein structure prediction, co-folding, and even generative binder design (EMBL-EBI, 2026; OpenFold Consortium, 2026; NVIDIA, 2026). These advances have tremendous potential for drug discovery, but they also create a new challenge: how do discovery teams choose the right model for the right target and modality?
2026 is already a milestone year for drug discovery
In mid-March 2026, the AlphaFold Protein Structure Database was expanded with millions of predicted protein complex structures, roughly doubling its size. For the first time, researchers can systematically access interaction models of protein–protein assemblies such as HIV-1 protease in its functional dimeric form. This moves AlphaFold from primarily a monomer predictor to a practical resource for mapping interaction networks and assemblies, which is indispensable for understanding biological function and designing targeted therapeutics. (EMBL-EBI, 2026)
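Programmatic access is part of what makes this practical at scale. As a minimal sketch, the snippet below pulls a predicted model through the AlphaFold DB prediction endpoint for a UniProt accession; it assumes the current public API URL and its cifUrl field, and the example accession is purely illustrative (the new complex entries may be keyed or structured differently).

```python
# Minimal sketch: download a predicted structure from the AlphaFold Database.
# Assumes the current public prediction endpoint and its "cifUrl" field;
# keys and fields for the new complex entries may differ.
import requests

def fetch_alphafold_model(uniprot_accession: str, out_path: str) -> str:
    """Fetch the predicted mmCIF model for a UniProt accession from AlphaFold DB."""
    api_url = f"https://alphafold.ebi.ac.uk/api/prediction/{uniprot_accession}"
    entries = requests.get(api_url, timeout=30).json()  # list of entries for this accession
    cif_url = entries[0]["cifUrl"]                       # URL of the mmCIF coordinate file
    cif_text = requests.get(cif_url, timeout=60).text
    with open(out_path, "w") as fh:
        fh.write(cif_text)
    return out_path

# Illustrative accession only; substitute the target of interest.
# fetch_alphafold_model("P69905", "hba_human.cif")  # human hemoglobin subunit alpha
```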
On March 13, the OpenFold Consortium released a major update to OpenFold3, positioning it as a fully open co-folding stack. The update includes training datasets, model weights, training and inference code, and evaluation scripts. The consortium reports performance competitive with AlphaFold3 across most evaluated modalities and emphasizes that this level of openness enables independent reproduction, rigorous benchmarking, and method extension, all of which are difficult with closed or inference-only systems. (OpenFold Consortium, 2026)
Finally, NVIDIA unveiled Proteina-Complexa, a fully atomistic protein binder design framework that unifies conditional generative modelling and optimization. NVIDIA reports higher in silico success rates than prior generative methods, along with extensive wet-lab validation: 63.5% hit rates with picomolar affinities for PDGFR binders and 40–50% hit rates for several kinase binder classes, which it presents as a new state of the art in computational binder design. (NVIDIA, 2026)
Why we need a way to compare models
Each of these systems has strengths: AlphaFold DB for coverage, OpenFold3 for openness and extensibility, Proteina-Complexa for binder generation and complex-level metrics. But there is no single "best" or one-size-fits-all model. Performance depends on protein length, fold class, disorder content, multimeric state, and downstream task—whether it is stability engineering, binding interface analysis, or de novo binder design.
Without a systematic way to compare models on your specific targets, teams default to a single predictor, missing opportunities to leverage the ecosystem. Recent benchmarks like PDFBench and PDB-Struct highlight that simple metrics such as RMSD or sequence recovery can be misleading; models that excel on one score may underperform on foldability, stability, or functional relevance. This is where agentic benchmarking becomes essential. (Kuang et al., 2025; Wang et al., 2023)
The Sigmatic approach
At Sigmatic Sciences, we have built an agentic pipeline that automates end-to-end model comparison for structural biology workflows. Specialized agents prepare input sequences and reference structures, route jobs across leading predictors (Boltz-2, OpenFold3, and multiple AlphaFold2 configurations, with straightforward extension to Proteina-Complexa and others), and run them in parallel.
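To make the routing layer concrete, here is a minimal sketch of fanning one target out to several predictor backends and gathering the results in parallel. The run_boltz2, run_openfold3, and run_alphafold2 wrappers are hypothetical stand-ins for whatever CLI or container invocation a team actually uses; only the orchestration pattern is the point.

```python
# Illustrative sketch of the routing step: fan one target out to several
# predictor backends in parallel. The run_* callables are hypothetical
# wrappers around each tool's own CLI or container; substitute your own.
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict

def run_boltz2(fasta_path: str) -> str: ...      # placeholder: returns path to predicted structure
def run_openfold3(fasta_path: str) -> str: ...   # placeholder
def run_alphafold2(fasta_path: str) -> str: ...  # placeholder

PREDICTORS: Dict[str, Callable[[str], str]] = {
    "boltz2": run_boltz2,
    "openfold3": run_openfold3,
    "alphafold2": run_alphafold2,
}

def predict_all(fasta_path: str) -> Dict[str, str]:
    """Run every registered predictor on one target and collect output paths."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=len(PREDICTORS)) as pool:
        futures = {pool.submit(fn, fasta_path): name for name, fn in PREDICTORS.items()}
        for fut in as_completed(futures):
            name = futures[fut]
            try:
                results[name] = fut.result()
            except Exception as exc:   # a failed backend should not sink the whole batch
                results[name] = f"FAILED: {exc}"
    return results
```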
Evaluation agents then compute biologically meaningful metrics—inspired by PDB-Struct's refoldability (TM-score, pLDDT) and PDFBench's multi-dimensional approach—including global accuracy (RMSD, GDT), local accuracy (per-residue lDDT), and confidence calibration (pLDDT vs. true error). Results are aggregated into a single benchmark dashboard with AI-generated natural-language insights that highlight top performers, flag low-confidence targets, and suggest experimental priorities. (Kuang et al., 2025; Wang et al., 2023)
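As a concrete illustration of two of those metrics, the sketch below computes a Kabsch-superposed global RMSD and a simple calibration check that correlates per-residue pLDDT with per-residue Cα error on matched coordinates. It assumes coordinates and pLDDT values have already been extracted and paired residue-for-residue; in practice, GDT, lDDT, and TM-score come from dedicated tools.

```python
# Minimal sketch of two evaluation metrics on matched C-alpha coordinates.
# Assumes prediction and reference are already paired residue-for-residue.
import numpy as np

def _superpose(pred: np.ndarray, ref: np.ndarray):
    """Center both (N, 3) coordinate sets and rotate pred onto ref (Kabsch)."""
    p = pred - pred.mean(axis=0)
    q = ref - ref.mean(axis=0)
    h = p.T @ q                                  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))       # guard against reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T    # optimal rotation matrix
    return p @ rot.T, q

def kabsch_rmsd(pred: np.ndarray, ref: np.ndarray) -> float:
    """Global RMSD after optimal superposition."""
    p_rot, q = _superpose(pred, ref)
    return float(np.sqrt(((p_rot - q) ** 2).sum() / len(q)))

def plddt_calibration(pred: np.ndarray, ref: np.ndarray, plddt: np.ndarray) -> float:
    """Pearson correlation between per-residue pLDDT and per-residue C-alpha error.

    A well-calibrated model gives a strongly negative value: high confidence
    where the error is low."""
    p_rot, q = _superpose(pred, ref)
    per_residue_error = np.linalg.norm(p_rot - q, axis=1)
    return float(np.corrcoef(plddt, per_residue_error)[0, 1])
```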
This pipeline turns fragmented tool usage into systematic, scalable model selection. It lowers the barrier to adopting new releases like OpenFold3 and Proteina-Complexa, quantifies trade-offs in accuracy, speed, and cost, and aligns evaluation with lab priorities: will this structure support downstream design? The result is faster, more informed decisions that help discovery teams capture the full potential of this golden age. When every new model release brings both opportunity and complexity, having a reproducible benchmark on your own targets is no longer optional—it is a competitive advantage.
References
- EMBL-EBI, Google DeepMind, NVIDIA, & Seoul National University. (2026, March 17). Millions of AI-predicted structures added to AlphaFold Database. Instruct-ERIC. https://instruct-eric.org/news/millions-of-ai-predicted-structures-added-to-alphafold-database-/
- OpenFold Consortium. (2026, March 13). OpenFold Consortium announces major OpenFold3 update and public release of training data for reproducible biomolecular AI. Business Wire. https://www.businesswire.com/news/home/20260313170622/en/OpenFold-Consortium-Announces-Major-OpenFold3-Update-and-Public-Release-of-Training-Data-for-Reproducible-Biomolecular-AI
- NVIDIA. (2026). Proteina-Complexa: Fully atomistic protein binder design. NVIDIA Research. https://research.nvidia.com/labs/genair/proteina-complexa/
- Kuang, J., Liu, N., Sun, C., Ji, T., & Wu, Y. (2025). PDFBench: A benchmark for de novo protein design from function. arXiv. https://arxiv.org/abs/2505.20346
- Wang, C., Zhong, B., Zhang, Z., Chaudhary, N., Misra, S., & Tang, J. (2023). A comprehensive benchmark for structure-based protein design. arXiv. https://arxiv.org/abs/2312.00080
