TerraBind: Fast and Accurate Binding Affinity Prediction through Coarse Structural Representations
- URL: http://arxiv.org/abs/2602.07735v1
- Date: Sun, 08 Feb 2026 00:01:43 GMT
- Title: TerraBind: Fast and Accurate Binding Affinity Prediction through Coarse Structural Representations
- Authors: Matteo Rossi, Ryan Pederson, Miles Wang-Henderson, Ben Kaufman, Edward C. Williams, Carl Underkoffler, Owen Lewis Howell, Adrian Layer, Stephan Thaler, Narbe Mardirossian, John Anthony Parkhill
- Abstract summary: We present TerraBind, a foundation model for protein-ligand structure and binding affinity prediction. It achieves 26-fold faster inference than state-of-the-art methods while improving affinity prediction accuracy by $\sim$20%.
- Score: 0.7891868017562221
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present TerraBind, a foundation model for protein-ligand structure and binding affinity prediction that achieves 26-fold faster inference than state-of-the-art methods while improving affinity prediction accuracy by $\sim$20\%. Current deep learning approaches to structure-based drug design rely on expensive all-atom diffusion to generate 3D coordinates, creating inference bottlenecks that render large-scale compound screening computationally intractable. We challenge this paradigm with a critical hypothesis: full all-atom resolution is unnecessary for accurate small molecule pose and binding affinity prediction. TerraBind tests this hypothesis through a coarse pocket-level representation (protein C$_β$ atoms and ligand heavy atoms only) within a multimodal architecture combining COATI-3 molecular encodings and ESM-2 protein embeddings that learns rich structural representations, which are used in a diffusion-free optimization module for pose generation and a binding affinity likelihood prediction module. On structure prediction benchmarks (FoldBench, PoseBusters, Runs N' Poses), TerraBind matches diffusion-based baselines in ligand pose accuracy. Crucially, TerraBind outperforms Boltz-2 by $\sim$20\% in Pearson correlation for binding affinity prediction on both a public benchmark (CASP16) and a diverse proprietary dataset (18 biochemical/cell assays). We show that the affinity prediction module also provides well-calibrated affinity uncertainty estimates, addressing a critical gap in reliable compound prioritization for drug discovery. Furthermore, this module enables a continual learning framework and a hedged batch selection strategy that, in simulated drug discovery cycles, achieves 6$\times$ greater affinity improvement of selected molecules over greedy-based approaches.
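The affinity results in the abstract are reported as Pearson correlation between predicted and measured binding affinities. As a minimal sketch of how that metric is computed, assuming plain Python lists of affinity values (the function name `pearson_r` and the toy numbers are illustrative and are not taken from the paper or its benchmarks):

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy affinities (e.g. pIC50-like values), NOT data from the paper.
measured = [6.1, 7.4, 5.2, 8.0, 6.8]
predicted = [5.9, 7.0, 5.5, 7.6, 7.1]
print(round(pearson_r(measured, predicted), 3))
```

A value near 1 means the model ranks and scales compounds consistently with experiment; the abstract's comparison with Boltz-2 is stated in terms of this coefficient.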
Related papers
- Investigating Knowledge Distillation Through Neural Networks for Protein Binding Affinity Prediction [0.22369578015657954]
The trade-off between predictive accuracy and data availability makes it difficult to predict protein-protein binding affinity accurately. We suggest a regression framework based on knowledge distillation that uses protein structural data during training and only needs sequence data during inference.
arXiv Detail & Related papers (2026-01-07T08:43:08Z)
- Pearl: A Foundation Model for Placing Every Atom in the Right Location [52.35027831422145]
We introduce Pearl, a foundation model for protein-ligand cofolding at scale. Pearl establishes new state-of-the-art performance in protein-ligand cofolding. Pearl surpasses AlphaFold 3 and other open source baselines on the public Runs N' Poses and PoseBusters benchmarks.
arXiv Detail & Related papers (2025-10-28T17:36:51Z)
- FLOWR.root: A flow matching based foundation model for joint multi-purpose structure-aware 3D ligand generation and affinity prediction [5.216915896877018]
FLOWR:root is an equivariant flow-matching model for pocket-aware 3D ligand generation. It supports de novo generation, pharmacophore-conditional sampling, fragment elaboration and affinity prediction.
arXiv Detail & Related papers (2025-10-02T21:38:26Z)
- KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction [60.23701115249195]
KEPLA is a novel deep learning framework that integrates prior knowledge from Gene Ontology and ligand properties to enhance prediction performance. Experiments on two benchmark datasets demonstrate that KEPLA consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2025-06-16T08:02:42Z)
- Fast and Accurate Blind Flexible Docking [79.88520988144442]
Molecular docking, which predicts the bound structures of small molecules (ligands) to their protein targets, plays a vital role in drug discovery. We propose FABFlex, a fast and accurate regression-based multi-task learning model designed for realistic blind flexible docking scenarios.
arXiv Detail & Related papers (2025-02-20T07:31:13Z)
- FlowDock: Geometric Flow Matching for Generative Protein-Ligand Docking and Affinity Prediction [3.8366697175402225]
FlowDock is the first deep geometric generative model that learns to map unbound (apo) structures to their bound (holo) counterparts. FlowDock provides predicted structural confidence scores and binding affinity values with each of its generated protein-ligand complex structures.
arXiv Detail & Related papers (2024-12-14T20:54:37Z)
- FABind: Fast and Accurate Protein-Ligand Binding [127.7790493202716]
FABind is an end-to-end model that combines pocket prediction and docking to achieve accurate and fast protein-ligand binding.
Our proposed model demonstrates strong advantages in terms of effectiveness and efficiency compared to existing methods.
arXiv Detail & Related papers (2023-10-10T16:39:47Z)
- Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval [139.21955930418815]
Cross-modal Retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space.
However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts.
We propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from the inherent data ambiguity.
arXiv Detail & Related papers (2023-09-29T09:41:19Z)
- From Static to Dynamic Structures: Improving Binding Affinity Prediction with Graph-Based Deep Learning [40.83037811977803]
Dynaformer is a graph-based deep learning model developed to predict protein-ligand binding affinities.
It exhibits state-of-the-art scoring and ranking power on the CASF-2016 benchmark dataset.
In a virtual screening on heat shock protein 90 (HSP90), 20 candidates are identified and their binding affinities are experimentally validated.
arXiv Detail & Related papers (2022-08-19T14:55:12Z)
- ResAtom System: Protein and Ligand Affinity Prediction Model Based on Deep Learning [1.1493209685387984]
We build a predictive model of protein-ligand affinity through the ResNet neural network with added attention mechanism.
The results show that the use of DeltaVinaRF20 in combination with ResAtom-Score can achieve affinity prediction close to scoring functions.
arXiv Detail & Related papers (2021-04-17T15:37:10Z)
- Explainable Deep Relational Networks for Predicting Compound-Protein Affinities and Contacts [80.69440684790925]
DeepRelations is a physics-inspired deep relational network with intrinsically explainable architecture.
It shows superior interpretability to the state-of-the-art.
It boosts the AUPRC of contact prediction by 9.5-, 16.9-, 19.3-, and 5.7-fold for the test, compound-unique, protein-unique, and both-unique sets, respectively.
arXiv Detail & Related papers (2019-12-29T00:14:07Z)