Fitness aligned structural modeling enables scalable virtual screening with AuroBind
- URL: http://arxiv.org/abs/2508.02137v1
- Date: Mon, 04 Aug 2025 07:34:48 GMT
- Title: Fitness aligned structural modeling enables scalable virtual screening with AuroBind
- Authors: Zhongyue Zhang, Jiahua Rao, Jie Zhong, Weiqiang Bai, Dongxue Wang, Shaobo Ning, Lifeng Qiao, Sheng Xu, Runze Ma, Will Hua, Jack Xiaoyu Chen, Odin Zhang, Wei Lu, Hanyi Feng, He Yang, Xinchao Shi, Rui Li, Wanli Ouyang, Xinzhu Ma, Jiahao Wang, Jixian Zhang, Jia Duan, Siqi Sun, Jian Zhang, Shuangjia Zheng,
- Abstract summary: We present AuroBind, a scalable virtual screening framework that fine-tunes a custom atomic-level structural model on million-scale chemogenomic data.<n>AuroBind integrates direct preference optimization, self-distillation from high-confidence complexes, and a teacher-student acceleration strategy.<n>In a prospective screen across ten disease-relevant targets, AuroBind achieved experimental hit rates of 7-69%, with top compounds reaching sub-nanomolar to picomolar potency.
- Score: 56.720030595081845
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Most human proteins remain undrugged, over 96% of human proteins remain unexploited by approved therapeutics. While structure-based virtual screening promises to expand the druggable proteome, existing methods lack atomic-level precision and fail to predict binding fitness, limiting translational impact. We present AuroBind, a scalable virtual screening framework that fine-tunes a custom atomic-level structural model on million-scale chemogenomic data. AuroBind integrates direct preference optimization, self-distillation from high-confidence complexes, and a teacher-student acceleration strategy to jointly predict ligand-bound structures and binding fitness. The proposed models outperform state-of-the-art models on structural and functional benchmarks while enabling 100,000-fold faster screening across ultra-large compound libraries. In a prospective screen across ten disease-relevant targets, AuroBind achieved experimental hit rates of 7-69%, with top compounds reaching sub-nanomolar to picomolar potency. For the orphan GPCRs GPR151 and GPR160, AuroBind identified both agonists and antagonists with success rates of 16-30%, and functional assays confirmed GPR160 modulation in liver and prostate cancer models. AuroBind offers a generalizable framework for structure-function learning and high-throughput molecular screening, bridging the gap between structure prediction and therapeutic discovery.
Related papers
- AMix-1: A Pathway to Test-Time Scalable Protein Foundation Model [92.51919604882984]
We introduce AMix-1, a powerful protein foundation model built on Flow Bayesian Networks.<n>AMix-1 is empowered by a systematic training methodology, encompassing pretraining scaling laws, emergent capability analysis, in-context learning mechanism, and test-time scaling algorithm.<n>Building on this foundation, we devise a multiple sequence alignment (MSA)-based in-context learning strategy to unify protein design into a general framework.
arXiv Detail & Related papers (2025-07-11T17:02:25Z) - DISPROTBENCH: A Disorder-Aware, Task-Rich Benchmark for Evaluating Protein Structure Prediction in Realistic Biological Contexts [76.59606029593085]
DisProtBench is a benchmark for evaluating protein structure prediction models (PSPMs) under structural disorder and complex biological conditions.<n>DisProtBench spans three key axes: data complexity, task diversity, and Interpretability.<n>Results reveal significant variability in model robustness under disorder, with low-confidence regions linked to functional prediction failures.
arXiv Detail & Related papers (2025-06-18T23:58:22Z) - PLAME: Leveraging Pretrained Language Models to Generate Enhanced Protein Multiple Sequence Alignments [53.55710514466851]
Protein structure prediction is essential for drug discovery and understanding biological functions.<n>Most folding models rely heavily on multiple sequence alignments (MSAs) to boost prediction performance.<n>We propose PLAME, a novel MSA design model that leverages evolutionary embeddings from pretrained protein language models.
arXiv Detail & Related papers (2025-06-17T04:11:30Z) - AANet: Virtual Screening under Structural Uncertainty via Alignment and Aggregation [18.8920680373474]
We introduce an alignment-and-aggregation framework to enable accurate virtual screening under structural uncertainty.<n>We evaluate our method on a newly curated benchmark of apo structures, where it significantly outperforms state-of-the-art methods in blind apo setting.
arXiv Detail & Related papers (2025-06-06T05:52:19Z) - GenShin:geometry-enhanced structural graph embodies binding pose can better predicting compound-protein interaction affinity [6.1468096893238915]
We introduce the GenShin model, which constructs a geometry-enhanced structural graph module that extracts additional features from proteins and compounds.<n>It attains an accuracy on par with mainstream models in predicting compound-protein affinities, while eliminating the need for adequate-binding pose as input.<n>Our work will inspire more endeavors to bridge the gap between AI models and practical drug discovery challenges.
arXiv Detail & Related papers (2025-03-16T09:11:56Z) - A general language model for peptide identification [4.044600688588866]
PDeepPP is a unified deep learning framework that integrates pretrained protein language models with a hybrid transformer-convolutional architecture.<n>By enabling large-scale, accurate peptide analysis, PDeepPP supports biomedical research and the discovery of novel therapeutic targets for disease treatment.
arXiv Detail & Related papers (2025-02-21T17:31:22Z) - Manifold-Constrained Nucleus-Level Denoising Diffusion Model for Structure-Based Drug Design [81.95343363178662]
atoms must maintain a minimum pairwise distance to avoid separation violations.
NucleusDiff models the interactions between atomic nuclei and their surrounding electron clouds by enforcing the distance constraint.
It reduces violation rate by up to 1000% and enhances binding affinity by up to 22.16%, surpassing state-of-the-art models for structure-based drug design.
arXiv Detail & Related papers (2024-09-16T08:42:46Z) - ProFSA: Self-supervised Pocket Pretraining via Protein
Fragment-Surroundings Alignment [20.012210194899605]
We propose a novel pocket pretraining approach that leverages knowledge from high-resolution atomic protein structures.
Our method, named ProFSA, achieves state-of-the-art performance across various tasks, including pocket druggability prediction.
Our work opens up a new avenue for mitigating the scarcity of protein-ligand complex data through the utilization of high-quality and diverse protein structure databases.
arXiv Detail & Related papers (2023-10-11T06:36:23Z) - State-specific protein-ligand complex structure prediction with a
multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z) - Unsupervisedly Prompting AlphaFold2 for Few-Shot Learning of Accurate
Folding Landscape and Protein Structure Prediction [28.630603355510324]
We present EvoGen, a meta generative model, to remedy the underperformance of AlphaFold2 for poor MSA targets.
By prompting the model with calibrated or virtually generated homologue sequences, EvoGen helps AlphaFold2 fold accurately in low-data regime.
arXiv Detail & Related papers (2022-08-20T10:23:17Z) - From Static to Dynamic Structures: Improving Binding Affinity Prediction with Graph-Based Deep Learning [40.83037811977803]
Dynaformer is a graph-based deep learning model developed to predict protein-ligand binding affinities.
It exhibits state-of-the-art scoring and ranking power on the CASF-2016 benchmark dataset.
In a virtual screening on heat shock protein 90 (HSP90), 20 candidates are identified and their binding affinities are experimentally validated.
arXiv Detail & Related papers (2022-08-19T14:55:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.