Related papers: Fitness aligned structural modeling enables scalable virtual screening with AuroBind

URL: http://arxiv.org/abs/2508.02137v1
Date: Mon, 04 Aug 2025 07:34:48 GMT
Title: Fitness aligned structural modeling enables scalable virtual screening with AuroBind
Authors: Zhongyue Zhang, Jiahua Rao, Jie Zhong, Weiqiang Bai, Dongxue Wang, Shaobo Ning, Lifeng Qiao, Sheng Xu, Runze Ma, Will Hua, Jack Xiaoyu Chen, Odin Zhang, Wei Lu, Hanyi Feng, He Yang, Xinchao Shi, Rui Li, Wanli Ouyang, Xinzhu Ma, Jiahao Wang, Jixian Zhang, Jia Duan, Siqi Sun, Jian Zhang, Shuangjia Zheng
Abstract summary: We present AuroBind, a scalable virtual screening framework that fine-tunes a custom atomic-level structural model on million-scale chemogenomic data. AuroBind integrates direct preference optimization, self-distillation from high-confidence complexes, and a teacher-student acceleration strategy. In a prospective screen across ten disease-relevant targets, AuroBind achieved experimental hit rates of 7-69%, with top compounds reaching sub-nanomolar to picomolar potency.
Score: 56.720030595081845
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Most human proteins remain undrugged: over 96% of them are unexploited by approved therapeutics. While structure-based virtual screening promises to expand the druggable proteome, existing methods lack atomic-level precision and fail to predict binding fitness, limiting translational impact. We present AuroBind, a scalable virtual screening framework that fine-tunes a custom atomic-level structural model on million-scale chemogenomic data. AuroBind integrates direct preference optimization, self-distillation from high-confidence complexes, and a teacher-student acceleration strategy to jointly predict ligand-bound structures and binding fitness. The proposed models outperform state-of-the-art models on structural and functional benchmarks while enabling 100,000-fold faster screening across ultra-large compound libraries. In a prospective screen across ten disease-relevant targets, AuroBind achieved experimental hit rates of 7-69%, with top compounds reaching sub-nanomolar to picomolar potency. For the orphan GPCRs GPR151 and GPR160, AuroBind identified both agonists and antagonists with success rates of 16-30%, and functional assays confirmed GPR160 modulation in liver and prostate cancer models. AuroBind offers a generalizable framework for structure-function learning and high-throughput molecular screening, bridging the gap between structure prediction and therapeutic discovery.

Related papers

Investigating Knowledge Distillation Through Neural Networks for Protein Binding Affinity Prediction [0.22369578015657954]
The trade-off between predictive accuracy and data availability makes it difficult to predict protein-protein binding affinity accurately. We suggest a regression framework based on knowledge distillation that uses protein structural data during training and only needs sequence data during inference.
arXiv Detail & Related papers (2026-01-07T08:43:08Z)
CONFIDE: Hallucination Assessment for Reliable Biomolecular Structure Prediction and Design [46.12506067241116]
We present CODE (Chain of Diffusion Embeddings), a self-evaluating metric to quantify topological frustration. We propose CONFIDE, a unified evaluation framework that combines energetic and topological perspectives. By combining data-driven embeddings with theoretical insight, CODE and CONFIDE outperform existing metrics across a wide range of biomolecular systems.
arXiv Detail & Related papers (2025-11-20T03:38:46Z)
AMix-1: A Pathway to Test-Time Scalable Protein Foundation Model [92.51919604882984]
We introduce AMix-1, a powerful protein foundation model built on Flow Bayesian Networks. AMix-1 is empowered by a systematic training methodology, encompassing pretraining scaling laws, emergent capability analysis, in-context learning mechanism, and test-time scaling algorithm. Building on this foundation, we devise a multiple sequence alignment (MSA)-based in-context learning strategy to unify protein design into a general framework.
arXiv Detail & Related papers (2025-07-11T17:02:25Z)
DISPROTBENCH: A Disorder-Aware, Task-Rich Benchmark for Evaluating Protein Structure Prediction in Realistic Biological Contexts [76.59606029593085]
DisProtBench is a benchmark for evaluating protein structure prediction models (PSPMs) under structural disorder and complex biological conditions. DisProtBench spans three key axes: data complexity, task diversity, and interpretability. Results reveal significant variability in model robustness under disorder, with low-confidence regions linked to functional prediction failures.
arXiv Detail & Related papers (2025-06-18T23:58:22Z)
PLAME: Leveraging Pretrained Language Models to Generate Enhanced Protein Multiple Sequence Alignments [53.55710514466851]
Protein structure prediction is essential for drug discovery and understanding biological functions. Most folding models rely heavily on multiple sequence alignments (MSAs) to boost prediction performance. We propose PLAME, a novel MSA design model that leverages evolutionary embeddings from pretrained protein language models.
arXiv Detail & Related papers (2025-06-17T04:11:30Z)
AANet: Virtual Screening under Structural Uncertainty via Alignment and Aggregation [18.8920680373474]
We introduce an alignment-and-aggregation framework to enable accurate virtual screening under structural uncertainty. We evaluate our method on a newly curated benchmark of apo structures, where it significantly outperforms state-of-the-art methods in the blind apo setting.
arXiv Detail & Related papers (2025-06-06T05:52:19Z)
GenShin:geometry-enhanced structural graph embodies binding pose can better predicting compound-protein interaction affinity [6.1468096893238915]
We introduce the GenShin model, which constructs a geometry-enhanced structural graph module that extracts additional features from proteins and compounds. It attains an accuracy on par with mainstream models in predicting compound-protein affinities, while eliminating the need for adequate-binding pose as input. Our work will inspire more endeavors to bridge the gap between AI models and practical drug discovery challenges.
arXiv Detail & Related papers (2025-03-16T09:11:56Z)
A general language model for peptide identification [4.044600688588866]
PDeepPP is a unified deep learning framework that integrates pretrained protein language models with a hybrid transformer-convolutional architecture. By enabling large-scale, accurate peptide analysis, PDeepPP supports biomedical research and the discovery of novel therapeutic targets for disease treatment.
arXiv Detail & Related papers (2025-02-21T17:31:22Z)
Manifold-Constrained Nucleus-Level Denoising Diffusion Model for Structure-Based Drug Design [81.95343363178662]
Atoms must maintain a minimum pairwise distance to avoid separation violations. NucleusDiff models the interactions between atomic nuclei and their surrounding electron clouds by enforcing the distance constraint. It reduces violation rate by up to 100.00% and enhances binding affinity by up to 22.16%, surpassing state-of-the-art models for structure-based drug design.
arXiv Detail & Related papers (2024-09-16T08:42:46Z)
ProFSA: Self-supervised Pocket Pretraining via Protein Fragment-Surroundings Alignment [20.012210194899605]
We propose a novel pocket pretraining approach that leverages knowledge from high-resolution atomic protein structures. Our method, named ProFSA, achieves state-of-the-art performance across various tasks, including pocket druggability prediction. Our work opens up a new avenue for mitigating the scarcity of protein-ligand complex data through the utilization of high-quality and diverse protein structure databases.
arXiv Detail & Related papers (2023-10-11T06:36:23Z)
State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures. Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
Unsupervisedly Prompting AlphaFold2 for Few-Shot Learning of Accurate Folding Landscape and Protein Structure Prediction [28.630603355510324]
We present EvoGen, a meta generative model, to remedy the underperformance of AlphaFold2 for poor MSA targets. By prompting the model with calibrated or virtually generated homologue sequences, EvoGen helps AlphaFold2 fold accurately in the low-data regime.
arXiv Detail & Related papers (2022-08-20T10:23:17Z)
From Static to Dynamic Structures: Improving Binding Affinity Prediction with Graph-Based Deep Learning [40.83037811977803]
Dynaformer is a graph-based deep learning model developed to predict protein-ligand binding affinities. It exhibits state-of-the-art scoring and ranking power on the CASF-2016 benchmark dataset. In a virtual screening on heat shock protein 90 (HSP90), 20 candidates are identified and their binding affinities are experimentally validated.
arXiv Detail & Related papers (2022-08-19T14:55:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.