CONFIDE: Hallucination Assessment for Reliable Biomolecular Structure Prediction and Design
- URL: http://arxiv.org/abs/2512.02033v1
- Date: Thu, 20 Nov 2025 03:38:46 GMT
- Title: CONFIDE: Hallucination Assessment for Reliable Biomolecular Structure Prediction and Design
- Authors: Zijun Gao, Mutian He, Shijia Sun, Hanqun Cao, Jingjie Zhang, Zihao Luo, Xiaorui Wang, Xiaojun Yao, Chang-Yu Hsieh, Chunbin Gu, Pheng Ann Heng,
- Abstract summary: We present CODE (Chain of Diffusion Embeddings), a self evaluating metric to quantify topological frustration.<n>We propose CONFIDE, a unified evaluation framework that combines energetic and topological perspectives.<n>By combining data driven embeddings with theoretical insight, CODE and CONFIDE outperform existing metrics across a wide range of biomolecular systems.
- Score: 46.12506067241116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reliable evaluation of protein structure predictions remains challenging, as metrics like pLDDT capture energetic stability but often miss subtle errors such as atomic clashes or conformational traps reflecting topological frustration within the protein folding energy landscape. We present CODE (Chain of Diffusion Embeddings), a self evaluating metric empirically found to quantify topological frustration directly from the latent diffusion embeddings of the AlphaFold3 series of structure predictors in a fully unsupervised manner. Integrating this with pLDDT, we propose CONFIDE, a unified evaluation framework that combines energetic and topological perspectives to improve the reliability of AlphaFold3 and related models. CODE strongly correlates with protein folding rates driven by topological frustration, achieving a correlation of 0.82 compared to pLDDT's 0.33 (a relative improvement of 148\%). CONFIDE significantly enhances the reliability of quality evaluation in molecular glue structure prediction benchmarks, achieving a Spearman correlation of 0.73 with RMSD, compared to pLDDT's correlation of 0.42, a relative improvement of 73.8\%. Beyond quality assessment, our approach applies to diverse drug design tasks, including all-atom binder design, enzymatic active site mapping, mutation induced binding affinity prediction, nucleic acid aptamer screening, and flexible protein modeling. By combining data driven embeddings with theoretical insight, CODE and CONFIDE outperform existing metrics across a wide range of biomolecular systems, offering robust and versatile tools to refine structure predictions, advance structural biology, and accelerate drug discovery.
Related papers
- FP-AbDiff: Improving Score-based Antibody Design by Capturing Nonequilibrium Dynamics through the Underlying Fokker-Planck Equation [19.153777175873547]
We introduce FP-AbDiff, the first antibody generator to enforce Fokker-Planck Equation (FPE) physics along the entire generative trajectory.<n>By aligning generative dynamics with physical laws, FP-AbDiff enhances robustness and generalizability, establishing a principled approach for physically faithful and functionally viable antibody design.
arXiv Detail & Related papers (2025-11-05T01:44:37Z) - FLOWR.root: A flow matching based foundation model for joint multi-purpose structure-aware 3D ligand generation and affinity prediction [5.216915896877018]
FLOWR:root is an equivariant flow-matching model for pocket-aware 3D ligand generation.<n>It supports de novo generation, pharmacophore-conditional sampling, fragment elaboration and affinity prediction.
arXiv Detail & Related papers (2025-10-02T21:38:26Z) - A Geometric Graph-Based Deep Learning Model for Drug-Target Affinity Prediction [0.0]
We introduce DeepGGL, a deep convolutional neural network that integrates residual connections and an attention mechanism within a geometric graph learning framework.<n>By leveraging multiscale weighted colored bipartite subgraphs, DeepGGL effectively captures fine-grained atom-level interactions in protein-ligand complexes across multiple scales.<n>DeepGGL consistently maintained high predictive accuracy, highlighting its adaptability and reliability for binding affinity prediction in structure-based drug discovery.
arXiv Detail & Related papers (2025-09-15T14:06:39Z) - Fitness aligned structural modeling enables scalable virtual screening with AuroBind [56.720030595081845]
We present AuroBind, a scalable virtual screening framework that fine-tunes a custom atomic-level structural model on million-scale chemogenomic data.<n>AuroBind integrates direct preference optimization, self-distillation from high-confidence complexes, and a teacher-student acceleration strategy.<n>In a prospective screen across ten disease-relevant targets, AuroBind achieved experimental hit rates of 7-69%, with top compounds reaching sub-nanomolar to picomolar potency.
arXiv Detail & Related papers (2025-08-04T07:34:48Z) - DISPROTBENCH: A Disorder-Aware, Task-Rich Benchmark for Evaluating Protein Structure Prediction in Realistic Biological Contexts [76.59606029593085]
DisProtBench is a benchmark for evaluating protein structure prediction models (PSPMs) under structural disorder and complex biological conditions.<n>DisProtBench spans three key axes: data complexity, task diversity, and Interpretability.<n>Results reveal significant variability in model robustness under disorder, with low-confidence regions linked to functional prediction failures.
arXiv Detail & Related papers (2025-06-18T23:58:22Z) - Dumpling GNN: Hybrid GNN Enables Better ADC Payload Activity Prediction Based on Chemical Structure [53.76752789814785]
DumplingGNN is a hybrid Graph Neural Network architecture specifically designed for predicting ADC payload activity based on chemical structure.
We evaluate it on a comprehensive ADC payload dataset focusing on DNA Topoisomerase I inhibitors.
It demonstrates exceptional accuracy (91.48%), sensitivity (95.08%), and specificity (97.54%) on our specialized ADC payload dataset.
arXiv Detail & Related papers (2024-09-23T17:11:04Z) - Manifold-Constrained Nucleus-Level Denoising Diffusion Model for Structure-Based Drug Design [81.95343363178662]
atoms must maintain a minimum pairwise distance to avoid separation violations.
NucleusDiff models the interactions between atomic nuclei and their surrounding electron clouds by enforcing the distance constraint.
It reduces violation rate by up to 1000% and enhances binding affinity by up to 22.16%, surpassing state-of-the-art models for structure-based drug design.
arXiv Detail & Related papers (2024-09-16T08:42:46Z) - SPIN: SE(3)-Invariant Physics Informed Network for Binding Affinity Prediction [3.406882192023597]
Accurate prediction of protein-ligand binding affinity is crucial for drug development.
Traditional methods often fail to accurately model the complex's spatial information.
We propose SPIN, a model that incorporates various inductive biases applicable to this task.
arXiv Detail & Related papers (2024-07-10T08:40:07Z) - State-specific protein-ligand complex structure prediction with a
multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.