Contrastive Multi-Task Learning with Solvent-Aware Augmentation for Drug Discovery
- URL: http://arxiv.org/abs/2508.01799v1
- Date: Sun, 03 Aug 2025 15:25:42 GMT
- Title: Contrastive Multi-Task Learning with Solvent-Aware Augmentation for Drug Discovery
- Authors: Jing Lan, Hexiao Ding, Hongzhao Chen, Yufeng Jiang, Ng Nga Chun, Gerald W. Y. Cheng, Zongxi Li, Jing Cai, Liang-ting Lin, Jung Sun Yoo,
- Abstract summary: We introduce a pre-training method that incorporates ligand conformational ensembles generated under diverse solvent conditions as augmented input.<n>The training process integrates molecular reconstruction to capture local geometry, interatomic distance prediction, and contrastive learning to build solvent-invariant molecular representations.<n>The framework supports solvent-aware, multi-task modeling and produces consistent results across benchmarks.
- Score: 3.229836697956724
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate prediction of protein-ligand interactions is essential for computer-aided drug discovery. However, existing methods often fail to capture solvent-dependent conformational changes and lack the ability to jointly learn multiple related tasks. To address these limitations, we introduce a pre-training method that incorporates ligand conformational ensembles generated under diverse solvent conditions as augmented input. This design enables the model to learn both structural flexibility and environmental context in a unified manner. The training process integrates molecular reconstruction to capture local geometry, interatomic distance prediction to model spatial relationships, and contrastive learning to build solvent-invariant molecular representations. Together, these components lead to significant improvements, including a 3.7% gain in binding affinity prediction, an 82% success rate on the PoseBusters Astex docking benchmarks, and an area under the curve of 97.1% in virtual screening. The framework supports solvent-aware, multi-task modeling and produces consistent results across benchmarks. A case study further demonstrates sub-angstrom docking accuracy with a root-mean-square deviation of 0.157 angstroms, offering atomic-level insight into binding mechanisms and advancing structure-based drug design.
Related papers
- Model Accuracy and Data Heterogeneity Shape Uncertainty Quantification in Machine Learning Interatomic Potentials [5.955636672018519]
Machine learning interatomic potentials (MLIPs) enable accurate atomistic modelling, but reliable uncertainty quantification (UQ) remains elusive.<n>In this study, we investigate two UQ strategies, ensemble learning and D-optimality, within the atomic cluster expansion framework.
arXiv Detail & Related papers (2025-08-05T12:52:49Z) - Robust Molecular Property Prediction via Densifying Scarce Labeled Data [51.55434084913129]
In drug discovery, compounds most critical for advancing research often lie beyond the training set.<n>We propose a novel meta-learning-based approach that leverages unlabeled data to interpolate between in-distribution (ID) and out-of-distribution (OOD) data.<n>We demonstrate significant performance gains on challenging real-world datasets.
arXiv Detail & Related papers (2025-06-13T15:27:40Z) - Sequence-Only Prediction of Binding Affinity Changes: A Robust and Interpretable Model for Antibody Engineering [9.789817970737666]
A pivotal area of research in antibody engineering is to find effective modifications that enhance antibody-antigen binding affinity.<n>Deep learning solutions offer an alternative by modeling antibody structures to predict binding affinity changes.<n>We propose ProtAttBA, a deep learning model that predicts binding affinity changes based solely on the sequence information of antibody-antigen complexes.
arXiv Detail & Related papers (2025-05-14T15:00:46Z) - Manifold-Constrained Nucleus-Level Denoising Diffusion Model for Structure-Based Drug Design [81.95343363178662]
atoms must maintain a minimum pairwise distance to avoid separation violations.
NucleusDiff models the interactions between atomic nuclei and their surrounding electron clouds by enforcing the distance constraint.
It reduces violation rate by up to 1000% and enhances binding affinity by up to 22.16%, surpassing state-of-the-art models for structure-based drug design.
arXiv Detail & Related papers (2024-09-16T08:42:46Z) - YZS-model: A Predictive Model for Organic Drug Solubility Based on Graph Convolutional Networks and Transformer-Attention [9.018408514318631]
Traditional methods often miss complex molecular structures, leading to inaccuracies.
We introduce the YZS-Model, a deep learning framework integrating Graph Convolutional Networks (GCN), Transformer architectures, and Long Short-Term Memory (LSTM) networks.
YZS-Model achieved an $R2$ of 0.59 and an RMSE of 0.57, outperforming benchmark models.
arXiv Detail & Related papers (2024-06-27T12:40:29Z) - Learning Invariant Molecular Representation in Latent Discrete Space [52.13724532622099]
We propose a new framework for learning molecular representations that exhibit invariance and robustness against distribution shifts.
Our model achieves stronger generalization against state-of-the-art baselines in the presence of various distribution shifts.
arXiv Detail & Related papers (2023-10-22T04:06:44Z) - Structured Radial Basis Function Network: Modelling Diversity for
Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important in forecasting nonstationary processes or with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z) - From Static to Dynamic Structures: Improving Binding Affinity Prediction with Graph-Based Deep Learning [40.83037811977803]
Dynaformer is a graph-based deep learning model developed to predict protein-ligand binding affinities.
It exhibits state-of-the-art scoring and ranking power on the CASF-2016 benchmark dataset.
In a virtual screening on heat shock protein 90 (HSP90), 20 candidates are identified and their binding affinities are experimentally validated.
arXiv Detail & Related papers (2022-08-19T14:55:12Z) - Extracting Chemical-Protein Interactions via Calibrated Deep Neural
Network and Self-training [0.8376091455761261]
"calibration" techniques have been applied to deep learning models to estimate the data uncertainty and improve the reliability.
In this study, to extract chemical--protein interactions, we propose a DNN-based approach incorporating uncertainty information and calibration techniques.
Our approach has achieved state-of-the-art performance with regard to the Biocreative VI ChemProt task, while preserving higher calibration abilities than those of previous approaches.
arXiv Detail & Related papers (2020-11-04T10:14:31Z) - Deep Learning for Virtual Screening: Five Reasons to Use ROC Cost
Functions [80.12620331438052]
deep learning has become an important tool for rapid screening of billions of molecules in silico for potential hits containing desired chemical features.
Despite its importance, substantial challenges persist in training these models, such as severe class imbalance, high decision thresholds, and lack of ground truth labels in some datasets.
We argue in favor of directly optimizing the receiver operating characteristic (ROC) in such cases, due to its robustness to class imbalance.
arXiv Detail & Related papers (2020-06-25T08:46:37Z) - Explainable Deep Relational Networks for Predicting Compound-Protein
Affinities and Contacts [80.69440684790925]
DeepRelations is a physics-inspired deep relational network with intrinsically explainable architecture.
It shows superior interpretability to the state-of-the-art.
It boosts the AUPRC of contact prediction 9.5, 16.9, 19.3 and 5.7-fold for the test, compound-unique, protein-unique, and both-unique sets.
arXiv Detail & Related papers (2019-12-29T00:14:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.