Mitigating Molecular Aggregation in Drug Discovery with Predictive Insights from Explainable AI
- URL: http://arxiv.org/abs/2306.02206v1
- Date: Sat, 3 Jun 2023 22:30:45 GMT
- Title: Mitigating Molecular Aggregation in Drug Discovery with Predictive Insights from Explainable AI
- Authors: Hunter Sturm, Jonas Teufel, Kaitlin A. Isfeld, Pascal Friederich, Rebecca L. Davis
- Abstract summary: A lack of understanding of the causes of molecular aggregation introduces difficulty in the development of predictive tools for detecting aggregating molecules.
We present an examination of the molecular features differentiating datasets of aggregating and non-aggregating molecules, as well as a machine learning approach to predicting molecular aggregation.
Our method uses explainable graph neural networks and counterfactuals to reliably predict and explain aggregation, giving additional insights and design rules for future screening.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the importance of high-throughput screening (HTS) continues to grow due to
its value in early-stage drug discovery and in data generation for training
machine learning models, there is a growing need for robust methods for
pre-screening compounds to identify and prevent false-positive hits. Small,
colloidally aggregating molecules are one of the primary sources of
false-positive hits in high-throughput screens, making them an ideal candidate
to target for removal from libraries using predictive pre-screening tools.
However, a lack of understanding of the causes of molecular aggregation
introduces difficulty in the development of predictive tools for detecting
aggregating molecules. Herein, we present an examination of the molecular
features differentiating datasets of aggregating and non-aggregating molecules,
as well as a machine learning approach to predicting molecular aggregation. Our
method uses explainable graph neural networks and counterfactuals to reliably
predict and explain aggregation, giving additional insights and design rules
for future screening. The integration of this method in HTS approaches will
help combat false positives, providing better lead molecules more rapidly and
thus accelerating drug discovery cycles.
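The counterfactual idea in the abstract can be illustrated with a deliberately small sketch. Here, a counterfactual is a minimally edited molecule for which the model's predicted class flips, for example from aggregator to non-aggregator. Everything below is an assumption made for illustration and is not the authors' implementation:

- Molecules are reduced to tuples of atom symbols.
- The only allowed edit is a single-atom substitution drawn from a hypothetical `ATOM_VOCAB`.
- `toy_aggregation_score` is a hypothetical stand-in for a trained graph neural network.

```python
from typing import Callable, List, Tuple

# Toy molecule: a tuple of atom symbols (bonds ignored for brevity).
Molecule = Tuple[str, ...]

# Hypothetical edit vocabulary; a real system would restrict edits to valid chemistry.
ATOM_VOCAB = ("C", "N", "O", "F", "Cl")


def single_atom_edits(mol: Molecule) -> List[Molecule]:
    """Every molecule that differs from `mol` by exactly one atom substitution."""
    edits: List[Molecule] = []
    for i, atom in enumerate(mol):
        for new_atom in ATOM_VOCAB:
            if new_atom != atom:
                edits.append(mol[:i] + (new_atom,) + mol[i + 1:])
    return edits


def counterfactuals(mol: Molecule, predict: Callable[[Molecule], float],
                    threshold: float = 0.5, k: int = 3) -> List[Tuple[Molecule, float]]:
    """Up to k one-edit neighbours whose predicted class differs from `mol`'s.

    Candidates are ordered by how far their score lands from the decision
    threshold, so the most decisive single edits come first.
    """
    original_is_aggregator = predict(mol) >= threshold
    flipped: List[Tuple[Molecule, float]] = []
    for cand in single_atom_edits(mol):
        score = predict(cand)
        if (score >= threshold) != original_is_aggregator:
            flipped.append((cand, score))
    flipped.sort(key=lambda pair: abs(pair[1] - threshold), reverse=True)
    return flipped[:k]


def toy_aggregation_score(mol: Molecule) -> float:
    """Hypothetical stand-in for a trained model: the fraction of carbon atoms."""
    return sum(1 for atom in mol if atom == "C") / len(mol)
```

Example usage: `counterfactuals(("C", "C", "N", "O"), toy_aggregation_score)` returns three one-atom edits. Each edit drops the toy score from 0.5 to 0.25, so it crosses the 0.5 threshold. A real pipeline would replace these stand-ins with chemically valid graph edits and the trained model's score.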
Related papers
- Addressing Model Overcomplexity in Drug-Drug Interaction Prediction With Molecular Fingerprints
Accurately predicting drug-drug interactions (DDIs) is crucial for pharmaceutical research and clinical safety.
Recent deep learning models often suffer from high computational costs and limited generalization across datasets.
In this study, we investigate a simpler yet effective approach using molecular representations such as Morgan fingerprints (S), graph-based embeddings from graph convolutional networks (GCNs), and transformer-derived embeddings from MoLFormer integrated into a straightforward neural network.
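For a drug pair, a simple network needs one fixed-length input built from two per-drug representations. The sketch below rests on three assumptions:

- The fingerprints have already been computed as equal-length 0/1 lists; the toolkit that produces them is not shown.
- The two fingerprints are combined with an order-invariant sum-and-product scheme.
- The result is scored with a single logistic unit.

The combination rule, `pair_features`, and `interaction_probability` are illustrative and are not the paper's architecture.

```python
import math
from typing import List, Sequence


def pair_features(fp_a: Sequence[int], fp_b: Sequence[int]) -> List[int]:
    """Combine two equal-length bit fingerprints into one order-invariant vector.

    The elementwise sum (0/1/2) and product (shared on-bits) do not depend on
    which drug is listed first, so pair_features(a, b) == pair_features(b, a).
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    summed = [x + y for x, y in zip(fp_a, fp_b)]
    shared = [x * y for x, y in zip(fp_a, fp_b)]
    return summed + shared


def interaction_probability(features: Sequence[float], weights: Sequence[float],
                            bias: float) -> float:
    """A single logistic unit standing in for a small neural network."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

Because `pair_features(a, b) == pair_features(b, a)`, the prediction does not depend on which drug is listed first. This suits symmetric interaction labels, whereas plain concatenation would not have this property.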
Date: 2025-03-30T18:27:01Z
- Pre-trained Molecular Language Models with Random Functional Group Masking
We propose a SMILES-based Molecular Language Model that randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
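The masking idea can be sketched at the string level under a few simplifying assumptions:

- The spans to mask (for example, the characters of a functional group) were located beforehand and do not overlap.
- SMILES is tokenized character by character for simplicity, which splits two-letter atoms such as `Cl`.
- `mask_spans` and the `[MASK]` token are hypothetical names, not the paper's code.

```python
import random
from typing import List, Sequence, Tuple

MASK = "[MASK]"  # hypothetical mask token


def mask_spans(smiles: str, spans: Sequence[Tuple[int, int]], mask_prob: float,
               rng: random.Random) -> Tuple[List[str], List[str]]:
    """Mask whole, non-overlapping (start, end) character spans of a SMILES string.

    Each span is replaced by a single MASK token with probability `mask_prob`.
    Returns the token list fed to the model and the masked-out substrings the
    model is trained to recover.
    """
    tokens: List[str] = []
    targets: List[str] = []
    pos = 0
    for start, end in sorted(spans):
        tokens.extend(smiles[pos:start])  # unmasked characters, one token each
        if rng.random() < mask_prob:
            tokens.append(MASK)
            targets.append(smiles[start:end])
        else:
            tokens.extend(smiles[start:end])
        pos = end
    tokens.extend(smiles[pos:])
    return tokens, targets
```

For acetic acid, `mask_spans("CC(=O)O", [(1, 7)], 1.0, random.Random(0))` returns `(["C", "[MASK]"], ["C(=O)O"])`. The whole carboxylic acid group becomes one masked unit, so the model must infer it from the rest of the molecule rather than from neighbouring characters.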
Date: 2024-11-03T01:56:15Z
- Conditional Synthesis of 3D Molecules with Time Correction Sampler
Time-Aware Conditional Synthesis (TACS) is a novel approach to conditional generation on diffusion models.
It integrates adaptively controlled plug-and-play "online" guidance into a diffusion model, driving samples toward the desired properties.
Date: 2024-11-01T12:59:25Z
- Contextual Representation Anchor Network to Alleviate Selection Bias in Few-Shot Drug Discovery
We present a novel method named Contextual Representation Anchor Network (CRA), where an anchor refers to a cluster center of the representations of molecules.
CRA introduces a dual-augmentation mechanism that includes context augmentation, which dynamically retrieves analogous unlabeled molecules.
We evaluate our approach on the MoleculeNet and FS-Mol benchmarks, as well as in domain transfer experiments.
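Because an anchor is described as a cluster center of molecular representations, anchors could be obtained by clustering embeddings. The sketch uses plain k-means (Lloyd's algorithm) on embeddings given as lists of floats. The choice of k-means, `kmeans_anchors`, and its parameters are assumptions, because the abstract does not say which clustering method CRA uses.

```python
import random
from typing import List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_anchors(embeddings: Sequence[Sequence[float]], k: int,
                   iters: int = 20, seed: int = 0) -> List[Vector]:
    """Return k cluster centres ("anchors") of molecular embeddings via Lloyd's algorithm."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(list(embeddings), k)]
    for _ in range(iters):
        # Assignment step: attach each embedding to its nearest centre.
        clusters: List[List[Sequence[float]]] = [[] for _ in range(k)]
        for p in embeddings:
            nearest = min(range(k), key=lambda j: _sq_dist(p, centres[j]))
            clusters[nearest].append(p)
        # Update step: move each centre to its cluster mean (keep it if the cluster is empty).
        for j, members in enumerate(clusters):
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return centres
```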
Date: 2024-10-28T03:54:10Z
- Fragment-Masked Diffusion for Molecular Optimization
We propose a fragment-masked molecular optimization method based on phenotypic drug discovery (PDD).
PDD-based molecular optimization can reduce potential safety risks while optimizing phenotypic activity, thereby increasing the likelihood of clinical success.
The overall experiments demonstrate that the in-silico optimization success rate reaches 95.4%, with an average efficacy increase of 7.5%.
Date: 2024-08-17T06:00:58Z
- Regressor-free Molecule Generation to Support Drug Response Prediction
Conditional generation based on the target IC50 score can obtain a more effective sampling space.
Regressor-free guidance combines a diffusion model's score estimation with a regression controller model's gradient based on numerical labels.
Date: 2024-05-23T13:22:17Z
- Contrastive Dual-Interaction Graph Neural Network for Molecular Property Prediction
We introduce DIG-Mol, a novel self-supervised graph neural network framework for molecular property prediction.
DIG-Mol integrates a momentum distillation network with two interconnected networks to efficiently improve molecular characterization.
We have established DIG-Mol's state-of-the-art performance through extensive experimental evaluation in a variety of molecular property prediction tasks.
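Momentum distillation is commonly implemented with a teacher network whose parameters track an exponential moving average (EMA) of the student's parameters. The sketch below assumes that common formulation, with parameters flattened into lists of floats. `momentum_update` is a hypothetical helper, and the abstract does not give DIG-Mol's exact update rule or momentum value.

```python
from typing import List, Sequence


def momentum_update(teacher: Sequence[float], student: Sequence[float],
                    momentum: float = 0.99) -> List[float]:
    """EMA update applied per parameter: teacher <- m * teacher + (1 - m) * student."""
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must lie in [0, 1]")
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]
```

A momentum close to 1, such as the default 0.99 here, makes the teacher change slowly. This gives the student more stable targets than its own rapidly changing outputs would.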
Date: 2024-05-04T10:09:27Z
- Removing Biases from Molecular Representations via Information Maximization
InfoCORE is an Information maximization approach for COnfounder REmoval that deals with batch effects.
It adaptively reweights samples to equalize their implied batch distribution.
It offers a versatile framework and resolves general distribution shifts and issues of data fairness.
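InfoCORE derives its sample weights from an information-maximization objective, which is not reproduced here. A simpler stand-in illustrates only the goal of equalizing the batch distribution: inverse-frequency weighting, which gives every experimental batch the same total weight. `equalizing_weights` is a hypothetical helper, and the batch identifier is assumed to be known for every sample.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def equalizing_weights(batch_ids: Sequence[Hashable]) -> List[float]:
    """Give each sample the weight 1 / (n_batches * count(its batch)).

    Every batch then carries the same total weight (1 / n_batches), and the
    weights sum to 1, so over-represented batches no longer dominate training.
    """
    counts = Counter(batch_ids)
    n_batches = len(counts)
    return [1.0 / (n_batches * counts[b]) for b in batch_ids]
```

For batch labels `["a", "a", "a", "b"]`, each `a` sample gets a weight of 1/6 and the `b` sample gets 1/2, so both batches contribute 0.5 in total.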
Date: 2023-12-01T16:53:15Z
- MultiModal-Learning for Predicting Molecular Properties: A Framework Based on Image and Graph Structures
MolIG is a novel MultiModaL molecular pre-training framework for predicting molecular properties based on Image and Graph structures.
It amalgamates the strengths of both molecular representation forms.
It exhibits enhanced performance in downstream tasks pertaining to molecular property prediction within benchmark groups.
Date: 2023-11-28T10:28:35Z
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
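Of the few-shot learners mentioned, the Prototypical Network is the simplest to show. Each class is represented by the mean embedding of its labelled support examples, and a query is assigned to the nearest mean. The sketch assumes embeddings are already available as lists of floats (for instance, from a pretrained embedding space) and uses hypothetical class labels. It illustrates the classifier, not the paper's embedding method.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple


def prototypes(support: Sequence[Tuple[Sequence[float], Hashable]]) -> Dict[Hashable, List[float]]:
    """Mean embedding (prototype) per class, computed from the labelled support set."""
    grouped: Dict[Hashable, List[Sequence[float]]] = {}
    for emb, label in support:
        grouped.setdefault(label, []).append(emb)
    return {label: [sum(col) / len(embs) for col in zip(*embs)]
            for label, embs in grouped.items()}


def classify(query: Sequence[float], protos: Dict[Hashable, List[float]]) -> Hashable:
    """Assign the query to the class whose prototype is nearest in Euclidean distance."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))
```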
Date: 2023-02-04T01:32:40Z
- MolCPT: Molecule Continuous Prompt Tuning to Generalize Molecular Representation Learning
We propose a novel paradigm of "pre-train, prompt, fine-tune" for molecular representation learning, named molecule continuous prompt tuning (MolCPT).
MolCPT defines a motif prompting function that uses the pre-trained model to project the standalone input into an expressive prompt.
Experiments on several benchmark datasets show that MolCPT efficiently generalizes pre-trained GNNs for molecular property prediction.
Date: 2022-12-20T19:32:30Z
- Retrieval-based Controllable Molecule Generation
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
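A retrieval step of this kind needs a similarity measure over molecules. This sketch assumes each molecule is represented by the set of its "on" fingerprint bit indices and ranks exemplars by Tanimoto similarity. The metric, the representation, and the names `tanimoto` and `retrieve` are all assumptions, since the abstract does not say how exemplars are retrieved.

```python
from typing import FrozenSet, List, Sequence, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """|A ∩ B| / |A ∪ B| for two sets of on-bit indices (1.0 for two empty sets)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def retrieve(query: FrozenSet[int], library: Sequence[Tuple[str, FrozenSet[int]]],
             k: int = 2) -> List[Tuple[str, float]]:
    """Return the k library entries most similar to the query, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```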
Date: 2022-08-23T17:01:16Z
- Exploring Chemical Space with Score-based Out-of-distribution Generation
We propose MOOD, a score-based diffusion scheme that incorporates out-of-distribution control in the generative stochastic differential equation (SDE).
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
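Steering generation with gradients from a property predictor is commonly written as a guided score. The equation below uses generic notation and is an assumption, not MOOD's exact formulation, which also includes its out-of-distribution control. The symbols are:

- s_θ: the learned score.
- p_φ(y | x_t): the property predictor.
- w ≥ 0: a guidance weight.

```latex
\tilde{s}(\mathbf{x}_t, t) = s_\theta(\mathbf{x}_t, t) + w \, \nabla_{\mathbf{x}_t} \log p_\phi(y \mid \mathbf{x}_t),
\qquad s_\theta(\mathbf{x}_t, t) \approx \nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t)
```

By Bayes' rule, ∇ log p_t(x_t | y) = ∇ log p_t(x_t) + ∇ log p(y | x_t). Setting w = 1 therefore corresponds to sampling from the property-conditioned distribution, and larger values of w push samples more strongly toward the target property y.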
Date: 2022-06-06T06:17:11Z
- Associative Learning Mechanism for Drug-Target Interaction Prediction
Drug-target affinity (DTA) represents the strength of drug-target interaction (DTI).
Traditional methods lack the interpretability of the DTA prediction process.
This paper proposes a DTA prediction method with interactive learning and an autoencoder mechanism.
Date: 2022-05-24T14:25:28Z
- Improved Drug-target Interaction Prediction with Intermolecular Graph Transformer
We propose a novel approach to model intermolecular information with a three-way Transformer-based architecture.
Intermolecular Graph Transformer (IGT) outperforms the second-best approach by 9.1% and 20.5% for binding activity and binding pose prediction, respectively.
IGT exhibits promising drug screening ability against SARS-CoV-2, identifying 83.1% of the active drugs that have been validated by wet-lab experiments, with near-native predicted binding poses.
Date: 2021-10-14T13:28:02Z
- Combating small molecule aggregation with machine learning
We present a bespoke machine-learning tool to confidently and intelligibly flag small colloidally aggregating molecules (SCAMs).
Our data demonstrate an unprecedented utility of machine learning for predicting SCAMs, achieving 80% of correct predictions in a challenging out-of-sample validation.
Date: 2021-05-01T14:41:01Z
- MEG: Generating Molecular Counterfactual Explanations for Deep Graph Networks
We present a novel approach to tackle explainability of deep graph networks in the context of molecule property prediction tasks.
We generate informative counterfactual explanations for a specific prediction in the form of (valid) compounds with high structural similarity and different predicted properties.
We discuss the results, showing how the method can give non-ML experts key insights into what the learning model focuses on in the neighbourhood of a molecule.
Date: 2021-04-16T12:17:19Z
- Optimizing Molecules using Efficient Queries from Property Evaluations
We propose QMO, a generic query-based molecule optimization framework.
QMO improves the desired properties of an input molecule based on efficient queries.
We show that QMO outperforms existing methods in the benchmark tasks of optimizing small organic molecules.
Date: 2020-11-03T18:51:18Z
- Self-Supervised Graph Transformer on Large-Scale Molecular Data
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
Date: 2020-06-18T08:37:04Z