Mitigating Molecular Aggregation in Drug Discovery with Predictive
Insights from Explainable AI
- URL: http://arxiv.org/abs/2306.02206v1
- Date: Sat, 3 Jun 2023 22:30:45 GMT
- Title: Mitigating Molecular Aggregation in Drug Discovery with Predictive
Insights from Explainable AI
- Authors: Hunter Sturm, Jonas Teufel, Kaitlin A. Isfeld, Pascal Friederich,
Rebecca L. Davis
- Abstract summary: A lack of understanding of the causes of molecular aggregation introduces difficulty in the development of predictive tools for detecting aggregating molecules.
We present an examination of the molecular features differentiating datasets of aggregating and non-aggregating molecules, as well as a machine learning approach to predicting molecular aggregation.
Our method uses explainable graph neural networks and counterfactuals to reliably predict and explain aggregation, giving additional insights and design rules for future screening.
- Score: 1.0499611180329804
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the importance of high-throughput screening (HTS) continues to grow due to
its value in early stage drug discovery and data generation for training
machine learning models, there is a growing need for robust methods for
pre-screening compounds to identify and prevent false-positive hits. Small,
colloidally aggregating molecules are one of the primary sources of
false-positive hits in high-throughput screens, making them an ideal candidate
to target for removal from libraries using predictive pre-screening tools.
However, a lack of understanding of the causes of molecular aggregation
introduces difficulty in the development of predictive tools for detecting
aggregating molecules. Herein, we present an examination of the molecular
features differentiating datasets of aggregating and non-aggregating molecules,
as well as a machine learning approach to predicting molecular aggregation. Our
method uses explainable graph neural networks and counterfactuals to reliably
predict and explain aggregation, giving additional insights and design rules
for future screening. The integration of this method in HTS approaches will
help combat false positives, providing better lead molecules more rapidly and
thus accelerating drug discovery cycles.
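The abstract describes the approach only at a high level (an explainable graph neural network classifier over molecular graphs, with counterfactuals for explanation) and does not include code. As a rough, non-authoritative sketch of what such a pipeline can look like, the snippet below builds a molecular graph from a SMILES string, scores it with a small GNN classifier, and attributes the prediction to individual atoms via gradient saliency. It assumes RDKit and PyTorch Geometric; every name, feature choice, and hyperparameter (smiles_to_graph, AggregationGNN, atom_importances, atomic-number features, two GCN layers) is illustrative rather than the authors' implementation, and simple saliency stands in for the richer explanation and counterfactual machinery the paper uses.

```python
# Minimal sketch (not the authors' code): a small GNN that classifies a molecule
# as aggregating / non-aggregating and attributes the prediction to atoms via
# input gradients. Assumes RDKit and PyTorch Geometric are installed; the
# featurization and architecture are illustrative only.
import torch
from rdkit import Chem
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv, global_mean_pool


def smiles_to_graph(smiles: str) -> Data:
    """Turn a SMILES string into a simple atomic-number-featurized graph."""
    mol = Chem.MolFromSmiles(smiles)
    x = torch.tensor([[a.GetAtomicNum()] for a in mol.GetAtoms()], dtype=torch.float)
    edges = []
    for b in mol.GetBonds():
        i, j = b.GetBeginAtomIdx(), b.GetEndAtomIdx()
        edges += [(i, j), (j, i)]  # undirected graph -> store both directions
    edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous()
    return Data(x=x, edge_index=edge_index)


class AggregationGNN(torch.nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.conv1 = GCNConv(1, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = torch.nn.Linear(hidden, 1)  # logit for P(aggregator)

    def forward(self, x, edge_index, batch):
        h = torch.relu(self.conv1(x, edge_index))
        h = torch.relu(self.conv2(h, edge_index))
        return self.head(global_mean_pool(h, batch)).squeeze(-1)


def atom_importances(model: AggregationGNN, data: Data) -> torch.Tensor:
    """Gradient-based saliency per atom -- a simple stand-in for the paper's
    explanation step."""
    x = data.x.clone().requires_grad_(True)
    batch = torch.zeros(x.size(0), dtype=torch.long)  # single graph
    logit = model(x, data.edge_index, batch)
    logit.backward()
    return x.grad.abs().sum(dim=1)  # one importance score per atom


if __name__ == "__main__":
    data = smiles_to_graph("c1ccccc1O")  # phenol, as a toy example
    model = AggregationGNN()             # untrained; training loop omitted
    print(atom_importances(model, data))
```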
Related papers
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model that randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z) - Contrastive Dual-Interaction Graph Neural Network for Molecular Property Prediction [0.0]
We introduce DIG-Mol, a novel self-supervised graph neural network framework for molecular property prediction.
DIG-Mol integrates a momentum distillation network with two interconnected networks to efficiently improve molecular characterization.
We have established DIG-Mol's state-of-the-art performance through extensive experimental evaluation in a variety of molecular property prediction tasks.
arXiv Detail & Related papers (2024-05-04T10:09:27Z) - MultiModal-Learning for Predicting Molecular Properties: A Framework Based on Image and Graph Structures [2.5563339057415218]
MolIG is a novel MultiModaL molecular pre-training framework for predicting molecular properties based on Image and Graph structures.
It amalgamates the strengths of both molecular representation forms.
It shows improved performance on downstream molecular property prediction tasks within benchmark groups.
arXiv Detail & Related papers (2023-11-28T10:28:35Z) - Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular
Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z) - MolCPT: Molecule Continuous Prompt Tuning to Generalize Molecular
Representation Learning [77.31492888819935]
We propose a novel paradigm of "pre-train, prompt, fine-tune" for molecular representation learning, named molecule continuous prompt tuning (MolCPT).
MolCPT defines a motif prompting function that uses the pre-trained model to project the standalone input into an expressive prompt.
Experiments on several benchmark datasets show that MolCPT efficiently generalizes pre-trained GNNs for molecular property prediction.
arXiv Detail & Related papers (2022-12-20T19:32:30Z) - Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z) - Associative Learning Mechanism for Drug-Target Interaction Prediction [6.107658437700639]
Drug-target affinity (DTA) represents the strength of drug-target interaction (DTI).
Traditional methods lack the interpretability of the DTA prediction process.
This paper proposes a DTA prediction method with interactive learning and an autoencoder mechanism.
arXiv Detail & Related papers (2022-05-24T14:25:28Z) - Improved Drug-target Interaction Prediction with Intermolecular Graph
Transformer [98.8319016075089]
We propose a novel approach to model intermolecular information with a three-way Transformer-based architecture.
Intermolecular Graph Transformer (IGT) outperforms state-of-the-art approaches, beating the second best by 9.1% and 20.5% for binding activity and binding pose prediction, respectively.
IGT exhibits promising drug screening ability against SARS-CoV-2 by identifying 83.1% of the active drugs that have been validated by wet-lab experiments, with near-native predicted binding poses.
arXiv Detail & Related papers (2021-10-14T13:28:02Z) - Combating small molecule aggregation with machine learning [0.0]
We present a bespoke machine-learning tool to confidently and intelligibly flag small colloidally aggregating molecules (SCAMs).
Our data demonstrate an unprecedented utility of machine learning for predicting SCAMs, achieving 80% correct predictions in a challenging out-of-sample validation.
arXiv Detail & Related papers (2021-05-01T14:41:01Z) - MEG: Generating Molecular Counterfactual Explanations for Deep Graph
Networks [11.291571222801027]
We present a novel approach to tackle the explainability of deep graph networks in the context of molecular property prediction tasks.
We generate informative counterfactual explanations for a specific prediction in the form of (valid) compounds with high structural similarity and different predicted properties.
We discuss results showing how the model can provide non-ML experts with key insights into the learning model's focus in the neighbourhood of a molecule (a minimal counterfactual search in this spirit is sketched after this list).
arXiv Detail & Related papers (2021-04-16T12:17:19Z) - Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)
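Several of the papers above, like the present work, explain predictions through counterfactual molecules: structurally similar, chemically valid compounds whose predicted property differs from the query. As a minimal, assumption-laden sketch of that idea (not MEG's or this paper's actual algorithm), the snippet below brute-forces single-atom substitutions with RDKit, keeps only edits that sanitize to valid molecules and flip the predicted label, and ranks them by Tanimoto similarity to the query. Here `predict_aggregation` is a hypothetical callable returning an aggregation probability for a SMILES string, for example a wrapper around the GNN sketched earlier.

```python
# Minimal sketch (not the authors' method): brute-force counterfactual search
# over single-atom substitutions. 'predict_aggregation' is any callable that
# returns P(aggregator) for a SMILES string.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem


def counterfactuals(smiles, predict_aggregation, threshold=0.5):
    """Return valid single-atom edits whose predicted label flips,
    ranked by structural similarity to the query molecule."""
    mol = Chem.MolFromSmiles(smiles)
    base_fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
    base_label = predict_aggregation(smiles) >= threshold
    hits = []
    for idx in range(mol.GetNumAtoms()):
        for new_z in (6, 7, 8):  # try C, N, O at each position (illustrative)
            rw = Chem.RWMol(mol)
            rw.GetAtomWithIdx(idx).SetAtomicNum(new_z)
            try:
                Chem.SanitizeMol(rw)  # reject chemically invalid edits
            except Exception:
                continue
            cand = Chem.MolToSmiles(rw)
            if (predict_aggregation(cand) >= threshold) == base_label:
                continue  # label did not flip; not a counterfactual
            sim = DataStructs.TanimotoSimilarity(
                base_fp, AllChem.GetMorganFingerprintAsBitVect(rw, 2, nBits=2048))
            hits.append((sim, cand))
    return sorted(hits, reverse=True)  # most similar counterfactuals first
```

Real counterfactual generators search far larger edit spaces (functional-group swaps, generative models) and add validity and drug-likeness constraints, but the core idea of ranking label-flipping neighbours by similarity is the same.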
This list is automatically generated from the titles and abstracts of the papers on this site.