Combating small molecule aggregation with machine learning
- URL: http://arxiv.org/abs/2105.00267v1
- Date: Sat, 1 May 2021 14:41:01 GMT
- Title: Combating small molecule aggregation with machine learning
- Authors: Kuan Lee, Ann Yang, Yen-Chu Lin, Daniel Reker, Goncalo J. L. Bernardes, and Tiago Rodrigues
- Abstract summary: We present a bespoke machine-learning tool to confidently and intelligibly flag small colloidally aggregating molecules (SCAMs).
Our data demonstrate an unprecedented utility of machine learning for predicting SCAMs, with 80% correct predictions in a challenging out-of-sample validation.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Biological screens are plagued by false positive hits resulting from
aggregation. Thus, methods to triage small colloidally aggregating molecules
(SCAMs) are in high demand. Herein, we disclose a bespoke machine-learning tool
to confidently and intelligibly flag such entities. Our data demonstrate an
unprecedented utility of machine learning for predicting SCAMs, achieving 80%
correct predictions in a challenging out-of-sample validation. The tool
outperformed a panel of expert chemists, who correctly predicted 61 +/- 7% of
the same test molecules in a Turing-like test. Further, the computational
routine provided insight into molecular features governing aggregation that had
remained hidden to expert intuition. Leveraging our tool, we estimate that up
to 15-20% of ligands in publicly available chemogenomic databases have a high
potential to aggregate at typical screening concentrations, which urges caution in
systems biology and drug design programs. Our approach provides a means to
augment human intuition and mitigate attrition, and a pathway to accelerate future
molecular medicine.
Related papers
- MoleculeCLA: Rethinking Molecular Benchmark via Computational Ligand-Target Binding Analysis [18.940529282539842]
We construct a large-scale and precise molecular representation dataset of approximately 140,000 small molecules.
Our dataset offers significant physicochemical interpretability to guide model development and design.
We believe this dataset will serve as a more accurate and reliable benchmark for molecular representation learning.
arXiv Detail & Related papers (2024-06-13T02:50:23Z)
- TwinBooster: Synergising Large Language Models with Barlow Twins and Gradient Boosting for Enhanced Molecular Property Prediction [0.0]
In this study, we use a fine-tuned large language model to integrate biological assays based on their textual information.
This architecture uses both assay information and molecular fingerprints to extract the true molecular information.
TwinBooster enables the prediction of properties of unseen bioassays and molecules by providing state-of-the-art zero-shot learning tasks.
arXiv Detail & Related papers (2024-01-09T10:36:20Z)
- Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning [60.02391969049972]
We introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems.
DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system.
arXiv Detail & Related papers (2023-06-08T17:12:08Z)
- Mitigating Molecular Aggregation in Drug Discovery with Predictive Insights from Explainable AI [1.0499611180329804]
A lack of understanding of the causes of molecular aggregation introduces difficulty in the development of predictive tools for detecting aggregating molecules.
We present an examination of the molecular features differentiating datasets of aggregating and non-aggregating molecules, as well as a machine learning approach to predicting molecular aggregation.
Our method uses explainable graph neural networks and counterfactuals to reliably predict and explain aggregation, giving additional insights and design rules for future screening.
arXiv Detail & Related papers (2023-06-03T22:30:45Z)
- InstructBio: A Large-scale Semi-supervised Learning Paradigm for Biochemical Problems [38.57333125315448]
InstructMol is a semi-supervised learning algorithm to take better advantage of unlabeled examples.
InstructBio substantially improves the generalization ability of molecular models.
arXiv Detail & Related papers (2023-04-08T04:19:22Z)
- HD-Bind: Encoding of Molecular Structure with Low Precision, Hyperdimensional Binary Representations [3.3934198248179026]
Hyperdimensional Computing (HDC) is a proposed learning paradigm that is able to leverage low-precision binary vector arithmetic.
We show that HDC-based inference methods are as much as 90 times more efficient than more complex representative machine learning methods.
arXiv Detail & Related papers (2023-03-27T21:21:46Z)
- ChemVise: Maximizing Out-of-Distribution Chemical Detection with the Novel Application of Zero-Shot Learning [60.02503434201552]
This research proposes learning approximations of complex exposures from training sets of simple ones.
We demonstrate that applying this approach to synthetic sensor responses surprisingly improves the detection of out-of-distribution obscured chemical analytes.
arXiv Detail & Related papers (2023-02-09T20:19:57Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
- Advanced Graph and Sequence Neural Networks for Molecular Property Prediction and Drug Discovery [53.00288162642151]
We develop MoleculeKit, a suite of comprehensive machine learning tools spanning different computational models and molecular representations.
Built on these representations, MoleculeKit includes both deep learning and traditional machine learning methods for graph and sequence data.
Results on both online and offline antibiotics discovery and molecular property prediction tasks show that MoleculeKit achieves consistent improvements over prior methods.
arXiv Detail & Related papers (2020-12-02T02:09:31Z)
- ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction [61.33144688400446]
We propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) by incorporating both labeled and unlabeled molecules.
In the teacher model, we propose a novel semi-supervised learning method to learn general representation that jointly exploits information from molecular structure and molecular distribution.
Finally, we propose a novel active learning strategy based on molecular diversity to select informative data throughout the framework's training.
arXiv Detail & Related papers (2020-07-07T04:22:39Z)
- Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)