Machine Learning Small Molecule Properties in Drug Discovery
- URL: http://arxiv.org/abs/2308.12354v1
- Date: Wed, 2 Aug 2023 22:18:41 GMT
- Title: Machine Learning Small Molecule Properties in Drug Discovery
- Authors: Nikolai Schapin, Maciej Majewski, Alejandro Varela, Carlos Arroniz, Gianni De Fabritiis
- Abstract summary: We review a wide range of properties, including binding affinities, solubility, and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity).
We discuss existing popular descriptors and embeddings, such as chemical fingerprints and graph-based neural networks.
Finally, we assess techniques for understanding model predictions, especially for critical decision-making in drug discovery.
- Score: 44.62264781248437
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning (ML) is a promising approach for predicting small molecule
properties in drug discovery. Here, we provide a comprehensive overview of
various ML methods introduced for this purpose in recent years. We review a
wide range of properties, including binding affinities, solubility, and ADMET
(Absorption, Distribution, Metabolism, Excretion, and Toxicity). We discuss
existing popular datasets and molecular descriptors and embeddings, such as
chemical fingerprints and graph-based neural networks. We also highlight the
challenges of predicting and optimizing multiple properties during the hit-to-lead
and lead optimization stages of drug discovery, and briefly explore
multi-objective optimization techniques that can be used to balance diverse
properties while optimizing lead candidates. Finally, we assess techniques for
understanding model predictions, especially for critical decision-making in
drug discovery. Overall, this review provides insights into the
landscape of ML models for small molecule property predictions in drug
discovery. So far, there are multiple diverse approaches, but their
performance is often comparable. Neural networks, while more flexible, do not
always outperform simpler models. This shows that the availability of
high-quality training data remains crucial for training accurate models, and
that there is a need for standardized benchmarks, additional performance metrics,
and best practices to enable richer comparisons between the different
techniques and models.
Related papers
- Pretraining Graph Transformers with Atom-in-a-Molecule Quantum Properties for Improved ADMET Modeling [38.53065398127086]
We evaluate the impact of pretraining Graph Transformer architectures on atom-level quantum-mechanical features.
We find that models pretrained on atomic quantum-mechanical properties capture more low-frequency Laplacian eigenmodes.
arXiv Detail & Related papers (2024-10-10T15:20:30Z)
- Objective-Agnostic Enhancement of Molecule Properties via Multi-Stage VAE [1.3597551064547502]
The variational autoencoder (VAE) is a popular method for drug discovery, and various architectures and pipelines have been proposed to improve its performance.
VAE approaches are known to suffer from poor manifold recovery when the data lie on a low-dimensional manifold embedded in a higher-dimensional ambient space.
In this paper, we explore applying a multi-stage VAE approach, which can improve manifold recovery on a synthetic dataset, to the field of drug discovery.
arXiv Detail & Related papers (2023-08-24T20:22:22Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
- Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
- Accurate, reliable and interpretable solubility prediction of druglike molecules with attention pooling and Bayesian learning [1.8275108630751844]
In silico prediction of solubility has been studied for its utility in virtual screening and lead optimization.
Recently, machine learning (ML) methods using experimental data have become popular because physics-based methods are not suitable for high-throughput tasks.
In this paper, we develop graph neural networks (GNNs) with a self-attention readout layer to improve prediction performance.
arXiv Detail & Related papers (2022-09-29T07:48:10Z)
- Molecular Attributes Transfer from Non-Parallel Data [57.010952598634944]
We formulate molecular optimization as a style transfer problem and present a novel generative model that can automatically learn internal differences between two groups of non-parallel data.
Experiments on two molecular optimization tasks, toxicity modification and synthesizability improvement, demonstrate that our model significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2021-11-30T06:10:22Z)
- Few-Shot Graph Learning for Molecular Property Prediction [46.60746023179724]
We propose Meta-MGNN, a novel model for few-shot molecular property prediction.
To exploit unlabeled molecular information, Meta-MGNN further incorporates molecular structure- and attribute-based self-supervised modules and self-attentive task weights.
Extensive experiments on two public multi-property datasets demonstrate that Meta-MGNN outperforms a variety of state-of-the-art methods.
arXiv Detail & Related papers (2021-02-16T01:55:34Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
- Predicting drug properties with parameter-free machine learning: Pareto-Optimal Embedded Modeling (POEM) [0.13854111346209866]
We describe a similarity-based method for predicting molecular properties. POEM is a non-parametric, supervised ML algorithm developed to generate reliable predictive models without the need for optimization.
We benchmark POEM relative to industry-standard ML algorithms and published results across 17 classification tasks. POEM performs well in all cases and reduces the risk of overfitting.
arXiv Detail & Related papers (2020-02-11T17:20:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.