Predicting drug properties with parameter-free machine learning:
Pareto-Optimal Embedded Modeling (POEM)
- URL: http://arxiv.org/abs/2002.04555v2
- Date: Thu, 2 Apr 2020 19:13:45 GMT
- Authors: Andrew E. Brereton, Stephen MacKinnon, Zhaleh Safikhani, Shawn Reeves,
Sana Alwash, Vijay Shahani, Andreas Windemuth
- Abstract summary: We describe a similarity-based method for predicting molecular properties. POEM is a non-parametric, supervised ML algorithm developed to generate reliable predictive models without need for optimization.
We benchmark POEM relative to industry-standard ML algorithms and published results across 17 classification tasks. POEM performs well in all cases and reduces the risk of overfitting.
- Score: 0.13854111346209866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The prediction of absorption, distribution, metabolism, excretion, and
toxicity (ADMET) of small molecules from their molecular structure is a central
problem in medicinal chemistry with great practical importance in drug
discovery. Creating predictive models conventionally requires substantial
trial-and-error for the selection of molecular representations, machine
learning (ML) algorithms, and hyperparameter tuning. A generally applicable
method that performs well on all datasets without tuning would be of great
value but is currently lacking. Here, we describe Pareto-Optimal Embedded
Modeling (POEM), a similarity-based method for predicting molecular properties.
POEM is a non-parametric, supervised ML algorithm developed to generate
reliable predictive models without need for optimization. POEM's predictive
strength is obtained by combining multiple different representations of
molecular structures in a context-specific manner, while maintaining low
dimensionality. We benchmark POEM relative to industry-standard ML algorithms
and published results across 17 classification tasks. POEM performs well in
all cases and reduces the risk of overfitting.
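The abstract does not spell out how the representations are combined, so the following is only a minimal sketch of one way a similarity-based, tuning-free predictor could use Pareto optimality across several representations. It assumes that each training molecule has a precomputed distance to the query in every representation, that a training molecule counts as a neighbor when no other training molecule is at least as close in every representation and strictly closer in one, and that the prediction is the mean label over those neighbors. The function names `pareto_front` and `predict_score` and the toy numbers are hypothetical and are not taken from the paper.

```python
def pareto_front(distances):
    """Return indices of rows that are not dominated.

    `distances[i][k]` is the distance from the query to training item i
    in representation k (hypothetical, precomputed values).
    Row j dominates row i if it is <= in every representation and
    strictly < in at least one.
    """
    front = []
    for i, d_i in enumerate(distances):
        dominated = any(
            all(a <= b for a, b in zip(d_j, d_i))
            and any(a < b for a, b in zip(d_j, d_i))
            for j, d_j in enumerate(distances)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


def predict_score(distances, labels):
    """Average the labels of the Pareto-optimal neighbors (an assumed rule)."""
    front = pareto_front(distances)
    return sum(labels[i] for i in front) / len(front)


# Toy example: five training molecules, two representations.
distances = [
    [0.10, 0.90],  # closest in representation A
    [0.80, 0.20],  # closest in representation B
    [0.50, 0.50],  # a trade-off between A and B
    [0.90, 0.90],  # dominated by the first row
    [0.60, 0.95],  # dominated by the first row
]
labels = [1, 0, 1, 0, 0]

neighbors = pareto_front(distances)
score = predict_score(distances, labels)
print(neighbors, round(score, 3))
```

In this sketch the number of neighbors is determined by the data rather than by a tuned parameter such as k in k-nearest neighbors, which is one way to read the "parameter-free" claim; the method described in the paper may differ in how neighbors are selected and weighted.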
Related papers
- Balancing Molecular Information and Empirical Data in the Prediction of Physico-Chemical Properties [8.649679686652648]
We propose a general method for combining molecular descriptors with representation learning.
The proposed hybrid model exploits chemical structure information using graph neural networks.
It automatically detects cases where structure-based predictions are unreliable, in which case it corrects them by representation-learning based predictions.
arXiv Detail & Related papers (2024-06-12T10:51:00Z) - A Multi-Grained Symmetric Differential Equation Model for Learning
Protein-Ligand Binding Dynamics [74.93549765488103]
In drug discovery, molecular dynamics simulation provides a powerful tool for predicting binding affinities, estimating transport properties, and exploring pocket sites.
We propose NeuralMD, the first machine learning surrogate that can facilitate numerical MD and provide accurate simulations in protein-ligand binding.
We show the efficiency and effectiveness of NeuralMD, with a 2000$\times$ speedup over standard numerical MD simulation and outperforming all other ML approaches by up to 80% under the stability metric.
arXiv Detail & Related papers (2024-01-26T09:35:17Z) - SE(3)-Invariant Multiparameter Persistent Homology for Chiral-Sensitive
Molecular Property Prediction [1.534667887016089]
We present a novel method for generating molecular fingerprints using multiparameter persistent homology (MPPH).
This technique holds considerable significance for drug discovery and materials science, where precise molecular property prediction is vital.
We demonstrate its superior performance over existing state-of-the-art methods in predicting molecular properties through extensive evaluations on the MoleculeNet benchmark.
arXiv Detail & Related papers (2023-12-12T09:33:54Z) - Machine Learning Small Molecule Properties in Drug Discovery [44.62264781248437]
We review a wide range of properties, including binding affinities, solubility, and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity).
We discuss existing popular descriptors and embeddings, such as chemical fingerprints and graph-based neural networks.
Finally, techniques to provide an understanding of model predictions, especially for critical decision-making in drug discovery, are assessed.
arXiv Detail & Related papers (2023-08-02T22:18:41Z) - Multi-objective Molecular Optimization for Opioid Use Disorder Treatment
Using Generative Network Complex [5.33208055504216]
Opioid Use Disorder (OUD) has emerged as a significant global health issue.
In this study, we propose a deep generative model that combines stochastic differential equation (SDE)-based diffusion modeling with the latent space of a pretrained autoencoder model.
The molecular generator enables efficient generation of molecules that are effective on multiple targets.
arXiv Detail & Related papers (2023-06-13T01:12:31Z) - Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular
Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z) - Molecular Attributes Transfer from Non-Parallel Data [57.010952598634944]
We formulate molecular optimization as a style transfer problem and present a novel generative model that could automatically learn internal differences between two groups of non-parallel data.
Experiments on two molecular optimization tasks, toxicity modification and synthesizability improvement, demonstrate that our model significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2021-11-30T06:10:22Z) - GeoMol: Torsional Geometric Generation of Molecular 3D Conformer
Ensembles [60.12186997181117]
Prediction of a molecule's 3D conformer ensemble from the molecular graph holds a key role in areas of cheminformatics and drug discovery.
Existing generative models have several drawbacks including lack of modeling important molecular geometry elements.
We propose GeoMol, an end-to-end, non-autoregressive and SE(3)-invariant machine learning approach to generate 3D conformers.
arXiv Detail & Related papers (2021-06-08T14:17:59Z) - MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization [51.00815310242277]
Generative models and reinforcement learning approaches have had initial success, but still face difficulties in simultaneously optimizing multiple drug properties.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework to use input molecule as an initial guess and sample molecules from the target distribution.
arXiv Detail & Related papers (2020-10-05T20:18:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.