Predicting drug properties with parameter-free machine learning:
Pareto-Optimal Embedded Modeling (POEM)
- URL: http://arxiv.org/abs/2002.04555v2
- Date: Thu, 2 Apr 2020 19:13:45 GMT
- Title: Predicting drug properties with parameter-free machine learning:
Pareto-Optimal Embedded Modeling (POEM)
- Authors: Andrew E. Brereton, Stephen MacKinnon, Zhaleh Safikhani, Shawn Reeves,
Sana Alwash, Vijay Shahani, Andreas Windemuth
- Abstract summary: We describe a similarity-based method for predicting molecular properties. POEM is a non-parametric, supervised ML algorithm developed to generate reliable predictive models without the need for optimization.
We benchmark POEM relative to industry-standard ML algorithms and published results across 17 classification tasks. POEM performs well in all cases and reduces the risk of overfitting.
- Score: 0.13854111346209866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The prediction of absorption, distribution, metabolism, excretion, and
toxicity (ADMET) of small molecules from their molecular structure is a central
problem in medicinal chemistry with great practical importance in drug
discovery. Creating predictive models conventionally requires substantial
trial-and-error for the selection of molecular representations, machine
learning (ML) algorithms, and hyperparameter tuning. A generally applicable
method that performs well on all datasets without tuning would be of great
value but is currently lacking. Here, we describe Pareto-Optimal Embedded
Modeling (POEM), a similarity-based method for predicting molecular properties.
POEM is a non-parametric, supervised ML algorithm developed to generate
reliable predictive models without the need for optimization. POEM's predictive
strength is obtained by combining multiple different representations of
molecular structures in a context-specific manner, while maintaining low
dimensionality. We benchmark POEM relative to industry-standard ML algorithms
and published results across 17 classification tasks. POEM performs well in
all cases and reduces the risk of overfitting.
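The abstract describes POEM only at a high level: a similarity-based, non-parametric classifier that pools several molecular representations while keeping dimensionality low. As a rough orientation only, the Python sketch below shows one way such similarity-based pooling could look; the RDKit fingerprints, the similarity-weighted k-NN vote, and the plain averaging across representations are assumptions made for illustration, and they stand in for, rather than reproduce, POEM's Pareto-optimal, context-specific combination.

```python
# Minimal illustrative sketch (NOT the published POEM algorithm): a
# similarity-based classifier that pools predictions from several
# molecular representations. Requires RDKit.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem, MACCSkeys


def fingerprints(smiles):
    """Return two different bit-vector representations of one molecule."""
    mol = Chem.MolFromSmiles(smiles)
    return {
        "morgan": AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048),
        "maccs": MACCSkeys.GenMACCSKeys(mol),
    }


def predict(query_smiles, train_smiles, train_labels, k=5):
    """Average similarity-weighted k-NN votes over each representation.

    POEM combines representations in a Pareto-optimal, context-specific
    way; a plain average is used here purely for illustration.
    """
    query = fingerprints(query_smiles)
    train = [fingerprints(s) for s in train_smiles]
    votes = []
    for rep in query:  # one vote per representation
        sims = [DataStructs.TanimotoSimilarity(query[rep], t[rep]) for t in train]
        top = sorted(zip(sims, train_labels), reverse=True)[:k]
        weight = sum(s for s, _ in top) or 1.0
        votes.append(sum(s * y for s, y in top) / weight)
    return sum(votes) / len(votes)  # probability-like score in [0, 1]


if __name__ == "__main__":
    train = ["CCO", "CCN", "c1ccccc1", "CC(=O)O"]  # toy molecules
    labels = [1, 1, 0, 0]                          # toy binary property
    print(predict("CCOC", train, labels))
```

The key design point this toy version preserves is that each representation produces its own similarity-based vote, so no single fingerprint dominates the prediction.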
Related papers
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model that randomly masks SMILES subsequences corresponding to specific molecular atoms; a minimal illustration of random masking appears in the sketch after this list.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z) - Text-Guided Multi-Property Molecular Optimization with a Diffusion Language Model [77.50732023411811]
We propose a text-guided multi-property molecular optimization method utilizing a transformer-based diffusion language model (TransDLM).
TransDLM leverages standardized chemical nomenclature as semantic representations of molecules and implicitly embeds property requirements into textual descriptions.
Our approach surpasses state-of-the-art methods in optimizing molecular structural similarity and enhancing chemical properties on the benchmark dataset.
arXiv Detail & Related papers (2024-10-17T14:30:27Z) - Pretraining Graph Transformers with Atom-in-a-Molecule Quantum Properties for Improved ADMET Modeling [38.53065398127086]
We evaluate the impact of pretraining Graph Transformer architectures on atom-level quantum-mechanical features.
We find that models pretrained on atomic quantum-mechanical properties capture more low-frequency Laplacian eigenmodes.
arXiv Detail & Related papers (2024-10-10T15:20:30Z) - YZS-model: A Predictive Model for Organic Drug Solubility Based on Graph Convolutional Networks and Transformer-Attention [9.018408514318631]
Traditional methods often fail to capture complex molecular structures, leading to inaccurate solubility predictions.
We introduce the YZS-Model, a deep learning framework integrating Graph Convolutional Networks (GCN), Transformer architectures, and Long Short-Term Memory (LSTM) networks.
YZS-Model achieved an $R^2$ of 0.59 and an RMSE of 0.57, outperforming benchmark models.
arXiv Detail & Related papers (2024-06-27T12:40:29Z) - Balancing Molecular Information and Empirical Data in the Prediction of Physico-Chemical Properties [8.649679686652648]
We propose a general method for combining molecular descriptors with representation learning.
The proposed hybrid model exploits chemical structure information using graph neural networks.
It automatically detects cases where structure-based predictions are unreliable and corrects them with representation-learning-based predictions.
arXiv Detail & Related papers (2024-06-12T10:51:00Z) - Machine Learning Small Molecule Properties in Drug Discovery [44.62264781248437]
We review a wide range of properties, including binding affinities, solubility, and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity).
We discuss existing popular descriptors and embeddings, such as chemical fingerprints and graph-based neural networks.
Finally, techniques that provide an understanding of model predictions, especially for critical decision-making in drug discovery, are assessed.
arXiv Detail & Related papers (2023-08-02T22:18:41Z) - Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular
Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z) - Molecular Attributes Transfer from Non-Parallel Data [57.010952598634944]
We formulate molecular optimization as a style transfer problem and present a novel generative model that can automatically learn internal differences between two groups of non-parallel data.
Experiments on two molecular optimization tasks, toxicity modification and synthesizability improvement, demonstrate that our model significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2021-11-30T06:10:22Z) - MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization [51.00815310242277]
Generative models and reinforcement learning approaches have had initial success, but still face difficulties in simultaneously optimizing multiple drug properties.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution.
arXiv Detail & Related papers (2020-10-05T20:18:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.