CASTELO: Clustered Atom Subtypes aidEd Lead Optimization -- a combined
machine learning and molecular modeling method
- URL: http://arxiv.org/abs/2011.13788v1
- Date: Fri, 27 Nov 2020 15:41:00 GMT
- Authors: Leili Zhang, Giacomo Domeniconi, Chih-Chieh Yang, Seung-gu Kang,
Ruhong Zhou, Guojing Cong
- Abstract summary: We propose a combined machine learning and molecular modeling approach that automates the lead optimization workflow.
Our method provides new hints for drug modification hotspots which can be used to improve drug efficacy.
- Score: 2.8381402107366034
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Drug discovery is a multi-stage process that comprises two costly major
steps: pre-clinical research and clinical trials. Among its stages, lead
optimization easily consumes more than half of the pre-clinical budget. We
propose a combined machine learning and molecular modeling approach that
automates the lead optimization workflow in silico. The initial data
collection is achieved with physics-based molecular dynamics (MD) simulation.
Contact matrices are calculated as the preliminary features extracted from the
simulations. To take advantage of the temporal information from the
simulations, we enhance the contact-matrix data with a temporal dynamism
representation, which is then modeled with an unsupervised convolutional
variational autoencoder (CVAE). Finally, a conventional clustering method and a
CVAE-based clustering method are compared using metrics to rank the submolecular
structures and propose potential candidates for lead optimization. With no need
for an extensive structure-activity relationship database, our method provides new
hints for drug modification hotspots which can be used to improve drug
efficacy. Our workflow can potentially reduce the lead optimization turnaround
time from months/years to days compared with the conventional labor-intensive
process and thus can potentially become a valuable tool for medical
researchers.
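
The contact-matrix feature extraction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 4.5 Å distance cutoff, and the plain averaging over frames are assumptions made for illustration, and the coordinates are assumed to be NumPy arrays in ångströms, one array per MD frame.

```python
import numpy as np


def contact_matrix(ligand_xyz, protein_xyz, cutoff=4.5):
    """Boolean (n_ligand_atoms, n_protein_atoms) contact matrix for one frame.

    The cutoff (in angstroms) is an illustrative assumption, not a value
    taken from the paper.
    """
    # Pairwise displacement vectors via broadcasting:
    # (n_lig, 1, 3) - (1, n_prot, 3) -> (n_lig, n_prot, 3)
    diff = ligand_xyz[:, None, :] - protein_xyz[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return dist < cutoff


def contact_frequency(ligand_traj, protein_traj, cutoff=4.5):
    """Fraction of frames in which each ligand/protein atom pair is in contact."""
    frames = [
        contact_matrix(lig, prot, cutoff)
        for lig, prot in zip(ligand_traj, protein_traj)
    ]
    return np.mean(frames, axis=0)


# Toy two-frame "trajectory" with 2 ligand atoms and 2 protein atoms.
ligand_traj = np.array([
    [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]],
])
protein_traj = np.array([
    [[1.0, 0.0, 0.0], [20.0, 0.0, 0.0]],
    [[8.0, 0.0, 0.0], [20.0, 0.0, 0.0]],
])
freq = contact_frequency(ligand_traj, protein_traj)
```

Per-frame matrices of this kind keep the time axis intact, so they could in principle be stacked and passed to a downstream model such as the CVAE mentioned above; that step is not sketched here.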
Related papers
- Efficient Generation of Molecular Clusters with Dual-Scale Equivariant Flow Matching [5.909830898977327]
We develop a dual-scale flow matching method that separates training and inference into coarse-grained and all-atom stages.
We demonstrate the effectiveness of this method on a dataset of Y6 molecular clusters obtained through MD simulations.
arXiv Detail & Related papers (2024-10-10T02:17:27Z)
- Hierarchical Matrix Completion for the Prediction of Properties of Binary Mixtures [3.0478550046333965]
We introduce a novel generic approach for improving data-driven models.
We lump components that behave similarly into chemical classes and model them jointly.
Using clustering leads to significantly improved predictions compared to an MCM without clustering.
arXiv Detail & Related papers (2024-10-08T14:04:30Z)
- A Multi-Grained Symmetric Differential Equation Model for Learning Protein-Ligand Binding Dynamics [73.35846234413611]
In drug discovery, molecular dynamics (MD) simulation provides a powerful tool for predicting binding affinities, estimating transport properties, and exploring pocket sites.
We propose NeuralMD, the first machine learning (ML) surrogate that can facilitate numerical MD and provide accurate simulations in protein-ligand binding dynamics.
We demonstrate the efficiency and effectiveness of NeuralMD, achieving over 1K× speedup compared to standard numerical MD simulations.
arXiv Detail & Related papers (2024-01-26T09:35:17Z)
- Enhancing Multi-Objective Optimization through Machine Learning-Supported Multiphysics Simulation [1.6685829157403116]
This paper presents a methodological framework for training, self-optimising, and self-organising surrogate models.
We show that surrogate models can be trained on relatively small amounts of data to approximate the underlying simulations accurately.
arXiv Detail & Related papers (2023-09-22T20:52:50Z)
- HD-Bind: Encoding of Molecular Structure with Low Precision, Hyperdimensional Binary Representations [3.3934198248179026]
Hyperdimensional Computing (HDC) is a proposed learning paradigm that is able to leverage low-precision binary vector arithmetic.
We show that HDC-based inference methods are as much as 90 times more efficient than more complex representative machine learning methods.
arXiv Detail & Related papers (2023-03-27T21:21:46Z)
- Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
- Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can quickly simulate million-size populations in a few seconds on commodity hardware, integrate with deep neural networks, and ingest heterogeneous data sources.
arXiv Detail & Related papers (2022-07-20T07:32:02Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Molecular Attributes Transfer from Non-Parallel Data [57.010952598634944]
We formulate molecular optimization as a style transfer problem and present a novel generative model that could automatically learn internal differences between two groups of non-parallel data.
Experiments on two molecular optimization tasks, toxicity modification and synthesizability improvement, demonstrate that our model significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2021-11-30T06:10:22Z)
- Deep Bayesian Active Learning for Accelerating Stochastic Simulation [74.58219903138301]
Interactive Neural Process (INP) is a deep active learning framework for accelerating stochastic simulations.
For active learning, we propose a novel acquisition function, Latent Information Gain (LIG), calculated in the latent space of NP based models.
The results demonstrate STNP outperforms the baselines in the learning setting and LIG achieves the state-of-the-art for active learning.
arXiv Detail & Related papers (2021-06-05T01:31:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.