Molecular Fingerprints for Robust and Efficient ML-Driven Molecular
Generation
- URL: http://arxiv.org/abs/2211.09086v1
- Date: Wed, 16 Nov 2022 18:07:43 GMT
- Title: Molecular Fingerprints for Robust and Efficient ML-Driven Molecular
Generation
- Authors: Ruslan N. Tazhigulov, Joshua Schiller, Jacob Oppenheim, Max Winston
- Abstract summary: We propose a novel molecular fingerprint-based variational autoencoder applied for molecular generation on real-world drug molecules.
We observe a substantial improvement in chemical synthetic accessibility ($\Delta\bar{SAS}$ = -0.83) and in computational efficiency up to 5.9x in comparison to an existing state-of-the-art SMILES-based architecture.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a novel molecular fingerprint-based variational autoencoder
applied for molecular generation on real-world drug molecules. We define more
suitable and pharma-relevant baseline metrics and tests, focusing on the
generation of diverse, drug-like, novel small molecules and scaffolds. When we
apply these molecular generation metrics to our novel model, we observe a
substantial improvement in chemical synthetic accessibility
($\Delta\bar{SAS}$ = -0.83) and in computational efficiency up to 5.9x in
comparison to an existing state-of-the-art SMILES-based architecture.
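The two headline figures in the abstract can be read as simple summary statistics. The sketch below assumes that $\Delta\bar{SAS}$ is the mean synthetic accessibility score (SAS) of the new model's molecules minus that of the baseline, so a negative value means easier synthesis, and that the efficiency figure is a ratio of run times. Both conventions are assumptions, not taken from the paper, and every number and function name in the code is hypothetical.

```python
from statistics import mean


def delta_mean_sas(candidate_scores, baseline_scores):
    """Mean SAS of the candidate model minus mean SAS of the baseline.

    Assumed convention: lower SAS means easier to synthesize, so a
    negative result favors the candidate model.
    """
    return mean(candidate_scores) - mean(baseline_scores)


def speedup(baseline_seconds, candidate_seconds):
    """Ratio of baseline run time to candidate run time (assumed definition)."""
    return baseline_seconds / candidate_seconds


# Hypothetical SAS values for molecules sampled from each model.
fingerprint_model_sas = [2.5, 3.0, 3.5]
smiles_model_sas = [3.5, 4.0, 4.5]

delta = delta_mean_sas(fingerprint_model_sas, smiles_model_sas)
ratio = speedup(baseline_seconds=59.0, candidate_seconds=10.0)

print(f"Delta mean SAS: {delta:+.2f}")
print(f"Speedup: {ratio:.1f}x")
```

With real data, the two score lists would hold per-molecule SAS values computed for each model's generated set.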
Related papers
- MoleculeCLA: Rethinking Molecular Benchmark via Computational Ligand-Target Binding Analysis [18.940529282539842]
We construct a large-scale and precise molecular representation dataset of approximately 140,000 small molecules.
Our dataset offers significant physicochemical interpretability to guide model development and design.
We believe this dataset will serve as a more accurate and reliable benchmark for molecular representation learning.
arXiv Detail & Related papers (2024-06-13T02:50:23Z) - Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z) - Expanding Chemical Representation with k-mers and Fragment-based Fingerprints for Molecular Fingerprinting [4.588028371034407]
This study introduces a novel approach, combining substructure counting, $k$-mers, and Daylight-like fingerprints, to expand the representation of chemical structures in SMILES strings.
The integrated method generates comprehensive molecular embeddings that enhance discriminative power and information content.
arXiv Detail & Related papers (2024-03-28T21:36:07Z) - An Equivariant Generative Framework for Molecular Graph-Structure
Co-Design [54.92529253182004]
We present MolCode, a machine learning-based generative framework for Molecular graph-structure Co-design.
In MolCode, 3D geometric information empowers the molecular 2D graph generation, which in turn helps guide the prediction of molecular 3D structure.
Our investigation reveals that the 2D topology and 3D geometry contain intrinsically complementary information in molecule design.
arXiv Detail & Related papers (2023-04-12T13:34:22Z) - Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular
Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z) - Improving Molecular Pretraining with Complementary Featurizations [20.86159731100242]
Molecular pretraining is a paradigm to solve a variety of tasks in computational chemistry and drug discovery.
We show that different featurization techniques convey chemical information differently.
We propose a simple and effective MOlecular pretraining framework with COmplementary featurizations (MOCO).
arXiv Detail & Related papers (2022-09-29T21:11:09Z) - Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z) - MGCVAE: Multi-objective Inverse Design via Molecular Graph Conditional
Variational Autoencoder [0.0]
This study proposes a molecular graph generative model based on the autoencoder for de novo design.
Among the generated molecules, 25.89% were optimized molecules with MGCVAE, compared to 0.66% with MGVAE.
arXiv Detail & Related papers (2022-02-14T14:33:33Z) - Molecular Attributes Transfer from Non-Parallel Data [57.010952598634944]
We formulate molecular optimization as a style transfer problem and present a novel generative model that could automatically learn internal differences between two groups of non-parallel data.
Experiments on two molecular optimization tasks, toxicity modification and synthesizability improvement, demonstrate that our model significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2021-11-30T06:10:22Z) - Pharmacoprint -- a combination of pharmacophore fingerprint and
artificial intelligence as a tool for computer-aided drug design [6.053347262128918]
We propose a high-resolution pharmacophore fingerprint called Pharmacoprint.
It encodes the presence, types, and relationships between pharmacophore features of a molecule.
arXiv Detail & Related papers (2021-10-04T11:36:39Z) - Advanced Graph and Sequence Neural Networks for Molecular Property
Prediction and Drug Discovery [53.00288162642151]
We develop MoleculeKit, a suite of comprehensive machine learning tools spanning different computational models and molecular representations.
Built on these representations, MoleculeKit includes both deep learning and traditional machine learning methods for graph and sequence data.
Results on both online and offline antibiotics discovery and molecular property prediction tasks show that MoleculeKit achieves consistent improvements over prior methods.
arXiv Detail & Related papers (2020-12-02T02:09:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.