Fragment-based Sequential Translation for Molecular Optimization
- URL: http://arxiv.org/abs/2111.01009v1
- Date: Tue, 26 Oct 2021 21:20:54 GMT
- Title: Fragment-based Sequential Translation for Molecular Optimization
- Authors: Benson Chen, Xiang Fu, Regina Barzilay, Tommi Jaakkola
- Abstract summary: We propose a flexible editing paradigm that generates molecules using learned molecular fragments.
We use a variational autoencoder to encode molecular fragments in a coherent latent space.
We then use this latent space as a vocabulary for editing molecules to explore the complex chemical property space.
- Score: 23.152338167332374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Searching for novel molecular compounds with desired properties is an
important problem in drug discovery. Many existing frameworks generate
molecules one atom at a time. We instead propose a flexible editing paradigm
that generates molecules using learned molecular fragments--meaningful
substructures of molecules. To do so, we train a variational autoencoder (VAE)
to encode molecular fragments in a coherent latent space, which we then utilize
as a vocabulary for editing molecules to explore the complex chemical property
space. Equipped with the learned fragment vocabulary, we propose Fragment-based
Sequential Translation (FaST), which learns a reinforcement learning (RL)
policy to iteratively translate model-discovered molecules into increasingly
novel molecules while satisfying desired properties. Empirical evaluation shows
that FaST significantly improves over state-of-the-art methods on benchmark
single/multi-objective molecular optimization tasks.
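The sequential editing idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a small hand-written fragment table in place of the VAE-learned vocabulary, a made-up integer property score, and a greedy hill-climbing rule as a stand-in for the learned RL policy. The names `FRAGMENT_SCORES`, `toy_property`, `candidate_edits`, and `sequential_translate` are hypothetical, and the fragment strings are used only as labels (they are never parsed as chemistry).

```python
# Toy sketch of fragment-level sequential editing (NOT the FaST implementation).
# Assumptions: a fixed fragment table replaces the learned VAE vocabulary, and a
# greedy rule replaces the learned RL policy. All names here are hypothetical.

# Hypothetical fragment vocabulary with made-up integer scores.
FRAGMENT_SCORES = {"c1ccccc1": 10, "C(=O)O": 8, "N": 5, "CC": 2, "Cl": -3}


def toy_property(molecule):
    """Made-up property: score of each distinct fragment, minus 1 per fragment."""
    return sum(FRAGMENT_SCORES[f] for f in set(molecule)) - len(molecule)


def candidate_edits(molecule):
    """All molecules reachable by one edit: append a fragment or delete one."""
    added = [molecule + (f,) for f in FRAGMENT_SCORES]
    removed = [molecule[:i] + molecule[i + 1:] for i in range(len(molecule))]
    return added + removed


def sequential_translate(start, max_steps=10):
    """Apply the best single edit repeatedly until no edit improves the score."""
    molecule = tuple(start)
    trajectory = [molecule]
    for _ in range(max_steps):
        best = max(candidate_edits(molecule), key=toy_property)
        if toy_property(best) <= toy_property(molecule):
            break  # no improving edit left
        molecule = best
        trajectory.append(molecule)
    return trajectory


# Usage: start from a single low-scoring fragment and print each edited molecule.
for mol in sequential_translate(("Cl",)):
    print(toy_property(mol), mol)
```

Starting from `("Cl",)`, the loop both adds fragments and later deletes the low-scoring one, which is the kind of add/delete editing the abstract contrasts with atom-by-atom generation. A learned policy would replace the exhaustive `max` over candidates, and a real property oracle would replace `toy_property`.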
Related papers
- Text-Guided Multi-Property Molecular Optimization with a Diffusion Language Model [77.50732023411811]
We propose a text-guided multi-property molecular optimization method utilizing a transformer-based diffusion language model (TransDLM).
TransDLM leverages standardized chemical nomenclature as semantic representations of molecules and implicitly embeds property requirements into textual descriptions.
Our approach surpasses state-of-the-art methods in optimizing molecular structural similarity and enhancing chemical properties on the benchmark dataset.
arXiv Detail & Related papers (2024-10-17T14:30:27Z)
- Navigating Chemical Space with Latent Flows [20.95884505685799]
We propose a new framework, ChemFlow, which traverses chemical space by using flows to navigate the latent space learned by molecule generative models.
We validate the efficacy of ChemFlow on molecule manipulation and single- and multi-objective optimization tasks under both supervised and unsupervised molecular discovery settings.
arXiv Detail & Related papers (2024-05-07T03:55:57Z)
- Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective [53.300288393173204]
Large Language Models (LLMs) have shown remarkable performance in various cross-modal tasks.
In this work, we propose an In-context Few-Shot Molecule Learning paradigm for molecule-caption translation.
We evaluate the effectiveness of MolReGPT on molecule-caption translation, including molecule understanding and text-based molecule generation.
arXiv Detail & Related papers (2023-06-11T08:16:25Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
- De Novo Molecular Generation via Connection-aware Motif Mining [197.97528902698966]
We propose a new method, MiCaM, to generate molecules based on mined connection-aware motifs.
The obtained motif vocabulary consists of not only molecular motifs (i.e., the frequent fragments), but also their connection information.
Based on the mined connection-aware motifs, MiCaM builds a connection-aware generator, which simultaneously picks up motifs and determines how they are connected.
arXiv Detail & Related papers (2023-02-02T14:40:47Z)
- Domain-Agnostic Molecular Generation with Chemical Feedback [44.063584808910896]
MolGen is a pre-trained molecular language model tailored specifically for molecule generation.
It internalizes structural and grammatical insights through the reconstruction of over 100 million molecular SELFIES.
Our chemical feedback paradigm steers the model away from molecular hallucinations, ensuring alignment between the model's estimated probabilities and real-world chemical preferences.
arXiv Detail & Related papers (2023-01-26T17:52:56Z)
- Scalable Fragment-Based 3D Molecular Design with Reinforcement Learning [68.8204255655161]
We introduce a novel framework for scalable 3D design that uses a hierarchical agent to build molecules.
In a variety of experiments, we show that our agent, guided only by energy considerations, can efficiently learn to produce molecules with over 100 atoms.
arXiv Detail & Related papers (2022-02-01T18:54:24Z)
- Reinforced Molecular Optimization with Neighborhood-Controlled Grammars [63.84003497770347]
We propose MNCE-RL, a graph convolutional policy network for molecular optimization.
We extend the original neighborhood-controlled embedding grammars to make them applicable to molecular graph generation.
We show that our approach achieves state-of-the-art performance in a diverse range of molecular optimization tasks.
arXiv Detail & Related papers (2020-11-14T05:42:15Z)
- Goal directed molecule generation using Monte Carlo Tree Search [15.462930062711237]
We propose a novel method, which we call unitMCTS, to perform molecule generation by making a unit change to the molecule at every step using Monte Carlo Tree Search.
We show that this method outperforms the recently published techniques on benchmark molecular optimization tasks such as QED and penalized logP.
arXiv Detail & Related papers (2020-10-30T17:49:59Z)
- A Deep Generative Model for Fragment-Based Molecule Generation [21.258861822241272]
We develop a language model for small molecular substructures called fragments.
In other words, we generate molecules fragment by fragment, instead of atom by atom.
We show experimentally that our model largely outperforms other language model-based competitors.
arXiv Detail & Related papers (2020-02-28T15:55:11Z)
- Multi-Objective Molecule Generation using Interpretable Substructures [38.637412590671865]
Drug discovery aims to find novel compounds with specified chemical property profiles.
The goal is to learn to sample molecules in the intersection of multiple property constraints.
We propose to offset this complexity by composing molecules from a vocabulary of substructures that we call molecular rationales.
arXiv Detail & Related papers (2020-02-08T22:55:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.