Related papers: Fragment-based Sequential Translation for Molecular Optimization

URL: http://arxiv.org/abs/2111.01009v1
Date: Tue, 26 Oct 2021 21:20:54 GMT
Title: Fragment-based Sequential Translation for Molecular Optimization
Authors: Benson Chen, Xiang Fu, Regina Barzilay, Tommi Jaakkola
Abstract summary: We propose a flexible editing paradigm that generates molecules using learned molecular fragments. We use a variational autoencoder to encode molecular fragments in a coherent latent space. We then use this latent space as a vocabulary for editing molecules to explore the complex chemical property space.
Score: 23.152338167332374
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Searching for novel molecular compounds with desired properties is an important problem in drug discovery. Many existing frameworks generate molecules one atom at a time. We instead propose a flexible editing paradigm that generates molecules using learned molecular fragments, that is, meaningful substructures of molecules. To do so, we train a variational autoencoder (VAE) to encode molecular fragments in a coherent latent space, which we then utilize as a vocabulary for editing molecules to explore the complex chemical property space. Equipped with the learned fragment vocabulary, we propose Fragment-based Sequential Translation (FaST), which learns a reinforcement learning (RL) policy to iteratively translate model-discovered molecules into increasingly novel molecules while satisfying desired properties. Empirical evaluation shows that FaST significantly improves over state-of-the-art methods on benchmark single/multi-objective molecular optimization tasks.

Related papers

Mol-CADiff: Causality-Aware Autoregressive Diffusion for Molecule Generation [13.401822039640297]
Mol-CADiff is a novel diffusion-based framework that uses causal attention mechanisms for text-conditional molecular generation. Our approach explicitly models the causal relationship between textual prompts and molecular structures, overcoming key limitations in existing methods. Our experiments demonstrate that Mol-CADiff outperforms state-of-the-art methods in generating diverse, novel, and chemically valid molecules.
Date: 2025-03-07T15:10:37Z
Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model [55.87790704067848]
Mol-LLaMA is a large molecular language model that grasps the general knowledge centered on molecules. We introduce a module that integrates complementary information from different molecular encoders. Our experimental results demonstrate that Mol-LLaMA is capable of comprehending the general features of molecules.
Date: 2025-02-19T05:49:10Z
Text-Guided Multi-Property Molecular Optimization with a Diffusion Language Model [77.50732023411811]
We propose a text-guided multi-property molecular optimization method utilizing a transformer-based diffusion language model (TransDLM). TransDLM leverages standardized chemical nomenclature as semantic representations of molecules and implicitly embeds property requirements into textual descriptions. Our approach surpasses state-of-the-art methods in optimizing molecular structural similarity and enhancing chemical properties on the benchmark dataset.
Date: 2024-10-17T14:30:27Z
FARM: Functional Group-Aware Representations for Small Molecules [55.281754551202326]
We introduce Functional Group-Aware Representations for Small Molecules (FARM). FARM is a novel model designed to bridge the gap between SMILES, natural language, and molecular graphs. We evaluate FARM on the MoleculeNet dataset, where it achieves state-of-the-art performance on 11 out of 13 tasks.
Date: 2024-10-02T23:04:58Z
Navigating Chemical Space with Latent Flows [20.95884505685799]
We propose a new framework, ChemFlow, that traverses chemical space by using flows to navigate the latent space learned by molecule generative models. We validate the efficacy of ChemFlow on molecule manipulation and on single- and multi-objective optimization tasks under both supervised and unsupervised molecular discovery settings.
Date: 2024-05-07T03:55:57Z
Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective [53.300288393173204]
Large Language Models (LLMs) have shown remarkable performance in various cross-modal tasks. In this work, we propose MolReGPT, an In-context Few-Shot Molecule Learning paradigm for molecule-caption translation. We evaluate the effectiveness of MolReGPT on molecule-caption translation, including molecule understanding and text-based molecule generation.
Date: 2023-06-11T08:16:25Z
Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction. Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations. On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
Date: 2023-02-04T01:32:40Z
De Novo Molecular Generation via Connection-aware Motif Mining [197.97528902698966]
We propose a new method, MiCaM, to generate molecules based on mined connection-aware motifs. The obtained motif vocabulary consists of not only molecular motifs (i.e., the frequent fragments), but also their connection information. Based on the mined connection-aware motifs, MiCaM builds a connection-aware generator, which simultaneously picks up motifs and determines how they are connected.
Date: 2023-02-02T14:40:47Z
Domain-Agnostic Molecular Generation with Chemical Feedback [44.063584808910896]
MolGen is a pre-trained molecular language model tailored specifically for molecule generation. It internalizes structural and grammatical insights through the reconstruction of over 100 million molecular SELFIES. Our chemical feedback paradigm steers the model away from molecular hallucinations, ensuring alignment between the model's estimated probabilities and real-world chemical preferences.
Date: 2023-01-26T17:52:56Z
Scalable Fragment-Based 3D Molecular Design with Reinforcement Learning [68.8204255655161]
We introduce a novel framework for scalable 3D design that uses a hierarchical agent to build molecules. In a variety of experiments, we show that our agent, guided only by energy considerations, can efficiently learn to produce molecules with over 100 atoms.
Date: 2022-02-01T18:54:24Z
Reinforced Molecular Optimization with Neighborhood-Controlled Grammars [63.84003497770347]
We propose MNCE-RL, a graph convolutional policy network for molecular optimization. We extend the original neighborhood-controlled embedding grammars to make them applicable to molecular graph generation. We show that our approach achieves state-of-the-art performance in a diverse range of molecular optimization tasks.
Date: 2020-11-14T05:42:15Z
Goal directed molecule generation using Monte Carlo Tree Search [15.462930062711237]
We propose a novel method, which we call unitMCTS, to perform molecule generation by making a unit change to the molecule at every step using Monte Carlo Tree Search. We show that this method outperforms the recently published techniques on benchmark molecular optimization tasks such as QED and penalized logP.
Date: 2020-10-30T17:49:59Z
A Deep Generative Model for Fragment-Based Molecule Generation [21.258861822241272]
We develop a language model for small molecular substructures called fragments. In other words, we generate molecules fragment by fragment, instead of atom by atom. We show experimentally that our model largely outperforms other language model-based competitors.
Date: 2020-02-28T15:55:11Z
Multi-Objective Molecule Generation using Interpretable Substructures [38.637412590671865]
Drug discovery aims to find novel compounds with specified chemical property profiles. The goal is to learn to sample molecules in the intersection of multiple property constraints. We propose to offset this complexity by composing molecules from a vocabulary of substructures that we call molecular rationales.
Date: 2020-02-08T22:55:37Z

This list is automatically generated from the titles and abstracts of the papers on this site.