Molecule Generation with Fragment Retrieval Augmentation
- URL: http://arxiv.org/abs/2411.12078v1
- Date: Mon, 18 Nov 2024 21:43:52 GMT
- Title: Molecule Generation with Fragment Retrieval Augmentation
- Authors: Seul Lee, Karsten Kreis, Srimukh Prasad Veccham, Meng Liu, Danny Reidenbach, Saee Paliwal, Arash Vahdat, Weili Nie
- Abstract summary: Fragment Retrieval-Augmented Generation (f-RAG) is based on a pre-trained molecular generative model that proposes additional fragments to complete and generate a new molecule.
To extrapolate beyond the existing fragments, f-RAG updates the fragment vocabulary with generated fragments via an iterative refinement process.
- Abstract: Fragment-based drug discovery, in which molecular fragments are assembled into new molecules with desirable biochemical properties, has achieved great success. However, many fragment-based molecule generation methods show limited exploration beyond the existing fragments in the database as they only reassemble or slightly modify the given ones. To tackle this problem, we propose a new fragment-based molecule generation framework with retrieval augmentation, namely Fragment Retrieval-Augmented Generation (f-RAG). f-RAG is based on a pre-trained molecular generative model that proposes additional fragments from input fragments to complete and generate a new molecule. Given a fragment vocabulary, f-RAG retrieves two types of fragments: (1) hard fragments, which serve as building blocks that will be explicitly included in the newly generated molecule, and (2) soft fragments, which serve as reference to guide the generation of new fragments through a trainable fragment injection module. To extrapolate beyond the existing fragments, f-RAG updates the fragment vocabulary with generated fragments via an iterative refinement process which is further enhanced with post-hoc genetic fragment modification. f-RAG can achieve an improved exploration-exploitation trade-off by maintaining a pool of fragments and expanding it with novel and high-quality fragments through a strong generative prior.
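The iterative refinement described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. Fragments are plain strings, `toy_score` is a made-up property score applied directly to fragments and molecules, and `toy_generate` is a hypothetical stand-in for the pre-trained generative model and its fragment injection module. The post-hoc genetic fragment modification step is omitted. All names and parameters are assumptions made for illustration.

```python
import random
from typing import Callable, List, Tuple


def toy_score(molecule: str) -> float:
    """Hypothetical property score: fraction of 'N' characters."""
    return molecule.count("N") / max(len(molecule), 1)


def toy_generate(hard: List[str], soft: List[str], rng: random.Random) -> List[str]:
    """Stand-in for the generative model (assumption, not the real model).

    Hard fragments are kept verbatim; one new fragment is proposed by
    perturbing a single character of a randomly chosen soft fragment.
    Fragments are assumed to be non-empty strings.
    """
    ref = rng.choice(soft)
    pos = rng.randrange(len(ref))
    new_fragment = ref[:pos] + rng.choice("CNO") + ref[pos + 1:]
    return hard + [new_fragment]


def top_k(fragments, score: Callable[[str], float], k: int) -> List[str]:
    """Keep the k best unique fragments; ties are broken alphabetically."""
    return sorted(set(fragments), key=lambda f: (-score(f), f))[:k]


def refine(
    vocab: List[str],
    score: Callable[[str], float],
    rounds: int,
    n_hard: int = 2,
    n_soft: int = 3,
    pool_size: int = 10,
    seed: int = 0,
) -> Tuple[List[str], Tuple[float, str]]:
    """Retrieve fragments, generate, then grow the vocabulary with the output."""
    rng = random.Random(seed)
    pool = top_k(vocab, score, pool_size)
    best = (float("-inf"), "")
    for _ in range(rounds):
        # Retrieval: hard fragments become building blocks, soft ones guide.
        hard = rng.sample(pool, min(n_hard, len(pool)))
        soft = rng.sample(pool, min(n_soft, len(pool)))
        fragments = toy_generate(hard, soft, rng)
        molecule = "".join(fragments)  # toy "assembly" by concatenation
        s = score(molecule)
        if s > best[0]:
            best = (s, molecule)
        # Vocabulary update: generated fragments compete for a pool slot.
        pool = top_k(pool + fragments, score, pool_size)
    return pool, best


if __name__ == "__main__":
    pool, best = refine(["CCO", "CN", "OCC", "NCN"], toy_score, rounds=20)
    print(pool, best)
```

In this sketch the pool can only improve or stay the same under `toy_score`, because newly generated fragments enter it only when they rank among the top `pool_size` entries.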
Related papers
- GenMol: A Drug Discovery Generalist with Discrete Diffusion [43.29814519270451]
Generalist Molecular generative model (GenMol) is a versatile framework that addresses various aspects of the drug discovery pipeline.
Under the discrete diffusion framework, we introduce fragment remasking, a strategy that optimizes molecules by replacing fragments with masked tokens.
GenMol significantly outperforms the previous GPT-based model trained on SAFE representations in de novo generation and fragment-constrained generation.
arXiv Detail & Related papers (2025-01-10T18:30:05Z)
- FragRel: Exploiting Fragment-level Relations in the External Memory of Large Language Models [54.13671100638092]
We propose a fragment-connected Hierarchical Memory based Large Language Model (LLM).
We formulate the fragment-level relations in external memory and present several instantiations for different text types.
We validate the benefits of involving these relations on long story understanding, repository-level code generation, and long-term chatting.
arXiv Detail & Related papers (2024-06-05T09:31:37Z)
- Drug Discovery with Dynamic Goal-aware Fragments [76.10700304803177]
We propose a molecular generative framework for drug discovery, named Goal-aware fragment Extraction, Assembly, and Modification (GEAM).
GEAM consists of three modules, each responsible for goal-aware fragment extraction, fragment assembly, and fragment modification.
We experimentally demonstrate that GEAM effectively discovers drug candidates through the generative cycle of the three modules.
arXiv Detail & Related papers (2023-10-02T01:30:42Z)
- Functional-Group-Based Diffusion for Pocket-Specific Molecule Generation and Elaboration [63.23362798102195]
We propose D3FG, a functional-group-based diffusion model for pocket-specific molecule generation and elaboration.
D3FG decomposes molecules into two categories of components: functional groups defined as rigid bodies and linkers as mass points.
In the experiments, our method can generate molecules with more realistic 3D structures, competitive affinities toward the protein targets, and better drug properties.
arXiv Detail & Related papers (2023-05-30T06:41:20Z)
- SILVR: Guided Diffusion for Molecule Generation [0.0]
We introduce a machine-learning method for conditioning an existing generative model without retraining.
The model allows the generation of new molecules that fit into a binding site of a protein based on fragment hits.
We show that moderate SILVR rates make it possible to generate new molecules of similar shape to the original fragments.
arXiv Detail & Related papers (2023-04-21T11:47:38Z)
- De Novo Molecular Generation via Connection-aware Motif Mining [197.97528902698966]
We propose a new method, MiCaM, to generate molecules based on mined connection-aware motifs.
The obtained motif vocabulary consists of not only molecular motifs (i.e., the frequent fragments), but also their connection information.
Based on the mined connection-aware motifs, MiCaM builds a connection-aware generator, which simultaneously picks up motifs and determines how they are connected.
arXiv Detail & Related papers (2023-02-02T14:40:47Z)
- Equivariant 3D-Conditional Diffusion Models for Molecular Linker Design [82.23006955069229]
We propose DiffLinker, an E(3)-equivariant 3D-conditional diffusion model for molecular linker design.
Our model places missing atoms in between and designs a molecule incorporating all the initial fragments.
We demonstrate that DiffLinker outperforms other methods on the standard datasets generating more diverse and synthetically-accessible molecules.
arXiv Detail & Related papers (2022-10-11T09:13:37Z)
- Fragment-based Sequential Translation for Molecular Optimization [23.152338167332374]
We propose a flexible editing paradigm that generates molecules using learned molecular fragments.
We use a variational autoencoder to encode molecular fragments in a coherent latent space.
We then utilize it as a vocabulary for editing molecules to explore the complex chemical property space.
arXiv Detail & Related papers (2021-10-26T21:20:54Z)
- A Deep Generative Model for Fragment-Based Molecule Generation [21.258861822241272]
We develop a language model for small molecular substructures called fragments.
In other words, we generate molecules fragment by fragment, instead of atom by atom.
We show experimentally that our model largely outperforms other language model-based competitors.
arXiv Detail & Related papers (2020-02-28T15:55:11Z)