Drug Discovery with Dynamic Goal-aware Fragments
- URL: http://arxiv.org/abs/2310.00841v3
- Date: Thu, 30 May 2024 13:03:32 GMT
- Title: Drug Discovery with Dynamic Goal-aware Fragments
- Authors: Seul Lee, Seanie Lee, Kenji Kawaguchi, Sung Ju Hwang
- Abstract summary: We propose a molecular generative framework for drug discovery, named Goal-aware fragment Extraction, Assembly, and Modification (GEAM).
GEAM consists of three modules, each responsible for goal-aware fragment extraction, fragment assembly, and fragment modification.
We experimentally demonstrate that GEAM effectively discovers drug candidates through the generative cycle of the three modules.
- Score: 76.10700304803177
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fragment-based drug discovery is an effective strategy for discovering drug candidates in the vast chemical space, and has been widely employed in molecular generative models. However, many existing fragment extraction methods in such models do not take the target chemical properties into account or rely on heuristic rules. Additionally, the existing fragment-based generative models cannot update the fragment vocabulary with goal-aware fragments newly discovered during the generation. To this end, we propose a molecular generative framework for drug discovery, named Goal-aware fragment Extraction, Assembly, and Modification (GEAM). GEAM consists of three modules, each responsible for goal-aware fragment extraction, fragment assembly, and fragment modification. The fragment extraction module identifies important fragments contributing to the desired target properties with the information bottleneck principle, thereby constructing an effective goal-aware fragment vocabulary. Moreover, GEAM can explore beyond the initial vocabulary with the fragment modification module, and the exploration is further enhanced through the dynamic goal-aware vocabulary update. We experimentally demonstrate that GEAM effectively discovers drug candidates through the generative cycle of the three modules in various drug discovery tasks. Our code is available at https://github.com/SeulLee05/GEAM.
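The cycle described in the abstract (extract goal-aware fragments, assemble them, modify the results, then update the vocabulary) can be illustrated with a toy sketch. This is not the authors' implementation: molecules are represented as plain tuples of fragment names, `FRAGMENT_VALUE` and `score` are a made-up property oracle, and random sampling stands in for the learned extraction, assembly, and modification modules. All function names are hypothetical.

```python
import random
from collections import defaultdict

# Toy stand-ins (assumptions, not from the GEAM codebase): a "molecule" is a
# tuple of fragment names and the target property is an invented score.
FRAGMENT_VALUE = {"A": 0.1, "B": 0.9, "C": 0.4, "D": 0.7, "E": 0.2}


def score(molecule):
    """Hypothetical property oracle: mean value of the molecule's fragments."""
    return sum(FRAGMENT_VALUE.get(f, 0.0) for f in molecule) / len(molecule)


def extract_fragments(molecules, k):
    """Toy goal-aware extraction: rank each fragment by the mean score of the
    molecules that contain it, and keep the top k."""
    totals, counts = defaultdict(float), defaultdict(int)
    for mol in molecules:
        s = score(mol)
        for frag in set(mol):
            totals[frag] += s
            counts[frag] += 1
    ranked = sorted(totals, key=lambda f: totals[f] / counts[f], reverse=True)
    return ranked[:k]


def assemble(vocabulary, rng, size=3):
    """Toy assembly: build a molecule from fragments in the current vocabulary."""
    return tuple(rng.choice(vocabulary) for _ in range(size))


def modify(molecule, all_fragments, rng):
    """Toy modification: swap one position for any known fragment, which may
    lie outside the current vocabulary."""
    mol = list(molecule)
    mol[rng.randrange(len(mol))] = rng.choice(all_fragments)
    return tuple(mol)


def generative_cycle(training_set, rounds=5, k=3, batch=20, seed=0):
    rng = random.Random(seed)
    all_fragments = sorted(FRAGMENT_VALUE)
    vocabulary = extract_fragments(training_set, k)
    best = max(training_set, key=score)
    for _ in range(rounds):
        assembled = [assemble(vocabulary, rng) for _ in range(batch)]
        modified = [modify(m, all_fragments, rng) for m in assembled]
        pool = assembled + modified
        best = max(pool + [best], key=score)
        # Dynamic vocabulary update: re-extract fragments from the best
        # generated molecules and merge them ahead of the old vocabulary.
        top = sorted(pool, key=score, reverse=True)[: batch // 2]
        new_fragments = extract_fragments(top, k)
        vocabulary = list(dict.fromkeys(new_fragments + vocabulary))[:k]
    return best, vocabulary


training_set = [("A", "B", "C"), ("D", "E", "A"), ("B", "D", "C"), ("E", "A", "C")]
best, vocabulary = generative_cycle(training_set)
print(best, round(score(best), 3), vocabulary)
```

The point of the sketch is the control flow: because modification can introduce fragments that the initial vocabulary lacks, re-running extraction on the generated molecules lets the vocabulary change between rounds.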
Related papers
- DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization [53.27954325490941]
Finetuning a Large Language Model (LLM) is crucial for generating results towards specific objectives.
This research introduces a novel reinforcement learning algorithm to finetune a drug optimization LLM-based generative model.
arXiv Detail & Related papers (2025-02-11T04:00:21Z)
- GenMol: A Drug Discovery Generalist with Discrete Diffusion [43.29814519270451]
Generalist Molecular generative model (GenMol) is a versatile framework that addresses various aspects of the drug discovery pipeline.
Under the discrete diffusion framework, we introduce fragment remasking, a strategy that optimizes molecules by replacing fragments with masked tokens.
GenMol significantly outperforms the previous GPT-based model trained on SAFE representations in de novo generation and fragment-constrained generation.
arXiv Detail & Related papers (2025-01-10T18:30:05Z)
- RFL: Simplifying Chemical Structure Recognition with Ring-Free Language [66.47173094346115]
We propose a novel Ring-Free Language (RFL) to describe chemical structures in a hierarchical form.
RFL allows complex molecular structures to be decomposed into multiple parts, ensuring both uniqueness and conciseness.
We propose a universal Molecular Skeleton Decoder (MSD), which comprises a skeleton generation module that progressively predicts the molecular skeleton and individual rings.
arXiv Detail & Related papers (2024-12-10T15:29:32Z)
- Molecule Generation with Fragment Retrieval Augmentation [41.95947899013865]
Fragment Retrieval-Augmented Generation (f-RAG) is based on a pre-trained molecular generative model that proposes additional fragments to complete and generate a new molecule.
To extrapolate beyond the existing fragments, f-RAG updates the fragment vocabulary with generated fragments via an iterative refinement process.
arXiv Detail & Related papers (2024-11-18T21:43:52Z)
- De Novo Molecular Generation via Connection-aware Motif Mining [197.97528902698966]
We propose a new method, MiCaM, to generate molecules based on mined connection-aware motifs.
The obtained motif vocabulary consists of not only molecular motifs (i.e., the frequent fragments), but also their connection information.
Based on the mined connection-aware motifs, MiCaM builds a connection-aware generator, which simultaneously picks up motifs and determines how they are connected.
arXiv Detail & Related papers (2023-02-02T14:40:47Z)
- t-SMILES: A Scalable Fragment-based Molecular Representation Framework for De Novo Molecule Generation [9.116670221263753]
This study introduces a flexible, fragment-based, multiscale molecular representation framework called t-SMILES.
It describes molecules using SMILES-type strings obtained by performing a breadth-first search on a full binary tree formed from a fragmented molecular graph.
It significantly outperforms classical SMILES, DeepSMILES, SELFIES and baseline models in goal-directed tasks.
arXiv Detail & Related papers (2023-01-04T21:41:01Z)
- A Tri-Layer Plugin to Improve Occluded Detection [100.99802831241583]
We propose a simple '' module for the detection head of two-stage object detectors to improve the recall of partially occluded objects.
The module predicts a tri-layer of segmentation masks for the target object, the occluder and the occludee, and by doing so is able to better predict the mask of the target object.
We also establish a COCO evaluation dataset to measure the recall performance of partially occluded and separated objects.
arXiv Detail & Related papers (2022-10-18T17:59:51Z)
- Equivariant 3D-Conditional Diffusion Models for Molecular Linker Design [82.23006955069229]
We propose DiffLinker, an E(3)-equivariant 3D-conditional diffusion model for molecular linker design.
Our model places missing atoms in between and designs a molecule incorporating all the initial fragments.
We demonstrate that DiffLinker outperforms other methods on the standard datasets, generating more diverse and synthetically accessible molecules.
arXiv Detail & Related papers (2022-10-11T09:13:37Z)
- Fragment-based Sequential Translation for Molecular Optimization [23.152338167332374]
We propose a flexible editing paradigm that generates molecules using learned molecular fragments.
We use a variational autoencoder to encode molecular fragments in a coherent latent space.
We then utilize the learned fragments as a vocabulary for editing molecules to explore the complex chemical property space.
arXiv Detail & Related papers (2021-10-26T21:20:54Z)