3M-Diffusion: Latent Multi-Modal Diffusion for Language-Guided Molecular Structure Generation
- URL: http://arxiv.org/abs/2403.07179v2
- Date: Wed, 02 Oct 2024 20:09:18 GMT
- Title: 3M-Diffusion: Latent Multi-Modal Diffusion for Language-Guided Molecular Structure Generation
- Authors: Huaisheng Zhu, Teng Xiao, Vasant G Honavar
- Abstract summary: 3M-Diffusion is a novel multi-modal molecular graph generation method.
It generates diverse, ideally novel molecular structures with desired properties.
- Score: 18.55127917150268
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating molecular structures with desired properties is a critical task with broad applications in drug discovery and materials design. We propose 3M-Diffusion, a novel multi-modal molecular graph generation method, to generate diverse, ideally novel molecular structures with desired properties. 3M-Diffusion encodes molecular graphs into a graph latent space which it then aligns with the text space learned by encoder-based LLMs from textual descriptions. It then reconstructs the molecular structure and atomic attributes based on the given text descriptions using the molecule decoder. It then learns a probabilistic mapping from the text space to the latent molecular graph space using a diffusion model. The results of our extensive experiments on several datasets demonstrate that 3M-Diffusion can generate high-quality, novel and diverse molecular graphs that semantically match the textual description provided.
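The abstract's last modelling step, a diffusion model over the latent molecular graph space, can be illustrated with a minimal sketch of the forward (noising) process. This is not the authors' implementation: it assumes a standard DDPM-style Gaussian forward process with a linear noise schedule, and the names `linear_beta_schedule`, `cumulative_alphas`, `noise_latent`, and the 4-dimensional latent `z0` are all hypothetical. In such a setup, a denoising network conditioned on the text embedding would typically be trained to predict the noise `eps` from the noised latent `zt`.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed, commonly used schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar[t] is the running product of (1 - beta) up to step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_latent(z0, t, alpha_bars, rng):
    """Sample z_t = sqrt(alpha_bar_t) * z0 + sqrt(1 - alpha_bar_t) * eps."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a) * z + math.sqrt(1.0 - a) * e for z, e in zip(z0, eps)]
    return zt, eps


rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

z0 = [0.5, -1.2, 0.3, 0.8]  # hypothetical graph latent from a molecule encoder
zt, eps = noise_latent(z0, t=500, alpha_bars=alpha_bars, rng=rng)
```

Because `alpha_bars` shrinks toward zero as `t` grows, late-step latents are close to pure Gaussian noise, which is what lets generation start from a random sample and denoise toward a latent that the molecule decoder can turn into a graph.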
Related papers
- GraphT5: Unified Molecular Graph-Language Modeling via Multi-Modal Cross-Token Attention [5.949779668853557]
We propose a framework that integrates 1D SMILES text and 2D graph representations of molecules for molecular language modeling.
Cross-token attention exploits implicit information between SMILES and graphs of molecules, resulting from their interactions at a fine-grained token level.
arXiv Detail & Related papers (2025-03-07T07:57:16Z) - DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra [60.39311767532607]
DiffMS is a formula-restricted encoder-decoder generative network.
We develop a robust decoder that bridges latent embeddings and molecular structures.
Experiments show DiffMS outperforms existing models on $\textit{de novo}$ molecule generation.
arXiv Detail & Related papers (2025-02-13T18:29:48Z) - Text-guided Diffusion Model for 3D Molecule Generation [26.09786612721824]
We introduce TextSMOG, a new Text-guided Small Molecule Generation Approach via 3D Diffusion Model.
This method uses textual conditions to guide molecule generation, enhancing both stability and diversity.
Experimental results show TextSMOG's proficiency in capturing and utilizing information from textual descriptions.
arXiv Detail & Related papers (2024-10-04T10:23:20Z) - 3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization [41.07090635630771]
3D-MolT5 is a unified framework designed to model both 1D molecular sequence and 3D molecular structure.
The key innovation lies in our methodology for mapping fine-grained 3D substructure representations to a specialized 3D token vocabulary.
Our proposed 3D-MolT5 outperforms existing methods in molecular property prediction, molecule captioning, and text-based molecule generation tasks.
arXiv Detail & Related papers (2024-06-09T14:20:55Z) - LDMol: Text-to-Molecule Diffusion Model with Structurally Informative Latent Space [55.5427001668863]
We present a novel latent diffusion model dubbed LDMol for text-conditioned molecule generation.
LDMol comprises a molecule autoencoder that produces a learnable and structurally informative feature space.
We show that LDMol can be applied to downstream tasks such as molecule-to-text retrieval and text-guided molecule editing.
arXiv Detail & Related papers (2024-05-28T04:59:13Z) - SubGDiff: A Subgraph Diffusion Model to Improve Molecular Representation Learning [14.338345772161102]
We propose a novel diffusion model, termed SubGDiff, that incorporates molecular subgraph information into the diffusion process.
SubGDiff adopts three vital techniques: subgraph prediction, expectation state, and k-step same subgraph diffusion.
Experimentally, extensive downstream tasks demonstrate the superior performance of our approach.
arXiv Detail & Related papers (2024-05-09T10:37:33Z) - Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z) - Towards 3D Molecule-Text Interpretation in Language Models [125.56693661827181]
3D-MoLM enables an LM to interpret and analyze 3D molecules by equipping the LM with a 3D molecular encoder.
This integration is achieved by a 3D molecule-text projector, bridging the 3D molecular encoder's representation space and the LM's input space.
We meticulously curated a 3D molecule-centric instruction tuning dataset -- 3D-MoIT.
arXiv Detail & Related papers (2024-01-25T03:42:00Z) - MUDiff: Unified Diffusion for Complete Molecule Generation [104.7021929437504]
We present a new model for generating a comprehensive representation of molecules, including atom features, 2D discrete molecule structures, and 3D continuous molecule coordinates.
We propose a novel graph transformer architecture to denoise the diffusion process.
Our model is a promising approach for designing stable and diverse molecules and can be applied to a wide range of tasks in molecular modeling.
arXiv Detail & Related papers (2023-04-28T04:25:57Z) - An Equivariant Generative Framework for Molecular Graph-Structure Co-Design [54.92529253182004]
We present MolCode, a machine learning-based generative framework for Molecular graph-structure Co-design.
In MolCode, 3D geometric information empowers the molecular 2D graph generation, which in turn helps guide the prediction of molecular 3D structure.
Our investigation reveals that the 2D topology and 3D geometry contain intrinsically complementary information in molecule design.
arXiv Detail & Related papers (2023-04-12T13:34:22Z) - A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z) - Learning a Continuous Representation of 3D Molecular Structures with Deep Generative Models [0.0]
Generative models are an entirely different approach that learns to represent and optimize molecules in a continuous latent space.
We describe deep generative models of three dimensional molecular structures using atomic density grids.
We are also able to sample diverse sets of molecules based on a given input compound to increase the probability of creating valid, drug-like molecules.
arXiv Detail & Related papers (2020-10-17T01:15:47Z)