Instruction-Based Molecular Graph Generation with Unified Text-Graph Diffusion Model
- URL: http://arxiv.org/abs/2408.09896v1
- Date: Mon, 19 Aug 2024 11:09:15 GMT
- Title: Instruction-Based Molecular Graph Generation with Unified Text-Graph Diffusion Model
- Authors: Yuran Xiang, Haiteng Zhao, Chang Ma, Zhi-Hong Deng
- Abstract summary: Unified Text-Graph Diffusion Model (UTGDiff) is a framework to generate molecular graphs from instructions.
UTGDiff features a unified text-graph transformer as the denoising network, derived from pre-trained language models.
Our experimental results demonstrate that UTGDiff consistently outperforms sequence-based baselines in tasks involving instruction-based molecule generation and editing.
- Score: 22.368332915420606
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in computational chemistry have increasingly focused on synthesizing molecules based on textual instructions. Integrating graph generation with these instructions is complex, leading most current methods to use molecular sequences with pre-trained large language models. In response to this challenge, we propose a novel framework, named $\textbf{UTGDiff (Unified Text-Graph Diffusion Model)}$, which utilizes language models for discrete graph diffusion to generate molecular graphs from instructions. UTGDiff features a unified text-graph transformer as the denoising network, derived from pre-trained language models and minimally modified to process graph data through attention bias. Our experimental results demonstrate that UTGDiff consistently outperforms sequence-based baselines in tasks involving instruction-based molecule generation and editing, achieving superior performance with fewer parameters given an equivalent pretraining corpus. Our code is available at https://github.com/ran1812/UTGDiff.
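The abstract's key mechanism, a pre-trained transformer "minimally modified to process graph data through attention bias", is concrete enough to sketch. Below is a minimal illustration assuming standard scaled dot-product attention; the bias construction (per-head bond-type embeddings, the `make_bias` helper) is a hypothetical stand-in, not UTGDiff's actual code.

```python
import torch
import torch.nn.functional as F

def biased_attention(q, k, v, attn_bias):
    # q, k, v: (batch, heads, seq, dim); attn_bias: (batch, heads, seq, seq)
    d = q.size(-1)
    logits = q @ k.transpose(-2, -1) / d ** 0.5  # standard attention logits
    logits = logits + attn_bias                  # graph structure enters here only
    return F.softmax(logits, dim=-1) @ v

# Hypothetical bias construction: embed bond types (0 = no edge / text token)
# into one scalar per attention head.
num_bond_types, num_heads = 5, 8
bond_emb = torch.nn.Embedding(num_bond_types, num_heads)

def make_bias(bond_types):
    # bond_types: (batch, seq, seq) integer matrix over the joint text+atom sequence
    return bond_emb(bond_types).permute(0, 3, 1, 2)  # -> (batch, heads, seq, seq)
```

Because the bias is purely additive, the pre-trained attention weights are reused unchanged, which is what allows the denoising network to start from a language-model checkpoint.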
Related papers
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model that randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
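A minimal sketch of this masking idea, assuming a toy regex-based mapping from functional groups to SMILES subsequences (the paper's actual atom-to-subsequence procedure is not reproduced here):

```python
import random
import re

# Toy functional-group patterns; an illustrative assumption, not the paper's set.
GROUP_PATTERNS = [r"C\(=O\)O", r"C\(=O\)N", r"c1ccccc1"]

def mask_smiles(smiles: str, p: float = 0.5) -> str:
    """Randomly replace matched functional-group subsequences with [MASK]."""
    spans = sorted(m.span() for pat in GROUP_PATTERNS
                   for m in re.finditer(pat, smiles))
    out, last = [], 0
    for start, end in spans:
        if start < last:
            continue  # skip overlapping matches
        out.append(smiles[last:start])
        out.append("[MASK]" if random.random() < p else smiles[start:end])
        last = end
    out.append(smiles[last:])
    return "".join(out)

print(mask_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin; e.g. 'CC(=O)O[MASK]C(=O)O'
```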
arXiv Detail & Related papers (2024-11-03T01:56:15Z) - Explanation Graph Generation via Generative Pre-training over Synthetic Graphs [6.25568933262682]
Explanation graph generation aims to produce structured explanation graphs in response to user input.
Current research commonly fine-tunes a text-based pre-trained language model on a small downstream dataset that is annotated with labeled graphs.
We propose EG3P (Explanation Graph Generation via Generative Pre-training over Synthetic Graphs), a pre-trained framework for the explanation graph generation task.
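The common fine-tuning recipe mentioned above is easy to make concrete: linearize each labeled graph into a token sequence and train a text-to-text model to emit it. The <H>/<R>/<T> delimiters below are an illustrative assumption, not EG3P's format:

```python
# Sketch: cast graph generation as sequence generation by flattening
# labeled graphs into edge-triple strings for a seq2seq LM (e.g. T5).
def linearize(edges):
    """edges: list of (head, relation, tail) triples -> target string."""
    return " ".join(f"<H> {h} <R> {r} <T> {t}" for h, r, t in edges)

# One hypothetical (input, target) fine-tuning pair:
source = "belief: trees are good for the environment"
target = linearize([("trees", "capable of", "producing oxygen"),
                    ("producing oxygen", "helps", "the environment")])
```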
arXiv Detail & Related papers (2023-06-01T13:20:22Z) - GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning [71.89623260998934]
This study investigates the feasibility of employing natural language instructions to accomplish molecule-related tasks in a zero-shot setting.
Existing molecule-text models perform poorly in this setting due to inadequate treatment of instructions and limited capacity for graphs.
We propose GIMLET, which unifies language models for both graph and text data.
arXiv Detail & Related papers (2023-05-28T18:27:59Z) - Graph Generation with Diffusion Mixture [57.78958552860948]
Generation of graphs is a major challenge for real-world tasks that require understanding the complex nature of their non-Euclidean structures.
We propose a generative framework that models the topology of graphs by explicitly learning the final graph structures of the diffusion process.
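A hedged sketch of the "predict the final graph" objective, using Gaussian corruption of an adjacency matrix as an illustrative stand-in for the paper's diffusion-mixture construction:

```python
import torch

def final_graph_loss(model, A0, t, sigma):
    """Train the denoiser to predict the *final* (clean) adjacency A0 from a
    noisy intermediate state, rather than predicting the added noise.

    A0: (batch, n, n) clean adjacency; t: (batch,) timesteps;
    sigma: (T,) per-timestep noise scale (an assumed schedule).
    """
    At = A0 + sigma[t].view(-1, 1, 1) * torch.randn_like(A0)  # corrupt graph
    A0_hat = model(At, t)  # network outputs the predicted final graph directly
    return ((A0_hat - A0) ** 2).mean()
```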
arXiv Detail & Related papers (2023-02-07T17:07:46Z) - Conditional Diffusion Based on Discrete Graph Structures for Molecular Graph Generation [32.66694406638287]
We propose a Conditional Diffusion model based on discrete Graph Structures (CDGS) for molecular graph generation.
Specifically, we construct a forward graph diffusion process on both graph structures and inherent features through stochastic differential equations (SDEs).
We present a specialized hybrid graph noise prediction model that extracts the global context and the local node-edge dependency from intermediate graph states.
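A minimal sketch of one discretized step of such a forward SDE, assuming a variance-preserving form and a dequantized adjacency matrix (both assumptions, not CDGS's exact construction):

```python
import torch

def forward_sde_step(X, A, beta_t, dt):
    """One Euler-Maruyama step of a VP-type forward SDE,
        dZ = -1/2 * beta(t) * Z dt + sqrt(beta(t)) dW,
    applied jointly to node features X and a dequantized adjacency A.
    """
    def step(Z):
        drift = -0.5 * beta_t * Z * dt
        diffusion = (beta_t * dt) ** 0.5 * torch.randn_like(Z)
        return Z + drift + diffusion
    return step(X), step(A)

# Usage over many small steps: X, A = forward_sde_step(X, A, beta_t=0.1, dt=1e-3)
```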
arXiv Detail & Related papers (2023-01-01T15:24:15Z) - Explanation Graph Generation via Pre-trained Language Models: An Empirical Study with Contrastive Learning [84.35102534158621]
We study pre-trained language models that generate explanation graphs in an end-to-end manner.
We propose simple yet effective ways of graph perturbations via node and edge edit operations.
Our methods lead to significant improvements in both structural and semantic accuracy of explanation graphs.
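A sketch of what such node- and edge-edit perturbations can look like when producing contrastive negatives; the concrete edit set here is an assumption, built on networkx:

```python
import random
import networkx as nx

def perturb(G: nx.DiGraph, n_edits: int = 2) -> nx.DiGraph:
    """Build a structural negative of an explanation graph via small edits."""
    H = G.copy()
    for _ in range(n_edits):
        op = random.choice(["drop_edge", "rewire_edge", "relabel_node"])
        if op == "drop_edge" and H.number_of_edges() > 1:
            H.remove_edge(*random.choice(list(H.edges())))
        elif op == "rewire_edge" and H.number_of_edges() > 0:
            u, v = random.choice(list(H.edges()))
            w = random.choice(list(H.nodes()))
            if w not in (u, v) and not H.has_edge(u, w):
                H.remove_edge(u, v)
                H.add_edge(u, w)  # redirect the edge to a different node
        elif op == "relabel_node" and H.number_of_nodes() > 0:
            n = random.choice(list(H.nodes()))  # nodes are concept strings
            H = nx.relabel_nodes(H, {n: f"{n} [edited]"})
    return H
```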
arXiv Detail & Related papers (2022-04-11T00:58:27Z) - Data-Efficient Graph Grammar Learning for Molecular Generation [41.936515793383]
We propose a data-efficient generative model that can be learned from datasets orders of magnitude smaller than common benchmarks.
Our learned graph grammar yields state-of-the-art results on generating high-quality molecules for three monomer datasets.
Our approach also achieves remarkable performance in a challenging polymer generation task with only $117$ training samples.
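The generate-by-expansion loop of a learned grammar can be sketched with toy string productions; real graph grammars in this line of work rewrite molecular (hyper)graph fragments, so everything below is illustrative:

```python
import random

# Toy production rules standing in for a learned graph grammar.
RULES = {
    "S": ["[ring]", "[chain]"],
    "[ring]": ["c1ccccc1[chain]", "c1ccccc1"],
    "[chain]": ["CC[chain]", "CO", "CN"],
}

def sample(max_steps=10):
    s = "S"
    for _ in range(max_steps):
        nts = [nt for nt in RULES if nt in s]
        if not nts:
            break
        nt = random.choice(nts)
        s = s.replace(nt, random.choice(RULES[nt]), 1)  # expand one nonterminal
    for nt in RULES:  # drop any nonterminal left when the step budget runs out
        s = s.replace(nt, "")
    return s

print(sample())  # e.g. 'c1ccccc1CCCO'
```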
arXiv Detail & Related papers (2022-03-15T16:14:30Z) - GraphFormers: GNN-nested Transformers for Representation Learning on Textual Graph [53.70520466556453]
We propose GraphFormers, where layerwise GNN components are nested alongside the transformer blocks of language models.
With the proposed architecture, the text encoding and the graph aggregation are fused into an iterative workflow.
In addition, a progressive learning strategy is introduced, where the model is successively trained on manipulated data and original data to reinforce its capability of integrating information on graphs.
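A minimal sketch of the nesting idea: exchange a per-node summary vector with graph neighbors between transformer blocks. The dimensions, [CLS]-style summary, and mean-pooling aggregator are illustrative assumptions:

```python
import torch
import torch.nn as nn

class GraphFormerLayer(nn.Module):
    """One layer interleaving neighbor aggregation with a transformer block."""
    def __init__(self, d=256, heads=4):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model=d, nhead=heads,
                                                batch_first=True)
        self.gnn = nn.Linear(2 * d, d)  # tiny neighbor-aggregation stand-in

    def forward(self, tokens, adj):
        # tokens: (num_nodes, seq, d); adj: (num_nodes, num_nodes) 0/1 float matrix
        cls = tokens[:, 0]                             # per-node summary vector
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        neigh = (adj @ cls) / deg                      # mean over graph neighbors
        fused = self.gnn(torch.cat([cls, neigh], -1))  # graph-aware summary
        tokens = torch.cat([fused.unsqueeze(1), tokens[:, 1:]], dim=1)
        return self.block(tokens)                      # next transformer block
```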
arXiv Detail & Related papers (2021-05-06T12:20:41Z) - Molecular graph generation with Graph Neural Networks [2.7393821783237184]
We introduce a sequential molecular graph generator based on a set of graph neural network modules, which we call MG2N2.
Our model generalizes molecular patterns seen during training without overfitting.
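A sketch of the sequential generation loop, with random choices standing in for the paper's learned GNN modules (everything here is an illustrative assumption):

```python
import random

ATOMS = ["C", "N", "O"]

def generate(max_nodes=8):
    """Grow a molecular graph one node at a time, then decide which edges to
    add back to existing nodes."""
    nodes, edges = [], []
    while len(nodes) < max_nodes:
        atom = random.choice(ATOMS + ["STOP"])  # a learned module decides this
        if atom == "STOP":
            if nodes:
                break
            continue
        new = len(nodes)
        nodes.append(atom)
        for old in range(new):                  # a learned module scores each edge
            if random.random() < 0.3:
                edges.append((old, new))
    return nodes, edges

print(generate())  # e.g. (['C', 'O', 'N'], [(0, 1), (1, 2)])
```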
arXiv Detail & Related papers (2020-12-14T10:32:57Z) - Neural Language Modeling for Contextualized Temporal Graph Generation [49.21890450444187]
This paper presents the first study on using large-scale pre-trained language models for automated generation of an event-level temporal graph for a document.
arXiv Detail & Related papers (2020-10-20T07:08:00Z)