ReactXT: Understanding Molecular "Reaction-ship" via Reaction-Contextualized Molecule-Text Pretraining
- URL: http://arxiv.org/abs/2405.14225v1
- Date: Thu, 23 May 2024 06:55:59 GMT
- Title: ReactXT: Understanding Molecular "Reaction-ship" via Reaction-Contextualized Molecule-Text Pretraining
- Authors: Zhiyuan Liu, Yaorui Shi, An Zhang, Sihang Li, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua
- Abstract summary: We propose ReactXT for reaction-text modeling and OpenExp for experimental procedure prediction.
ReactXT features three types of input contexts to incrementally pretrain LMs.
Our code is available at https://github.com/syr-cn/ReactXT.
- Score: 76.51346919370005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Molecule-text modeling, which aims to facilitate molecule-relevant tasks with a textual interface and textual knowledge, is an emerging research direction. Beyond single molecules, studying reaction-text modeling holds promise for aiding the synthesis of new materials and drugs. However, previous works mostly neglect reaction-text modeling: they primarily focus on modeling individual molecule-text pairs or on learning chemical reactions without textual context. Additionally, one key task of reaction-text modeling -- experimental procedure prediction -- is underexplored due to the absence of an open-source dataset. The task is to predict the step-by-step actions for conducting a chemical experiment and is crucial to automating chemical synthesis. To resolve these challenges, we propose a new pretraining method, ReactXT, for reaction-text modeling, and a new dataset, OpenExp, for experimental procedure prediction. Specifically, ReactXT features three types of input contexts to incrementally pretrain LMs. Each of the three input contexts corresponds to a pretraining task that improves the text-based understanding of either reactions or single molecules. ReactXT demonstrates consistent improvements in experimental procedure prediction and molecule captioning and offers competitive results in retrosynthesis. Our code is available at https://github.com/syr-cn/ReactXT.
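The abstract does not specify the exact context formats here, so the sketch below is only a hypothetical illustration of how three pretraining contexts could be assembled, assuming a forward-reaction context, a backward (retrosynthetic) context, and a single-molecule context over SMILES strings and free-text captions. The `Reaction` class and the templates are illustrative assumptions, not the authors' implementation; the real formats are defined in the linked repository.

```python
# Hypothetical sketch of reaction-contextualized pretraining inputs.
# The exact context formats ReactXT uses are defined in the paper and its
# repository (https://github.com/syr-cn/ReactXT); the three templates below
# are illustrative assumptions, not the authors' actual implementation.
from dataclasses import dataclass
from typing import List


@dataclass
class Reaction:
    reactants: List[str]  # SMILES strings
    product: str          # SMILES string
    caption: str          # free-text description of the reaction


def forward_reaction_context(rxn: Reaction) -> str:
    """Pair the full reaction with its textual description."""
    lhs = ".".join(rxn.reactants)
    return f"Reaction: {lhs}>>{rxn.product}\nDescription: {rxn.caption}"


def backward_reaction_context(rxn: Reaction) -> str:
    """Retrosynthetic view: present the product before the reactants."""
    lhs = ".".join(rxn.reactants)
    return f"Product: {rxn.product}\nReactants: {lhs}\nDescription: {rxn.caption}"


def single_molecule_context(smiles: str, caption: str) -> str:
    """Plain molecule-text pair, mirroring standard molecule captioning."""
    return f"Molecule: {smiles}\nDescription: {caption}"


# Incremental pretraining would interleave language-modeling batches drawn
# from the three context types.
rxn = Reaction(["CC(=O)O", "CCO"], "CCOC(=O)C",
               "Fischer esterification of acetic acid with ethanol.")
corpus = [
    forward_reaction_context(rxn),
    backward_reaction_context(rxn),
    single_molecule_context("CCO", "Ethanol, a simple primary alcohol."),
]
```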
Related papers
- PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes [33.293741487835824]
Multimodal Large Language Models (MLLMs) have seen growing adoption across various scientific disciplines.
Current approaches, however, often neglect the critical role of multi-molecule graph interactions in understanding chemical reactions.
This study introduces PRESTO, a new framework that bridges the molecule-text modality gap by integrating a comprehensive benchmark of pretraining strategies and dataset configurations.
arXiv Detail & Related papers (2024-06-19T03:59:46Z)
- T-Rex: Text-assisted Retrosynthesis Prediction [17.955825423710817]
T-Rex is a text-assisted retrosynthesis prediction approach.
It exploits pre-trained text language models, such as ChatGPT, to assist in generating reactants.
arXiv Detail & Related papers (2024-01-26T04:08:50Z)
- Predictive Chemistry Augmented with Text Retrieval [37.59545092901872]
We introduce TextReact, a novel method that directly augments predictive chemistry with texts retrieved from the literature.
TextReact retrieves text descriptions relevant to a given chemical reaction and then aligns them with the molecular representation of the reaction.
We empirically validate the framework on two chemistry tasks: reaction condition recommendation and one-step retrosynthesis. A toy sketch of this retrieve-then-predict pattern appears after the list.
arXiv Detail & Related papers (2023-12-08T07:40:59Z)
- Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing [107.49804059269212]
We present a multi-modal molecule structure-text model, MoleculeSTM, by jointly learning molecules' chemical structures and textual descriptions.
In experiments, MoleculeSTM obtains the state-of-the-art generalization ability to novel biochemical concepts.
arXiv Detail & Related papers (2022-12-21T06:18:31Z)
- Retroformer: Pushing the Limits of Interpretable End-to-end Retrosynthesis Transformer [15.722719721123054]
Retrosynthesis prediction is one of the fundamental challenges in organic synthesis.
We propose Retroformer, a novel Transformer-based architecture for retrosynthesis prediction.
Retroformer reaches new state-of-the-art accuracy for end-to-end template-free retrosynthesis.
arXiv Detail & Related papers (2022-01-29T02:03:55Z)
- Rxn Hypergraph: a Hypergraph Attention Model for Chemical Reaction Representation [70.97737157902947]
There is currently no universal and widely adopted method for robustly representing chemical reactions.
Here we exploit graph-based representations of molecular structures to develop and test a hypergraph attention neural network approach.
We evaluate this hypergraph representation in three experiments using three independent data sets of chemical reactions; a schematic of the data structure appears after the list.
arXiv Detail & Related papers (2022-01-02T12:33:10Z)
- RetroComposer: Discovering Novel Reactions by Composing Templates for Retrosynthesis Prediction [63.14937611038264]
We propose an innovative retrosynthesis prediction framework that can compose novel templates beyond training templates.
Experimental results show that our method can produce novel templates for 328 test reactions in the USPTO-50K dataset.
arXiv Detail & Related papers (2021-12-20T05:57:07Z)
- RetroXpert: Decompose Retrosynthesis Prediction like a Chemist [60.463900712314754]
We devise a novel template-free algorithm for automatic retrosynthetic expansion.
Our method disassembles retrosynthesis into two steps: first identifying the reaction center of the target molecule to obtain intermediate synthons, and then generating the reactants from those synthons.
In addition to outperforming state-of-the-art baselines, our model provides chemically reasonable interpretations.
arXiv Detail & Related papers (2020-11-04T04:35:34Z)
- Retrosynthesis Prediction with Conditional Graph Logic Network [118.70437805407728]
Computer-aided retrosynthesis is finding renewed interest from both chemistry and computer science communities.
We propose a new approach to this task using the Conditional Graph Logic Network, a conditional graphical model built upon graph neural networks.
arXiv Detail & Related papers (2020-01-06T05:36:57Z)
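As referenced in the TextReact entry above, the retrieve-then-predict pattern it describes can be sketched with a toy bag-of-words retriever. Everything here — the corpus, the overlap score, and the prompt template — is a hypothetical stand-in for TextReact's learned retriever and downstream model.

```python
# Toy sketch of the retrieve-then-predict pattern described by TextReact.
# The corpus, the overlap score, and the prompt template are hypothetical
# placeholders, not TextReact's actual retriever or architecture.
from collections import Counter
from typing import List


def overlap_score(query_tokens: List[str], doc_tokens: List[str]) -> int:
    """Bag-of-words overlap as a stand-in for a learned retriever."""
    q, d = Counter(query_tokens), Counter(doc_tokens)
    return sum(min(count, d[token]) for token, count in q.items())


def retrieve(reaction_text: str, corpus: List[str], k: int = 2) -> List[str]:
    """Return the k corpus passages with the highest token overlap."""
    query = reaction_text.lower().split()
    ranked = sorted(corpus,
                    key=lambda doc: overlap_score(query, doc.lower().split()),
                    reverse=True)
    return ranked[:k]


corpus = [
    "Esterification of carboxylic acids with alcohols under acid catalysis.",
    "Suzuki coupling of aryl halides with boronic acids.",
    "Reduction of ketones to secondary alcohols with sodium borohydride.",
]
retrieved = retrieve("esterification of acetic acid with ethanol", corpus)
# A downstream model would condition on the reaction plus the retrieved text:
prompt = ("Reaction: CC(=O)O.CCO>>CCOC(=O)C\n"
          f"Retrieved: {' '.join(retrieved)}\n"
          "Recommend reaction conditions:")
print(prompt)
```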
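For the Rxn Hypergraph entry above, one plausible reading of the representation is that each molecule keeps its atom-level graph while hyperedges connect the sets of molecules participating in a reaction. The data structure below encodes only that idea; it is an assumption for illustration, not the paper's code, and it omits the attention network itself.

```python
# Illustrative data structure in the spirit of Rxn Hypergraph: each molecule
# contributes an atom-level graph, and hyperedges connect the sets of
# molecules participating in a reaction. This is a schematic assumption
# about the representation, not the paper's code.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MoleculeGraph:
    smiles: str
    atoms: List[str]              # node labels (element symbols here)
    bonds: List[Tuple[int, int]]  # undirected edges between atom indices


@dataclass
class ReactionHypergraph:
    molecules: Dict[str, MoleculeGraph] = field(default_factory=dict)
    # A hyperedge joins an arbitrary set of molecule ids under a role label.
    hyperedges: List[Tuple[List[str], str]] = field(default_factory=list)

    def add_reaction(self, reactants: List[str], products: List[str]) -> None:
        self.hyperedges.append((reactants, "reactant-side"))
        self.hyperedges.append((products, "product-side"))
        self.hyperedges.append((reactants + products, "reaction"))


hg = ReactionHypergraph()
hg.molecules["m1"] = MoleculeGraph("CCO", ["C", "C", "O"], [(0, 1), (1, 2)])
hg.molecules["m2"] = MoleculeGraph("CC(=O)O", ["C", "C", "O", "O"],
                                   [(0, 1), (1, 2), (1, 3)])
hg.molecules["m3"] = MoleculeGraph("CCOC(=O)C", ["C", "C", "O", "C", "O", "C"],
                                   [(0, 1), (1, 2), (2, 3), (3, 4), (3, 5)])
hg.add_reaction(["m1", "m2"], ["m3"])
# An attention layer would aggregate molecule embeddings along each hyperedge.
```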
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.