Text-Augmented Multimodal LLMs for Chemical Reaction Condition Recommendation
- URL: http://arxiv.org/abs/2407.15141v1
- Date: Sun, 21 Jul 2024 12:27:26 GMT
- Title: Text-Augmented Multimodal LLMs for Chemical Reaction Condition Recommendation
- Authors: Yu Zhang, Ruijie Yu, Kaipeng Zeng, Ding Li, Feng Zhu, Xiaokang Yang, Yaohui Jin, Yanyan Xu
- Abstract summary: MM-RCR is a text-augmented multimodal LLM that learns a unified reaction representation from SMILES, reaction graphs, and a textual corpus for chemical reaction condition recommendation (RCR).
Our results demonstrate that MM-RCR achieves state-of-the-art performance on two open benchmark datasets.
- Score: 50.639325453203504
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-throughput reaction condition (RC) screening is fundamental to chemical synthesis. However, current RC screening suffers from laborious and costly trial-and-error workflows. Traditional computer-aided synthesis planning (CASP) tools fail to find suitable RCs due to data sparsity and inadequate reaction representations. Large language models (LLMs) can now tackle chemistry-related problems such as molecule design and chemical logic Q&A tasks. However, LLMs have not yet achieved accurate predictions of chemical reaction conditions. Here, we present MM-RCR, a text-augmented multimodal LLM that learns a unified reaction representation from SMILES, reaction graphs, and a textual corpus for chemical reaction condition recommendation (RCR). To train MM-RCR, we construct an instruction dataset of 1.2 million Q&A pairs. Our experimental results demonstrate that MM-RCR achieves state-of-the-art performance on two open benchmark datasets and exhibits strong generalization capabilities on out-of-domain (OOD) and High-Throughput Experimentation (HTE) datasets. MM-RCR has the potential to accelerate high-throughput condition screening in chemical synthesis.
Related papers
- Contextual Molecule Representation Learning from Chemical Reaction Knowledge [24.501564702095937]
We introduce REMO, a self-supervised learning framework that takes advantage of well-defined atom-combination rules in common chemistry.
REMO pre-trains graph/Transformer encoders on 1.7 million known chemical reactions in the literature.
arXiv Detail & Related papers (2024-02-21T12:58:40Z)
- An Autonomous Large Language Model Agent for Chemical Literature Data Mining [60.85177362167166]
We introduce an end-to-end AI agent framework capable of high-fidelity extraction from extensive chemical literature.
Our framework's efficacy is evaluated using accuracy, recall, and F1 score of reaction condition data.
arXiv Detail & Related papers (2024-02-20T13:21:46Z)
- Retrosynthesis prediction enhanced by in-silico reaction data augmentation [66.5643280109899]
We present RetroWISE, a framework that employs a base model inferred from real paired data to perform in-silico reaction generation and augmentation.
On three benchmark datasets, RetroWISE achieves the best overall performance against state-of-the-art models.
arXiv Detail & Related papers (2024-01-31T07:40:37Z)
- ReacLLaMA: Merging chemical and textual information in chemical reactivity AI models [0.0]
Chemical reactivity models are developed to predict chemical reaction outcomes in the form of classification (success/failure) or regression (product yield) tasks.
The vast majority of the reported models are trained solely on chemical information such as reactants, products, reagents, and solvents.
Herein, the incorporation of procedural text to augment the Graphormer reactivity model and improve its accuracy is presented.
arXiv Detail & Related papers (2024-01-30T18:57:08Z)
- Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis [57.70772230913099]
Chemist-X automates the reaction condition recommendation (RCR) task in chemical synthesis with retrieval-augmented generation (RAG) technology.
Chemist-X interrogates online molecular databases and distills critical data from the latest literature database.
Chemist-X considerably reduces chemists' workload and allows them to focus on more fundamental and creative problems.
arXiv Detail & Related papers (2023-11-16T01:21:33Z)
- ReLM: Leveraging Language Models for Enhanced Chemical Reaction Prediction [26.342666819515774]
ReLM is a framework that leverages the chemical knowledge encoded in language models (LMs) to assist Graph Neural Networks (GNNs).
Our experimental results demonstrate that ReLM improves the performance of state-of-the-art GNN-based methods across various chemical reaction datasets.
arXiv Detail & Related papers (2023-10-20T15:33:23Z)
- Root-aligned SMILES for Molecular Retrosynthesis Prediction [31.818364437526885]
Retrosynthesis prediction is a fundamental problem in organic synthesis, where the task is to discover precursor molecules that can be used to synthesize a target molecule.
A popular paradigm among existing computational retrosynthesis methods formulates retrosynthesis prediction as a sequence-to-sequence translation problem.
We propose the root-aligned SMILES (R-SMILES), which specifies a tightly aligned one-to-one mapping between the product and the reactant SMILES.
arXiv Detail & Related papers (2022-03-22T03:50:04Z)
- Improving Molecular Representation Learning with Metric Learning-enhanced Optimal Transport [49.237577649802034]
We develop a novel optimal transport-based algorithm termed MROT to enhance generalization capability for molecular regression problems.
MROT significantly outperforms state-of-the-art models, showing promising potential in accelerating the discovery of new substances.
arXiv Detail & Related papers (2022-02-13T04:56:18Z)
- RetCL: A Selection-based Approach for Retrosynthesis via Contrastive Learning [107.64562550844146]
Retrosynthesis is an emerging research area of deep learning.
We propose a new approach that reformulates retrosynthesis as a problem of selecting reactants from a candidate set of commercially available molecules.
For learning the score functions, we also propose a novel contrastive training scheme with hard negative mining.
arXiv Detail & Related papers (2021-05-03T12:47:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.