Predictive Chemistry Augmented with Text Retrieval
- URL: http://arxiv.org/abs/2312.04881v1
- Date: Fri, 8 Dec 2023 07:40:59 GMT
- Title: Predictive Chemistry Augmented with Text Retrieval
- Authors: Yujie Qian, Zhening Li, Zhengkai Tu, Connor W. Coley, Regina Barzilay
- Abstract summary: We introduce TextReact, a novel method that directly augments predictive chemistry with texts retrieved from the literature.
TextReact retrieves text descriptions relevant to a given chemical reaction, and then aligns them with the molecular representation of the reaction.
We empirically validate the framework on two chemistry tasks: reaction condition recommendation and one-step retrosynthesis.
- Score: 37.59545092901872
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper focuses on using natural language descriptions to enhance
predictive models in the chemistry field. Conventionally, chemoinformatics
models are trained with extensive structured data manually extracted from the
literature. In this paper, we introduce TextReact, a novel method that directly
augments predictive chemistry with texts retrieved from the literature.
TextReact retrieves text descriptions relevant to a given chemical reaction,
and then aligns them with the molecular representation of the reaction. This
alignment is enhanced via an auxiliary masked LM objective incorporated in the
predictor training. We empirically validate the framework on two chemistry
tasks: reaction condition recommendation and one-step retrosynthesis. By
leveraging text retrieval, TextReact significantly outperforms state-of-the-art
chemoinformatics models trained solely on molecular data.
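To make the described setup concrete, below is a minimal sketch of a retrieval-augmented predictor in the spirit of the abstract: reaction tokens and retrieved-text tokens are encoded jointly, a task head predicts reaction conditions, and an auxiliary masked-LM head is trained on masked input positions. The retrieval step is assumed to have happened upstream, and all module names, sizes, the toy tokenization, and the loss weighting are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: joint task loss + auxiliary masked-LM loss over a shared encoder.
import torch
import torch.nn as nn

VOCAB_SIZE, PAD_ID, MASK_ID = 1000, 0, 1  # toy vocabulary (assumption)

class RetrievalAugmentedPredictor(nn.Module):
    def __init__(self, d_model=256, n_conditions=50):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, d_model, padding_idx=PAD_ID)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.task_head = nn.Linear(d_model, n_conditions)  # e.g. condition recommendation
        self.mlm_head = nn.Linear(d_model, VOCAB_SIZE)     # auxiliary masked-LM head

    def forward(self, tokens):
        # tokens: concatenation of reaction tokens and retrieved-text tokens
        h = self.encoder(self.embed(tokens))
        return self.task_head(h[:, 0]), self.mlm_head(h)   # first token as pooled summary

def training_step(model, tokens, mlm_labels, cond_labels, alpha=0.1):
    """Joint objective: task prediction plus a weighted masked-LM term."""
    task_logits, mlm_logits = model(tokens)
    task_loss = nn.functional.cross_entropy(task_logits, cond_labels)
    mlm_loss = nn.functional.cross_entropy(
        mlm_logits.view(-1, VOCAB_SIZE), mlm_labels.view(-1), ignore_index=-100)
    return task_loss + alpha * mlm_loss

# Toy usage with random token ids: batch of 2 sequences of length 32.
model = RetrievalAugmentedPredictor()
tokens = torch.randint(2, VOCAB_SIZE, (2, 32))
mlm_labels = torch.full((2, 32), -100)   # -100 = position not masked
mlm_labels[:, 5] = tokens[:, 5]          # remember the original token ...
tokens[:, 5] = MASK_ID                   # ... then mask it in the input
cond_labels = torch.randint(0, 50, (2,))
loss = training_step(model, tokens, mlm_labels, cond_labels)
loss.backward()
```

Under this sketch, the auxiliary masked-LM term pushes the shared encoder to relate textual and molecular tokens, which is the role the paper attributes to its auxiliary objective.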
Related papers
- BatGPT-Chem: A Foundation Large Model For Retrosynthesis Prediction [65.93303145891628]
BatGPT-Chem is a large language model with 15 billion parameters, tailored for enhanced retrosynthesis prediction.
Our model captures a broad spectrum of chemical knowledge, enabling precise prediction of reaction conditions.
This development empowers chemists to adeptly address novel compounds, potentially expediting the innovation cycle in drug manufacturing and materials science.
arXiv Detail & Related papers (2024-08-19T05:17:40Z)
- ReactXT: Understanding Molecular "Reaction-ship" via Reaction-Contextualized Molecule-Text Pretraining [76.51346919370005]
We propose ReactXT for reaction-text modeling and OpenExp for experimental procedure prediction.
ReactXT features three types of input contexts to incrementally pretrain LMs.
Our code is available at https://github.com/syr-cn/ReactXT.
arXiv Detail & Related papers (2024-05-23T06:55:59Z)
- Contextual Molecule Representation Learning from Chemical Reaction Knowledge [24.501564702095937]
We introduce REMO, a self-supervised learning framework that takes advantage of well-defined atom-combination rules in common chemistry.
REMO pre-trains graph/Transformer encoders on 1.7 million known chemical reactions in the literature.
arXiv Detail & Related papers (2024-02-21T12:58:40Z)
- An Autonomous Large Language Model Agent for Chemical Literature Data Mining [60.85177362167166]
We introduce an end-to-end AI agent framework capable of high-fidelity extraction from extensive chemical literature.
Our framework's efficacy is evaluated using accuracy, recall, and F1 score of reaction condition data.
arXiv Detail & Related papers (2024-02-20T13:21:46Z)
- ReacLLaMA: Merging chemical and textual information in chemical reactivity AI models [0.0]
Chemical reactivity models are developed to predict chemical reaction outcomes in the form of classification (success/failure) or regression (product yield) tasks.
The vast majority of the reported models are trained solely on chemical information such as reactants, products, reagents, and solvents.
Herein, the incorporation of procedural text is presented, with the aim of augmenting the Graphormer reactivity model and improving its accuracy.
arXiv Detail & Related papers (2024-01-30T18:57:08Z)
- ReactIE: Enhancing Chemical Reaction Extraction with Weak Supervision [27.850325653751078]
Structured chemical reaction information plays a vital role for chemists engaged in laboratory work and in advanced endeavors such as computer-aided drug design.
Despite the importance of extracting structured reactions from scientific literature, data annotation for this purpose is cost-prohibitive due to the significant labor required from domain experts.
We propose ReactIE, which combines two weakly supervised approaches for pre-training. Our method utilizes frequent patterns within the text as linguistic cues to identify specific characteristics of chemical reactions.
arXiv Detail & Related papers (2023-07-04T02:52:30Z)
- MolXPT: Wrapping Molecules with Text for Generative Pre-training [141.0924452870112]
MolXPT is a unified language model of text and molecules pre-trained on SMILES wrapped by text.
MolXPT outperforms strong baselines of molecular property prediction on MoleculeNet.
arXiv Detail & Related papers (2023-05-18T03:58:19Z)
- Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing [107.49804059269212]
We present a multi-modal molecule structure-text model, MoleculeSTM, by jointly learning molecules' chemical structures and textual descriptions.
In experiments, MoleculeSTM obtains the state-of-the-art generalization ability to novel biochemical concepts.
arXiv Detail & Related papers (2022-12-21T06:18:31Z)
- Fine-Grained Chemical Entity Typing with Multimodal Knowledge Representation [36.6963949360594]
Extracting detailed knowledge about chemical reactions from the core chemistry literature is an emerging challenge.
We propose a novel multi-modal representation learning framework to solve the problem of fine-grained chemical entity typing.
Experiment results show that the proposed framework outperforms multiple state-of-the-art methods.
arXiv Detail & Related papers (2021-08-29T19:41:35Z)