ReactIE: Enhancing Chemical Reaction Extraction with Weak Supervision
- URL: http://arxiv.org/abs/2307.01448v1
- Date: Tue, 4 Jul 2023 02:52:30 GMT
- Title: ReactIE: Enhancing Chemical Reaction Extraction with Weak Supervision
- Authors: Ming Zhong, Siru Ouyang, Minhao Jiang, Vivian Hu, Yizhu Jiao, Xuan Wang, Jiawei Han
- Abstract summary: Structured chemical reaction information plays a vital role for chemists engaged in laboratory work and advanced endeavors such as computer-aided drug design.
Despite the importance of extracting structured reactions from scientific literature, data annotation for this purpose is cost-prohibitive due to the significant labor required from domain experts.
We propose ReactIE, which combines two weakly supervised approaches for pre-training. Our method utilizes frequent patterns within the text as linguistic cues to identify specific characteristics of chemical reactions.
- Score: 27.850325653751078
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Structured chemical reaction information plays a vital role for chemists
engaged in laboratory work and advanced endeavors such as computer-aided drug
design. Despite the importance of extracting structured reactions from
scientific literature, data annotation for this purpose is cost-prohibitive due
to the significant labor required from domain experts. Consequently, the
scarcity of sufficient training data poses an obstacle to the progress of
related models in this domain. In this paper, we propose ReactIE, which
combines two weakly supervised approaches for pre-training. Our method utilizes
frequent patterns within the text as linguistic cues to identify specific
characteristics of chemical reactions. Additionally, we adopt synthetic data
from patent records as distant supervision to incorporate domain knowledge into
the model. Experiments demonstrate that ReactIE achieves substantial
improvements and outperforms all existing baselines.
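The abstract names two weak-supervision signals, frequent textual patterns used as linguistic cues and patent-derived reaction records used as distant supervision, but does not give their details. As a rough illustration only, the Python sketch below shows how two such signals could be turned into noisy span labels for a single sentence. The cue patterns, the role names (Reactant, Product, Solvent, Temperature), the record format and the functions `pattern_labels`, `distant_labels` and `weak_labels` are all hypothetical and are not taken from ReactIE.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    start: int   # character offset where the labeled mention begins
    end: int     # character offset just past the mention
    role: str    # hypothetical role name, e.g. "Product" or "Solvent"
    source: str  # which weak signal produced it: "pattern" or "distant"


# Hypothetical lexical cues standing in for "frequent patterns within the text".
# Illustrative only: these are NOT the patterns used by ReactIE.
CUE_PATTERNS = [
    (re.compile(r"\bto (?:give|afford|yield) (?P<arg>[\w\-]+)"), "Product"),
    (re.compile(r"\bwas dissolved in (?P<arg>[\w\-]+)"), "Solvent"),
    (re.compile(r"\bat (?P<arg>-?\d+\s?°C)"), "Temperature"),
]


def pattern_labels(sentence: str) -> list[Label]:
    """Label the argument slot of every cue pattern that fires in the sentence."""
    labels = []
    for pattern, role in CUE_PATTERNS:
        for m in pattern.finditer(sentence):
            labels.append(Label(m.start("arg"), m.end("arg"), role, "pattern"))
    return labels


def distant_labels(sentence: str, record: dict[str, list[str]]) -> list[Label]:
    """Project a structured reaction record (role -> names) onto the text by exact
    string match. The record format is an assumed stand-in for patent-derived data."""
    labels = []
    for role, names in record.items():
        for name in names:
            for m in re.finditer(re.escape(name), sentence):
                labels.append(Label(m.start(), m.end(), role, "distant"))
    return labels


def weak_labels(sentence: str, record: dict[str, list[str]] | None = None) -> list[Label]:
    """Union of both weak signals. When both agree on the same span and role,
    the pattern-based label is kept because it is added first."""
    combined = pattern_labels(sentence)
    if record:
        combined += distant_labels(sentence, record)
    merged: dict[tuple[int, int, str], Label] = {}
    for lab in combined:
        merged.setdefault((lab.start, lab.end, lab.role), lab)
    return sorted(merged.values(), key=lambda lab: (lab.start, lab.end, lab.role))


if __name__ == "__main__":
    text = "Compound 3 was dissolved in THF and heated at 60 °C to give benzamide."
    record = {"Reactant": ["Compound 3"], "Product": ["benzamide"]}  # hypothetical record
    for lab in weak_labels(text, record):
        print(text[lab.start:lab.end], lab.role, lab.source)
```

On the example sentence, the record supplies "Compound 3" as a Reactant, while the cue patterns pick out "THF", "60 °C" and "benzamide". Exact string matching and hand-written patterns are deliberately simple here; they miss abbreviations and name variants, which is one reason such labels are noisy and better suited to pre-training than to final evaluation.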
Related papers
- A Self-feedback Knowledge Elicitation Approach for Chemical Reaction Predictions [24.80165173525286]
We introduce a data-curated self-feedback knowledge elicitation approach.
We employ adaptive prompt learning to infuse the prior knowledge into the large language model.
This research offers a novel paradigm for knowledge elicitation in scientific research.
(2024-04-15T09:26:33Z)
- Contextual Molecule Representation Learning from Chemical Reaction Knowledge [24.501564702095937]
We introduce REMO, a self-supervised learning framework that takes advantage of well-defined atom-combination rules in common chemistry.
REMO pre-trains graph/Transformer encoders on 1.7 million known chemical reactions in the literature.
(2024-02-21T12:58:40Z)
- An Autonomous Large Language Model Agent for Chemical Literature Data Mining [60.85177362167166]
We introduce an end-to-end AI agent framework capable of high-fidelity extraction from extensive chemical literature.
Our framework's efficacy is evaluated using accuracy, recall, and F1 score of reaction condition data.
(2024-02-20T13:21:46Z)
- Retrosynthesis prediction enhanced by in-silico reaction data augmentation [66.5643280109899]
We present RetroWISE, a framework that employs a base model inferred from real paired data to perform in-silico reaction generation and augmentation.
On three benchmark datasets, RetroWISE achieves the best overall performance against state-of-the-art models.
(2024-01-31T07:40:37Z)
- Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis [57.70772230913099]
Chemist-X automates the reaction condition recommendation (RCR) task in chemical synthesis with retrieval-augmented generation (RAG) technology.
Chemist-X interrogates online molecular databases and distills critical data from the latest literature database.
Chemist-X considerably reduces chemists' workload and allows them to focus on more fundamental and creative problems.
(2023-11-16T01:21:33Z)
- AI for Interpretable Chemistry: Predicting Radical Mechanistic Pathways via Contrastive Learning [45.379791270351184]
RMechRP is a new deep learning-based reaction predictor system.
We develop and train models using RMechDB, a public database of radical reactions.
Our results demonstrate the effectiveness of RMechRP in providing accurate and interpretable predictions.
(2023-11-02T09:47:27Z)
- Stress Testing BERT Anaphora Resolution Models for Reaction Extraction in Chemical Patents [7.653466578233261]
In chemical patents, there are five anaphoric relations of interest: co-reference, transformed, reaction associated, work up, and contained.
Our goal is to investigate how the performance of anaphora resolution models for reaction texts differs in a noise-free and noisy environment.
(2023-06-23T09:01:56Z)
- ChemVise: Maximizing Out-of-Distribution Chemical Detection with the Novel Application of Zero-Shot Learning [60.02503434201552]
This research proposes learning approximations of complex exposures from training sets of simple ones.
We demonstrate that applying this approach to synthetic sensor responses surprisingly improves the detection of out-of-distribution obscured chemical analytes.
(2023-02-09T20:19:57Z)
- Improving Molecular Representation Learning with Metric Learning-enhanced Optimal Transport [49.237577649802034]
We develop a novel optimal transport-based algorithm, termed MROT, to enhance the generalization capability of molecular representation learning models on molecular regression problems.
MROT significantly outperforms state-of-the-art models, showing promising potential in accelerating the discovery of new substances.
(2022-02-13T04:56:18Z)
- Rxn Hypergraph: a Hypergraph Attention Model for Chemical Reaction Representation [70.97737157902947]
There is currently no universal and widely adopted method for robustly representing chemical reactions.
Here we exploit graph-based representations of molecular structures to develop and test a hypergraph attention neural network approach.
We evaluate this hypergraph representation in three experiments using three independent data sets of chemical reactions.
(2022-01-02T12:33:10Z)
- Dataset Bias in the Natural Sciences: A Case Study in Chemical Reaction Prediction and Synthesis Design [0.8594140167290099]
We identify three trends within the fields of chemical reaction prediction and synthesis design that require a change in direction.
First, the manner in which reaction datasets are split into reactants and reagents encourages testing models in an unrealistically generous manner.
Second, we highlight the prevalence of mislabelled data, and suggest that the focus should be on outlier removal rather than data fitting only.
(2021-05-06T13:11:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.