Stress Testing BERT Anaphora Resolution Models for Reaction Extraction in Chemical Patents
- URL: http://arxiv.org/abs/2306.13379v1
- Date: Fri, 23 Jun 2023 09:01:56 GMT
- Title: Stress Testing BERT Anaphora Resolution Models for Reaction Extraction in Chemical Patents
- Authors: Chieling Yueh, Evangelos Kanoulas, Bruno Martins, Camilo Thorne, Saber Akhondi
- Abstract summary: In chemical patents, there are five anaphoric relations of interest: co-reference, transformed, reaction associated, work up, and contained.
Our goal is to investigate how the performance of anaphora resolution models for reaction texts differs between noise-free and noisy environments.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The high volume of published chemical patents and the importance of acquiring their information in a timely manner motivate automating information extraction from chemical patents. Anaphora resolution is an important component of comprehensive information extraction and is critical for extracting reactions. In chemical patents, there are five anaphoric relations of interest: co-reference, transformed, reaction associated, work up, and contained. Our goal is to investigate how the performance of anaphora resolution models for reaction texts in chemical patents differs between noise-free and noisy environments, and to what extent we can improve the model's robustness against noise.
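The abstract names five anaphoric relation types and contrasts noise-free with noisy input. The following is a minimal sketch of one way an annotated link and a noisy copy of a reaction text might be set up for such a stress test. It assumes character-offset span annotations and a simple random letter-substitution noise model. The abstract specifies neither, and every name below (`AnaphoraRelation`, `AnaphoricLink`, `add_char_noise`) is hypothetical, not the authors' code.

```python
import random
from dataclasses import dataclass
from enum import Enum


class AnaphoraRelation(Enum):
    """The five anaphoric relation types listed in the abstract."""
    COREFERENCE = "co-reference"
    TRANSFORMED = "transformed"
    REACTION_ASSOCIATED = "reaction associated"
    WORK_UP = "work up"
    CONTAINED = "contained"


@dataclass(frozen=True)
class AnaphoricLink:
    """Hypothetical record linking an anaphor span to its antecedent span.

    Spans are (start, end) character offsets into the text, end exclusive.
    """
    anaphor: tuple[int, int]
    antecedent: tuple[int, int]
    relation: AnaphoraRelation


def add_char_noise(text: str, rate: float, seed: int = 0) -> str:
    """Hypothetical noise model: each letter is replaced, with probability
    `rate`, by a random lowercase letter. Non-letters are left untouched and
    the length never changes, so gold character offsets stay valid."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch.isalpha() and rng.random() < rate:
            out.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))
        else:
            out.append(ch)
    return "".join(out)


# Usage: one illustrative co-reference link and a noisy copy of the text.
text = "Compound A was dissolved in water. The compound was then dried."
link = AnaphoricLink(anaphor=(35, 47), antecedent=(0, 10),
                     relation=AnaphoraRelation.COREFERENCE)
noisy = add_char_noise(text, rate=0.1, seed=42)

print(text[slice(*link.anaphor)], "->", text[slice(*link.antecedent)])
# prints: The compound -> Compound A
print(noisy)
```

Substitution keeps the text length fixed, so the gold anaphor and antecedent offsets remain valid in the noisy copy. This lets the same annotations score a model on both versions. The relation label in the example is illustrative only.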
Related papers
- Chemical knowledge-informed framework for privacy-aware retrosynthesis learning [60.93245342663455]
Current machine learning-based retrosynthesis gathers reaction data from multiple sources into a single edge to train prediction models.
This paradigm poses considerable privacy risks as it necessitates broad data availability across organizational boundaries.
In the present study, we introduce the chemical knowledge-informed framework (CKIF), a privacy-preserving approach for learning retrosynthesis models.
(2025-02-26T13:13:24Z)
- Learning Chemical Reaction Representation with Reactant-Product Alignment [50.28123475356234]
This paper introduces a novel chemical reaction representation learning model tailored for a variety of organic-reaction-related tasks.
By integrating atomic correspondence between reactants and products, our model discerns the molecular transformations that occur during the reaction, thereby enhancing the comprehension of the reaction mechanism.
We have designed an adapter structure to incorporate reaction conditions into the chemical reaction representation, allowing the model to handle diverse reaction conditions and adapt to various datasets and downstream tasks, e.g., reaction performance prediction.
(2024-11-26T17:41:44Z)
- log-RRIM: Yield Prediction via Local-to-global Reaction Representation Learning and Interaction Modeling [6.310759215182946]
log-RRIM is an innovative graph transformer-based framework designed for predicting chemical reaction yields.
Our approach implements a unique local-to-global reaction representation learning strategy.
Its advanced modeling of reactant-reagent interactions and sensitivity to small molecular fragments make it a valuable tool for reaction planning and optimization in chemical synthesis.
(2024-10-20T18:35:56Z)
- BatGPT-Chem: A Foundation Large Model For Retrosynthesis Prediction [65.93303145891628]
BatGPT-Chem is a large language model with 15 billion parameters, tailored for enhanced retrosynthesis prediction.
Our model captures a broad spectrum of chemical knowledge, enabling precise prediction of reaction conditions.
This development empowers chemists to adeptly address novel compounds, potentially expediting the innovation cycle in drug manufacturing and materials science.
(2024-08-19T05:17:40Z)
- Chemical Reaction Extraction from Long Patent Documents [3.376269351435396]
ChemPatKB can be used to aid in prior art searches and to provide a platform for domain experts to explore new innovations in chemical compound synthesis and use-cases.
An essential foundational component of this KB is the extraction of important reaction snippets from long patent documents.
In this work, we explore the problem of extracting reaction spans from chemical patents in order to create a reactions resource database.
(2024-07-21T11:27:27Z)
- A Self-feedback Knowledge Elicitation Approach for Chemical Reaction Predictions [24.80165173525286]
We introduce a data-curated self-feedback knowledge elicitation approach.
We employ adaptive prompt learning to infuse the prior knowledge into the large language model.
This research offers a novel paradigm for knowledge elicitation in scientific research.
(2024-04-15T09:26:33Z)
- An Autonomous Large Language Model Agent for Chemical Literature Data Mining [60.85177362167166]
We introduce an end-to-end AI agent framework capable of high-fidelity extraction from extensive chemical literature.
Our framework's efficacy is evaluated using accuracy, recall, and F1 score of reaction condition data.
(2024-02-20T13:21:46Z)
- Retrosynthesis prediction enhanced by in-silico reaction data augmentation [66.5643280109899]
We present RetroWISE, a framework that employs a base model inferred from real paired data to perform in-silico reaction generation and augmentation.
On three benchmark datasets, RetroWISE achieves the best overall performance against state-of-the-art models.
(2024-01-31T07:40:37Z)
- ReactIE: Enhancing Chemical Reaction Extraction with Weak Supervision [27.850325653751078]
Structured chemical reaction information plays a vital role for chemists engaged in laboratory work and advanced endeavors such as computer-aided drug design.
Despite the importance of extracting structured reactions from scientific literature, data annotation for this purpose is cost-prohibitive due to the significant labor required from domain experts.
We propose ReactIE, which combines two weakly supervised approaches for pre-training. Our method utilizes frequent patterns within the text as linguistic cues to identify specific characteristics of chemical reactions.
(2023-07-04T02:52:30Z)
- Unassisted Noise Reduction of Chemical Reaction Data Sets [59.127921057012564]
We propose a machine learning-based, unassisted approach to remove chemically wrong entries from data sets.
Our results show an improved prediction quality for models trained on the cleaned and balanced data sets.
(2021-02-02T09:34:34Z)
- Optimizing Molecules using Efficient Queries from Property Evaluations [66.66290256377376]
We propose QMO, a generic query-based molecule optimization framework.
QMO improves the desired properties of an input molecule based on efficient queries.
We show that QMO outperforms existing methods in the benchmark tasks of optimizing small organic molecules.
(2020-11-03T18:51:18Z)
- Named entity recognition in chemical patents using ensemble of contextual language models [0.3731111830152912]
We study the effectiveness of contextualized language models to extract information from chemical patents.
Our best model, based on a majority ensemble approach, achieves an exact F1-score of 92.30% and a relaxed F1-score of 96.24%.
(2020-07-24T15:23:45Z)
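The entity-recognition entry above reports both an exact and a relaxed F1-score. The following minimal sketch shows how the two scores can differ on the same predictions. It assumes the common span-matching convention, which the entry itself does not define: exact means identical boundaries and label, and relaxed means the same label with any character overlap. The function names and entity labels are hypothetical.

```python
from typing import Callable

# A span is (start, end, label), with end exclusive.
Span = tuple[int, int, str]


def exact_match(p: Span, g: Span) -> bool:
    """Exact: same start, end, and label."""
    return p == g


def relaxed_match(p: Span, g: Span) -> bool:
    """Relaxed (assumed definition): same label and overlapping offsets."""
    return p[2] == g[2] and p[0] < g[1] and g[0] < p[1]


def span_f1(pred: list[Span], gold: list[Span],
            match: Callable[[Span, Span], bool]) -> float:
    """F1 where a prediction counts toward precision if it matches any gold
    span, and a gold span counts toward recall if any prediction matches it.
    Returns 0.0 when either list is empty."""
    if not pred or not gold:
        return 0.0
    precision = sum(any(match(p, g) for g in gold) for p in pred) / len(pred)
    recall = sum(any(match(p, g) for p in pred) for g in gold) / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative spans: the second prediction has a boundary error.
gold = [(0, 10, "REACTION_PRODUCT"), (20, 30, "SOLVENT")]
pred = [(0, 10, "REACTION_PRODUCT"), (22, 30, "SOLVENT")]
print(span_f1(pred, gold, exact_match))    # 0.5
print(span_f1(pred, gold, relaxed_match))  # 1.0
```

Relaxed matching credits the boundary error on the second span. With non-empty spans, every exact match is also a relaxed match, so under these definitions the relaxed score is never lower than the exact score.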