Chemical Reaction Extraction from Long Patent Documents
- URL: http://arxiv.org/abs/2407.15124v2
- Date: Tue, 23 Jul 2024 07:11:47 GMT
- Title: Chemical Reaction Extraction from Long Patent Documents
- Authors: Aishwarya Jadhav, Ritam Dutt,
- Abstract summary: ChemPatKB can be used to aid in prior art searches and to provide a platform for domain experts to explore new innovations in chemical compound synthesis and use-cases.
An essential foundational component of this KB is the extraction of important reaction snippets from long patent documents.
In this work, we explore the problem of extracting reaction spans from chemical patents in order to create a reaction resource database.
- Score: 3.376269351435396
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The task of searching through patent documents is crucial for chemical patent recommendation and retrieval. This can be enhanced by creating a patent knowledge base (ChemPatKB) to aid in prior art searches and to provide a platform for domain experts to explore new innovations in chemical compound synthesis and use-cases. An essential foundational component of this KB is the extraction of important reaction snippets from long patent documents, which facilitates multiple downstream tasks such as reaction co-reference resolution and chemical entity role identification. In this work, we explore the problem of extracting reaction spans from chemical patents in order to create a reaction resource database. We formulate this task as a paragraph-level sequence tagging problem, where the system is required to return a sequence of paragraphs that contain a description of a reaction. We propose several approaches and modifications of the baseline models and study how different methods generalize across different domains of chemical patents.
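The paragraph-level sequence tagging formulation in the abstract can be illustrated with a minimal sketch. It assumes a BIO tag set (B = first paragraph of a reaction, I = continuation, O = outside), which the abstract does not specify; the function name `tags_to_spans` is hypothetical and is not taken from the paper. The sketch only shows how per-paragraph tags could be collapsed into reaction spans, not how the tags are predicted.

```python
from typing import List, Tuple


def tags_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Collapse per-paragraph BIO tags into inclusive (start, end) paragraph spans.

    Assumption: one tag per paragraph, drawn from {"B", "I", "O"}.
    The actual tag scheme used in the paper may differ.
    """
    spans: List[Tuple[int, int]] = []
    start = None  # index of the first paragraph of the currently open span
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new reaction starts; close any span that is still open.
            if start is not None:
                spans.append((start, i - 1))
            start = i
        elif tag == "I":
            # Continuation; tolerate an "I" with no preceding "B".
            if start is None:
                start = i
        else:
            # "O" (or any other tag) ends the open span, if there is one.
            if start is not None:
                spans.append((start, i - 1))
                start = None
    if start is not None:
        spans.append((start, len(tags) - 1))
    return spans


# Hypothetical tags for a seven-paragraph patent excerpt.
example_tags = ["O", "B", "I", "O", "B", "B", "I"]
example_spans = tags_to_spans(example_tags)
print(example_spans)
```

Each returned pair gives the indices of the first and last paragraph of one extracted reaction description.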
Related papers
- Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis [57.70772230913099]
Chemist-X automates the reaction condition recommendation (RCR) task in chemical synthesis with retrieval-augmented generation (RAG) technology.
Chemist-X interrogates online molecular databases and distills critical data from the latest literature database.
Chemist-X considerably reduces chemists' workload and allows them to focus on more fundamental and creative problems.
arXiv Detail & Related papers (2023-11-16T01:21:33Z) - ReactIE: Enhancing Chemical Reaction Extraction with Weak Supervision [27.850325653751078]
Structured chemical reaction information plays a vital role for chemists engaged in laboratory work and advanced endeavors such as computer-aided drug design.
Despite the importance of extracting structured reactions from scientific literature, data annotation for this purpose is cost-prohibitive due to the significant labor required from domain experts.
We propose ReactIE, which combines two weakly supervised approaches for pre-training. Our method utilizes frequent patterns within the text as linguistic cues to identify specific characteristics of chemical reactions.
arXiv Detail & Related papers (2023-07-04T02:52:30Z) - A Unified View of Deep Learning for Reaction and Retrosynthesis
Prediction: Current Status and Future Challenges [59.41636061300571]
Reaction and retrosynthesis prediction are fundamental tasks in computational chemistry.
Various deep learning approaches have been proposed to tackle these problems.
This paper is the first comprehensive and systematic survey that seeks to provide a unified understanding of reaction and retrosynthesis prediction.
arXiv Detail & Related papers (2023-06-28T03:15:55Z) - Stress Testing BERT Anaphora Resolution Models for Reaction Extraction
in Chemical Patents [7.653466578233261]
In chemical patents, there are five anaphoric relations of interest: co-reference, transformed, reaction associated, work up, and contained.
Our goal is to investigate how the performance of anaphora resolution models for reaction texts differs in a noise-free and noisy environment.
arXiv Detail & Related papers (2023-06-23T09:01:56Z) - Differentiable Programming of Chemical Reaction Networks [63.948465205530916]
Chemical reaction networks are one of the most fundamental computational substrates used by nature.
We study well-mixed single-chamber systems, as well as systems with multiple chambers separated by membranes.
We demonstrate that differentiable optimisation, combined with proper regularisation, can discover non-trivial sparse reaction networks.
arXiv Detail & Related papers (2023-02-06T11:41:14Z) - Adaptive Information Seeking for Open-Domain Question Answering [61.39330982757494]
We propose a novel adaptive information-seeking strategy for open-domain question answering, namely AISO.
According to the learned policy, AISO could adaptively select a proper retrieval action to seek the missing evidence at each step.
AISO outperforms all baseline methods with predefined strategies in terms of both retrieval and answer evaluations.
arXiv Detail & Related papers (2021-09-14T15:08:13Z) - Fine-Grained Chemical Entity Typing with Multimodal Knowledge Representation [36.6963949360594]
Extracting detailed knowledge about chemical reactions from the core chemistry literature is an emerging challenge.
We propose a novel multi-modal representation learning framework to solve the problem of fine-grained chemical entity typing.
Experiment results show that the proposed framework outperforms multiple state-of-the-art methods.
arXiv Detail & Related papers (2021-08-29T19:41:35Z) - ChemiRise: a data-driven retrosynthesis engine [19.52621175562223]
ChemiRise can propose complete retrosynthesis routes for organic compounds rapidly and reliably.
The system was trained on a processed patent database of over 3 million organic reactions.
arXiv Detail & Related papers (2021-08-09T05:13:14Z) - Self-Improved Retrosynthetic Planning [66.5397931294144]
Retrosynthetic planning is a fundamental problem in chemistry for finding a pathway of reactions to synthesize a target molecule.
Recent search algorithms have shown promising results for solving this problem by using deep neural networks (DNNs).
We propose an end-to-end framework for directly training the DNNs towards generating reaction pathways with the desirable properties.
arXiv Detail & Related papers (2021-06-09T08:03:57Z) - Named entity recognition in chemical patents using ensemble of contextual language models [0.3731111830152912]
We study the effectiveness of contextualized language models to extract information from chemical patents.
Our best model, based on a majority ensemble approach, achieves an exact F1-score of 92.30% and a relaxed F1-score of 96.24%.
arXiv Detail & Related papers (2020-07-24T15:23:45Z) - Retrosynthesis Prediction with Conditional Graph Logic Network [118.70437805407728]
Computer-aided retrosynthesis is finding renewed interest from both chemistry and computer science communities.
We propose a new approach to this task using the Conditional Graph Logic Network, a conditional graphical model built upon graph neural networks.
arXiv Detail & Related papers (2020-01-06T05:36:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.