Chemical Reaction Extraction from Long Patent Documents
- URL: http://arxiv.org/abs/2407.15124v2
- Date: Tue, 23 Jul 2024 07:11:47 GMT
- Title: Chemical Reaction Extraction from Long Patent Documents
- Authors: Aishwarya Jadhav, Ritam Dutt,
- Abstract summary: ChemPatKB can be used to aid in prior art searches and to provide a platform for domain experts to explore new innovations in chemical compound synthesis and use cases.
An essential foundational component of this KB is the extraction of important reaction snippets from long patent documents.
In this work, we explore the problem of extracting reaction spans from chemical patents in order to create a reaction resource database.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The task of searching through patent documents is crucial for chemical patent recommendation and retrieval. This can be enhanced by creating a patent knowledge base (ChemPatKB) to aid in prior art searches and to provide a platform for domain experts to explore new innovations in chemical compound synthesis and use cases. An essential foundational component of this KB is the extraction of important reaction snippets from long patent documents, which facilitates multiple downstream tasks such as reaction co-reference resolution and chemical entity role identification. In this work, we explore the problem of extracting reaction spans from chemical patents in order to create a reaction resource database. We formulate this task as a paragraph-level sequence tagging problem, where the system is required to return a sequence of paragraphs that contain a description of a reaction. We propose several approaches and modifications of the baseline models and study how different methods generalize across different domains of chemical patents.
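The paragraph-level tagging formulation can be illustrated with a minimal sketch. It assumes one BIO-style tag per paragraph ("B" opens a reaction description, "I" continues it, "O" marks a paragraph with no reaction) and scores predictions with exact span matching. The tag set, the function names `decode_reaction_spans` and `exact_span_f1`, and the metric are illustrative assumptions, not the authors' implementation or evaluation protocol.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical paragraph-level tag set (an assumption, not the paper's scheme):
# "B" starts a reaction description, "I" continues it, "O" is outside any reaction.
VALID_TAGS = {"B", "I", "O"}


def decode_reaction_spans(labels: Sequence[str]) -> List[Tuple[int, int]]:
    """Turn one tag per paragraph into (first, last) paragraph-index spans, inclusive."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, tag in enumerate(labels):
        if tag not in VALID_TAGS:
            raise ValueError(f"unknown tag {tag!r} at paragraph {i}")
        if tag == "B" or (tag == "I" and start is None):
            # A new reaction begins: close any open span first. A stray "I"
            # with no open span is treated leniently as a span start.
            if start is not None:
                spans.append((start, i - 1))
            start = i
        elif tag == "O" and start is not None:
            # Leaving a reaction description closes the open span.
            spans.append((start, i - 1))
            start = None
        # "I" with an open span extends it; "O" with no open span is ignored.
    if start is not None:
        # Close a span that runs to the last paragraph.
        spans.append((start, len(labels) - 1))
    return spans


def exact_span_f1(predicted: Sequence[Tuple[int, int]],
                  gold: Sequence[Tuple[int, int]]) -> float:
    """F1 where a predicted span counts only if it matches a gold span exactly."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Eight paragraphs of a made-up patent with two reactions: paragraphs 1-2 and 4-6.
    tags = ["O", "B", "I", "O", "B", "I", "I", "O"]
    predicted = decode_reaction_spans(tags)
    print(predicted)  # [(1, 2), (4, 6)]
    print(exact_span_f1(predicted, [(1, 2), (4, 5)]))  # 0.5
```

A separate start tag is kept because a plain binary in/out label per paragraph would merge two adjacent reaction descriptions into a single span.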
Related papers
- Chemical knowledge-informed framework for privacy-aware retrosynthesis learning
Current machine learning-based retrosynthesis methods gather reaction data from multiple sources into a single edge to train prediction models.
This paradigm poses considerable privacy risks as it necessitates broad data availability across organizational boundaries.
In the present study, we introduce the chemical knowledge-informed framework (CKIF), a privacy-preserving approach for learning retrosynthesis models.
Published on arXiv: 2025-02-26.
- Automated Retrosynthesis Planning of Macromolecules Using Large Language Models and Knowledge Graphs
We propose an agent system that integrates large language models (LLMs) and knowledge graphs.
Our system fully automates the retrieval of relevant literature, the extraction of reaction data, database querying, and the construction of retrosynthetic pathway trees.
This work represents the first attempt to develop a fully automated, LLM-powered retrosynthesis planning agent tailored specifically to macromolecules.
Published on arXiv: 2025-01-15.
- Learning Chemical Reaction Representation with Reactant-Product Alignment
This paper introduces a novel chemical reaction representation learning model tailored to a variety of organic-reaction-related tasks.
By integrating atomic correspondence between reactants and products, our model discerns the molecular transformations that occur during the reaction, thereby enhancing the comprehension of the reaction mechanism.
We have designed an adapter structure to incorporate reaction conditions into the chemical reaction representation, allowing the model to handle diverse reaction conditions and adapt to various datasets and downstream tasks, e.g., reaction performance prediction.
Published on arXiv: 2024-11-26.
- BatGPT-Chem: A Foundation Large Model For Retrosynthesis Prediction
BatGPT-Chem is a large language model with 15 billion parameters, tailored for enhanced retrosynthesis prediction.
Our model captures a broad spectrum of chemical knowledge, enabling precise prediction of reaction conditions.
This development empowers chemists to adeptly address novel compounds, potentially expediting the innovation cycle in drug manufacturing and materials science.
Published on arXiv: 2024-08-19.
- PATopics: An automatic framework to extract useful information from pharmaceutical patents documents
PATopics is a framework specifically designed to extract relevant information from pharmaceutical patents.
We extensively analyzed the framework using 4,832 pharmaceutical patents concerning 809 molecules patented by 478 companies.
Published on arXiv: 2024-08-12.
- Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis
Chemist-X automates the reaction condition recommendation (RCR) task in chemical synthesis with retrieval-augmented generation (RAG) technology.
Chemist-X interrogates online molecular databases and distills critical data from the latest literature database.
Chemist-X considerably reduces chemists' workload and allows them to focus on more fundamental and creative problems.
Published on arXiv: 2023-11-16.
- ReactIE: Enhancing Chemical Reaction Extraction with Weak Supervision
Structured chemical reaction information plays a vital role for chemists engaged in laboratory work and in advanced endeavors such as computer-aided drug design.
Despite the importance of extracting structured reactions from scientific literature, data annotation for this purpose is cost-prohibitive due to the significant labor required from domain experts.
We propose ReactIE, which combines two weakly supervised approaches for pre-training. Our method utilizes frequent patterns within the text as linguistic cues to identify specific characteristics of chemical reactions.
Published on arXiv: 2023-07-04.
- A Unified View of Deep Learning for Reaction and Retrosynthesis Prediction: Current Status and Future Challenges
Reaction and retrosynthesis prediction are fundamental tasks in computational chemistry.
Various deep learning approaches have been proposed to tackle these problems.
This paper is the first comprehensive and systematic survey that seeks to provide a unified understanding of reaction and retrosynthesis prediction.
Published on arXiv: 2023-06-28.
- Stress Testing BERT Anaphora Resolution Models for Reaction Extraction in Chemical Patents
In chemical patents, there are five anaphoric relations of interest: co-reference, transformed, reaction associated, work up, and contained.
Our goal is to investigate how the performance of anaphora resolution models for reaction texts differs between noise-free and noisy environments.
Published on arXiv: 2023-06-23.
- Differentiable Programming of Chemical Reaction Networks
Chemical reaction networks are one of the most fundamental computational substrates used by nature.
We study well-mixed single-chamber systems, as well as systems with multiple chambers separated by membranes.
We demonstrate that differentiable optimisation, combined with proper regularisation, can discover non-trivial sparse reaction networks.
Published on arXiv: 2023-02-06.
- ChemiRise: a data-driven retrosynthesis engine
ChemiRise can propose complete retrosynthesis routes for organic compounds rapidly and reliably.
The system was trained on a processed patent database of over 3 million organic reactions.
Published on arXiv: 2021-08-09.
- Named entity recognition in chemical patents using ensemble of contextual language models
We study the effectiveness of contextualized language models to extract information from chemical patents.
Our best model, based on a majority ensemble approach, achieves an exact F1-score of 92.30% and a relaxed F1-score of 96.24%.
Published on arXiv: 2020-07-24.
- Retrosynthesis Prediction with Conditional Graph Logic Network
Computer-aided retrosynthesis is finding renewed interest from both chemistry and computer science communities.
We propose a new approach to this task using the Conditional Graph Logic Network, a conditional graphical model built upon graph neural networks.
Published on arXiv: 2020-01-06.
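The majority-ensemble approach and the exact versus relaxed F1 scores reported for "Named entity recognition in chemical patents using ensemble of contextual language models" can be illustrated with a small sketch. It assumes each model emits one BIO tag per token, that ties are resolved in favour of the first-listed model, and that a "relaxed" match means a same-label prediction whose token span overlaps the gold span. These conventions, the entity labels, and all function names are hypothetical illustrations, not that paper's implementation.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# (first token, last token inclusive, entity label); the labels used below are hypothetical.
Entity = Tuple[int, int, str]


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token BIO tags from several models by majority vote.

    Ties go to the tag predicted by the earliest model in `predictions` (an assumption).
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all tag sequences must have the same length")
    voted = []
    for position in range(length):
        votes = [p[position] for p in predictions]
        counts = Counter(votes)
        top = max(counts.values())
        # First tag, in model order, that reaches the highest vote count.
        voted.append(next(tag for tag in votes if counts[tag] == top))
    return voted


def bio_to_entities(tags: Sequence[str]) -> List[Entity]:
    """Extract entity spans; an I- tag that cannot continue a span starts a new one."""
    entities: List[Entity] = []
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final span
        prefix, _, tag_label = tag.partition("-")
        continues = prefix == "I" and start is not None and tag_label == label
        if not continues:
            if start is not None:
                entities.append((start, i - 1, label))
                start, label = None, None
            if prefix in ("B", "I"):
                start, label = i, tag_label
    return entities


def span_f1(predicted: Sequence[Entity], gold: Sequence[Entity], relaxed: bool = False) -> float:
    """Exact F1 needs identical boundaries; relaxed F1 (as assumed here) needs only overlap."""
    def matches(p: Entity, g: Entity) -> bool:
        if p[2] != g[2]:
            return False
        if relaxed:
            return p[0] <= g[1] and g[0] <= p[1]
        return p[:2] == g[:2]

    if not predicted or not gold:
        return 0.0
    precision = sum(any(matches(p, g) for g in gold) for p in predicted) / len(predicted)
    recall = sum(any(matches(p, g) for p in predicted) for g in gold) / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Three hypothetical models tagging the same four tokens.
    model_outputs = [
        ["B-REAGENT", "I-REAGENT", "O", "B-YIELD"],
        ["B-REAGENT", "O", "O", "B-YIELD"],
        ["B-REAGENT", "I-REAGENT", "O", "O"],
    ]
    entities = bio_to_entities(majority_vote(model_outputs))
    gold = [(0, 2, "REAGENT"), (3, 3, "YIELD")]
    print(entities)  # [(0, 1, 'REAGENT'), (3, 3, 'YIELD')]
    print(span_f1(entities, gold), span_f1(entities, gold, relaxed=True))  # 0.5 1.0
```

Voting per token can yield an ill-formed sequence (for example an "I-" tag after "O"), which is why `bio_to_entities` treats such a tag as the start of a new entity instead of rejecting it.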