SynKB: Semantic Search for Synthetic Procedures
- URL: http://arxiv.org/abs/2208.07400v1
- Date: Mon, 15 Aug 2022 18:33:16 GMT
- Title: SynKB: Semantic Search for Synthetic Procedures
- Authors: Fan Bai, Alan Ritter, Peter Madrid, Dayne Freitag, John Niekrasz
- Abstract summary: We present SynKB, an open-source, automatically extracted knowledge base of chemical synthesis protocols.
Similar to proprietary chemistry databases such as Reaxys, SynKB allows chemists to retrieve structured knowledge about synthetic procedures.
- Score: 9.360528362635215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we present SynKB, an open-source, automatically extracted
knowledge base of chemical synthesis protocols. Similar to proprietary
chemistry databases such as Reaxys, SynKB allows chemists to retrieve
structured knowledge about synthetic procedures. By taking advantage of recent
advances in natural language processing for procedural texts, SynKB supports
more flexible queries about reaction conditions, and thus has the potential to
help chemists search the literature for conditions used in relevant reactions
as they design new synthetic routes. Using customized Transformer models to
automatically extract information from 6 million synthesis procedures described
in U.S. and EU patents, we show that for many queries, SynKB has higher recall
than Reaxys, while maintaining high precision. We plan to make SynKB available
as an open-source tool; in contrast, proprietary chemistry databases require
costly subscriptions.
Related papers
- SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation [55.2480439325792]
We study the synthesis of six datasets, covering topic classification, sentiment analysis, tone detection, and humor.
We find that SynthesizRR greatly improves lexical and semantic diversity, similarity to human-written text, and distillation performance.
arXiv Detail & Related papers (2024-05-16T12:22:41Z) - An Autonomous Large Language Model Agent for Chemical Literature Data
Mining [60.85177362167166]
We introduce an end-to-end AI agent framework capable of high-fidelity extraction from extensive chemical literature.
Our framework's efficacy is evaluated using accuracy, recall, and F1 score of reaction condition data.
arXiv Detail & Related papers (2024-02-20T13:21:46Z) - RLSynC: Offline-Online Reinforcement Learning for Synthon Completion [1.4999444543328293]
We develop a new offline-online reinforcement learning method RLSynC for synthon completion in semi-template-based methods.
Our results demonstrate that RLSynC can outperform state-of-the-art synthon completion methods with improvements as high as 14.9%.
arXiv Detail & Related papers (2023-09-06T02:40:33Z) - Extracting Structured Seed-Mediated Gold Nanorod Growth Procedures from
Literature with GPT-3 [52.59930033705221]
We present a dataset of 11,644 entities extracted from 1,137 papers, resulting in 268 papers with at least one complete seed-mediated gold nanorod growth procedure and outcome for a total of 332 complete procedures.
arXiv Detail & Related papers (2023-04-26T22:21:33Z) - Precursor recommendation for inorganic synthesis by machine learning
materials similarity from scientific literature [0.0]
We use a knowledge base of 29,900 solid-state synthesis recipes to automatically learn which precursors to recommend for the synthesis of a novel target material.
The data-driven approach learns chemical similarity of materials and refers the synthesis of a new target to precedent synthesis procedures of similar materials.
Our approach captures decades of synthesis data in a mathematical form, making it accessible for use in recommendation engines and autonomous laboratories.
arXiv Detail & Related papers (2023-02-05T04:57:59Z) - Recent advances in artificial intelligence for retrosynthesis [29.32667622776065]
Retrosynthesis is the cornerstone of organic chemistry, providing chemists in material and drug manufacturing access to poorly available and brand-new molecules.
Recent breakthroughs driven by artificial intelligence have revolutionized retrosynthesis.
arXiv Detail & Related papers (2023-01-14T09:29:39Z) - Importance of Synthesizing High-quality Data for Text-to-SQL Parsing [71.02856634369174]
State-of-the-art text-to-SQL algorithms did not further improve on popular benchmarks when trained with augmented synthetic data.
We propose a novel framework that incorporates key relationships from the schema, imposes strong typing, and applies schema-weighted column sampling.
arXiv Detail & Related papers (2022-12-17T02:53:21Z) - PcMSP: A Dataset for Scientific Action Graphs Extraction from
Polycrystalline Materials Synthesis Procedure Text [1.9573380763700712]
This dataset simultaneously contains the synthesis sentences extracted from the experimental paragraphs, as well as the entity mentions and intra-sentence relations.
A two-step human annotation and inter-annotator agreement study guarantee the high quality of the PcMSP corpus.
We introduce four natural language processing tasks: sentence classification, named entity recognition, relation classification, and joint extraction of entities and relations.
arXiv Detail & Related papers (2022-10-22T09:43:54Z) - FusionRetro: Molecule Representation Fusion via In-Context Learning for
Retrosynthetic Planning [58.47265392465442]
Retrosynthetic planning aims to devise a complete multi-step synthetic route from starting materials to a target molecule.
Current strategies use a decoupled approach of single-step retrosynthesis models and search algorithms.
We propose a novel framework that utilizes context information for improved retrosynthetic planning.
arXiv Detail & Related papers (2022-09-30T08:44:58Z) - ULSA: Unified Language of Synthesis Actions for Representation of
Synthesis Protocols [2.436060325115753]
We propose the first Unified Language of Synthesis Actions (ULSA) for describing synthesis procedures.
We created a dataset of 3,040 synthesis procedures annotated by domain experts according to the proposed ULSA scheme.
arXiv Detail & Related papers (2022-01-23T17:44:48Z) - RetroXpert: Decompose Retrosynthesis Prediction like a Chemist [60.463900712314754]
We devise a novel template-free algorithm for automatic retrosynthetic expansion.
Our method disassembles retrosynthesis into two steps.
While outperforming the state-of-the-art baselines, our model also provides chemically reasonable interpretation.
arXiv Detail & Related papers (2020-11-04T04:35:34Z)