An Autonomous Large Language Model Agent for Chemical Literature Data
Mining
- URL: http://arxiv.org/abs/2402.12993v1
- Date: Tue, 20 Feb 2024 13:21:46 GMT
- Title: An Autonomous Large Language Model Agent for Chemical Literature Data
Mining
- Authors: Kexin Chen, Hanqun Cao, Junyou Li, Yuyang Du, Menghao Guo, Xin Zeng,
Lanqing Li, Jiezhong Qiu, Pheng Ann Heng, Guangyong Chen
- Abstract summary: We introduce an end-to-end AI agent framework capable of high-fidelity extraction from extensive chemical literature.
Our framework's efficacy is evaluated using accuracy, recall, and F1 score of reaction condition data.
- Score: 60.85177362167166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chemical synthesis, which is crucial for advancing material synthesis and
drug discovery, impacts various sectors including environmental science and
healthcare. The rise of technology in chemistry has generated extensive
chemical data, challenging researchers to discern patterns and refine synthesis
processes. Artificial intelligence (AI) helps by analyzing data to optimize
synthesis and increase yields. However, AI faces challenges in processing
literature data due to the unstructured format and diverse writing style of
chemical literature. To overcome these difficulties, we introduce an end-to-end
AI agent framework capable of high-fidelity extraction from extensive chemical
literature. This AI agent employs large language models (LLMs) for prompt
generation and iterative optimization. It functions as a chemistry assistant,
automating data collection and analysis, thereby saving manpower and enhancing
performance. Our framework's efficacy is evaluated using accuracy, recall, and
F1 score of reaction condition data, and we compared our method with human
experts in terms of content correctness and time efficiency. The proposed
approach marks a significant advancement in automating chemical literature
extraction and demonstrates the potential for AI to revolutionize data
management and utilization in chemistry.
Related papers
- Validation of the Scientific Literature via Chemputation Augmented by Large Language Models [0.0]
Chemputation is the process of programming chemical robots to do experiments using a universal symbolic language, but the literature can be error prone and hard to read due to ambiguities.
Large Language Models (LLMs) have demonstrated remarkable capabilities in various domains, including natural language processing, robotic control, and more recently, chemistry.
We introduce an LLM-based chemical research agent workflow designed for the automatic validation of synthetic literature procedures.
arXiv Detail & Related papers (2024-10-08T21:31:42Z) - BatGPT-Chem: A Foundation Large Model For Retrosynthesis Prediction [65.93303145891628]
BatGPT-Chem is a large language model with 15 billion parameters, tailored for enhanced retrosynthesis prediction.
Our model captures a broad spectrum of chemical knowledge, enabling precise prediction of reaction conditions.
This development empowers chemists to adeptly address novel compounds, potentially expediting the innovation cycle in drug manufacturing and materials science.
arXiv Detail & Related papers (2024-08-19T05:17:40Z) - Retrosynthesis prediction enhanced by in-silico reaction data
augmentation [66.5643280109899]
We present RetroWISE, a framework that employs a base model inferred from real paired data to perform in-silico reaction generation and augmentation.
On three benchmark datasets, RetroWISE achieves the best overall performance against state-of-the-art models.
arXiv Detail & Related papers (2024-01-31T07:40:37Z) - Agent-based Learning of Materials Datasets from Scientific Literature [0.0]
We develop a chemist AI agent, powered by large language models (LLMs), to create structured datasets from natural language text.
Our chemist AI agent, Eunomia, can plan and execute actions by leveraging the existing knowledge from decades of scientific research articles.
arXiv Detail & Related papers (2023-12-18T20:29:58Z) - AIMS-EREA -- A framework for AI-accelerated Innovation of Materials for
Sustainability -- for Environmental Remediation and Energy Applications [0.0]
AIMS-EREA is our novel framework to blend best of breed of Material Science theory with power of Generative AI.
This also helps to eliminate the possibility of production of hazardous residues and bye-products of the reactions.
arXiv Detail & Related papers (2023-11-18T12:35:45Z) - Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis [57.70772230913099]
Chemist-X automates the reaction condition recommendation (RCR) task in chemical synthesis with retrieval-augmented generation (RAG) technology.
Chemist-X interrogates online molecular databases and distills critical data from the latest literature database.
Chemist-X considerably reduces chemists' workload and allows them to focus on more fundamental and creative problems.
arXiv Detail & Related papers (2023-11-16T01:21:33Z) - ChemVise: Maximizing Out-of-Distribution Chemical Detection with the
Novel Application of Zero-Shot Learning [60.02503434201552]
This research proposes learning approximations of complex exposures from training sets of simple ones.
We demonstrate this approach to synthetic sensor responses surprisingly improves the detection of out-of-distribution obscured chemical analytes.
arXiv Detail & Related papers (2023-02-09T20:19:57Z) - Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search [83.22850633478302]
Retrosynthetic planning identifies a series of reactions that can lead to the synthesis of a target product.
Existing methods either require expensive return estimation by rollout with high variance, or optimize for search speed rather than the quality.
We propose Retro*, a neural-based A*-like algorithm that finds high-quality synthetic routes efficiently.
arXiv Detail & Related papers (2020-06-29T05:53:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.