An Autonomous Large Language Model Agent for Chemical Literature Data
Mining
- URL: http://arxiv.org/abs/2402.12993v1
- Date: Tue, 20 Feb 2024 13:21:46 GMT
- Title: An Autonomous Large Language Model Agent for Chemical Literature Data
Mining
- Authors: Kexin Chen, Hanqun Cao, Junyou Li, Yuyang Du, Menghao Guo, Xin Zeng,
Lanqing Li, Jiezhong Qiu, Pheng Ann Heng, Guangyong Chen
- Abstract summary: We introduce an end-to-end AI agent framework capable of high-fidelity extraction from extensive chemical literature.
Our framework's efficacy is evaluated using accuracy, recall, and F1 score of reaction condition data.
- Score: 60.85177362167166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chemical synthesis, which is crucial for advancing material synthesis and
drug discovery, impacts various sectors including environmental science and
healthcare. The rise of technology in chemistry has generated extensive
chemical data, challenging researchers to discern patterns and refine synthesis
processes. Artificial intelligence (AI) helps by analyzing data to optimize
synthesis and increase yields. However, AI faces challenges in processing
literature data due to the unstructured format and diverse writing style of
chemical literature. To overcome these difficulties, we introduce an end-to-end
AI agent framework capable of high-fidelity extraction from extensive chemical
literature. This AI agent employs large language models (LLMs) for prompt
generation and iterative optimization. It functions as a chemistry assistant,
automating data collection and analysis, thereby saving manpower and enhancing
performance. Our framework's efficacy is evaluated using accuracy, recall, and
F1 score of reaction condition data, and we compared our method with human
experts in terms of content correctness and time efficiency. The proposed
approach marks a significant advancement in automating chemical literature
extraction and demonstrates the potential for AI to revolutionize data
management and utilization in chemistry.
Related papers
- Intelligent Chemical Purification Technique Based on Machine Learning [5.023197681500998]
We present an innovative of artificial intelligence with column chromatography, aiming to resolve inefficiencies and standardize data collection in chemical separation and purification domain.
By developing an automated platform for precise data acquisition and employing advanced machine learning algorithms, we constructed predictive models to forecast key separation parameters.
A novel metric, separation probability ($S_p$), quantifies the likelihood of effective compound separation, validated through experimental verification.
arXiv Detail & Related papers (2024-04-14T01:44:58Z) - Retrosynthesis prediction enhanced by in-silico reaction data
augmentation [66.5643280109899]
We present RetroWISE, a framework that employs a base model inferred from real paired data to perform in-silico reaction generation and augmentation.
On three benchmark datasets, RetroWISE achieves the best overall performance against state-of-the-art models.
arXiv Detail & Related papers (2024-01-31T07:40:37Z) - AutoIE: An Automated Framework for Information Extraction from
Scientific Literature [6.235887933544583]
AutoIE is a framework designed to automate the extraction of vital data from scientific PDF documents.
Our SBERT model achieves high Marco F1 scores of 87.19 and 89.65 on CoNLL04 and ADE datasets.
This research paves the way for enhanced data management and interpretation in molecular sieve synthesis.
arXiv Detail & Related papers (2024-01-30T01:45:03Z) - Agent-based Learning of Materials Datasets from Scientific Literature [0.0]
We develop a chemist AI agent, powered by large language models (LLMs), to create structured datasets from natural language text.
Our chemist AI agent, Eunomia, can plan and execute actions by leveraging the existing knowledge from decades of scientific research articles.
arXiv Detail & Related papers (2023-12-18T20:29:58Z) - AIMS-EREA -- A framework for AI-accelerated Innovation of Materials for
Sustainability -- for Environmental Remediation and Energy Applications [0.0]
AIMS-EREA is our novel framework to blend best of breed of Material Science theory with power of Generative AI.
This also helps to eliminate the possibility of production of hazardous residues and bye-products of the reactions.
arXiv Detail & Related papers (2023-11-18T12:35:45Z) - Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis [57.70772230913099]
Chemist-X automates the reaction condition recommendation (RCR) task in chemical synthesis with retrieval-augmented generation (RAG) technology.
Chemist-X interrogates online molecular databases and distills critical data from the latest literature database.
Chemist-X considerably reduces chemists' workload and allows them to focus on more fundamental and creative problems.
arXiv Detail & Related papers (2023-11-16T01:21:33Z) - ChemVise: Maximizing Out-of-Distribution Chemical Detection with the
Novel Application of Zero-Shot Learning [60.02503434201552]
This research proposes learning approximations of complex exposures from training sets of simple ones.
We demonstrate this approach to synthetic sensor responses surprisingly improves the detection of out-of-distribution obscured chemical analytes.
arXiv Detail & Related papers (2023-02-09T20:19:57Z) - Unassisted Noise Reduction of Chemical Reaction Data Sets [59.127921057012564]
We propose a machine learning-based, unassisted approach to remove chemically wrong entries from data sets.
Our results show an improved prediction quality for models trained on the cleaned and balanced data sets.
arXiv Detail & Related papers (2021-02-02T09:34:34Z) - Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search [83.22850633478302]
Retrosynthetic planning identifies a series of reactions that can lead to the synthesis of a target product.
Existing methods either require expensive return estimation by rollout with high variance, or optimize for search speed rather than the quality.
We propose Retro*, a neural-based A*-like algorithm that finds high-quality synthetic routes efficiently.
arXiv Detail & Related papers (2020-06-29T05:53:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.