Text-to-Battery Recipe: A language modeling-based protocol for automatic battery recipe extraction and retrieval
- URL: http://arxiv.org/abs/2407.15459v1
- Date: Mon, 22 Jul 2024 08:15:02 GMT
- Title: Text-to-Battery Recipe: A language modeling-based protocol for automatic battery recipe extraction and retrieval
- Authors: Daeun Lee, Jaewoong Choi, Hiroshi Mizuseki, Byungju Lee
- Abstract summary: We propose a language modeling-based protocol, Text-to-Battery Recipe (T2BR), for the automatic extraction of end-to-end battery recipes.
The proposed protocol will significantly accelerate the review of battery material literature and catalyze innovations in battery design and development.
- Score: 5.3498018871204245
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies have increasingly applied natural language processing (NLP) to automatically extract experimental research data from the extensive battery materials literature. Despite the complex process involved in battery manufacturing -- from material synthesis to cell assembly -- there has been no comprehensive study systematically organizing this information. In response, we propose a language modeling-based protocol, Text-to-Battery Recipe (T2BR), for the automatic extraction of end-to-end battery recipes, validated using a case study on batteries containing LiFePO4 cathode material. We report machine learning-based paper filtering models, screening 2,174 relevant papers from the keyword-based search results, and unsupervised topic models to identify 2,876 paragraphs related to cathode synthesis and 2,958 paragraphs related to cell assembly. Then, focusing on the two topics, two deep learning-based named entity recognition models are developed to extract a total of 30 entities -- including precursors, active materials, and synthesis methods -- achieving F1 scores of 88.18% and 94.61%. The accurate extraction of entities enables the systematic generation of 165 end-to-end recipes of LiFePO4 batteries. Our protocol and results offer valuable insights into specific trends, such as associations between precursor materials and synthesis methods, or combinations between different precursor materials. We anticipate that our findings will serve as a foundational knowledge base for facilitating battery-recipe information retrieval. The proposed protocol will significantly accelerate the review of battery material literature and catalyze innovations in battery design and development.
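The abstract describes a staged pipeline: paper filtering, topic modeling over paragraphs, entity extraction, and recipe assembly. The following is a minimal sketch of that staging, assuming scikit-learn for the filtering and topic-modeling steps and a Hugging Face token-classification model for NER; the checkpoint name, entity labels, and hyperparameters are illustrative placeholders, not the authors' released artifacts or exact models.

```python
# Sketch of a T2BR-style pipeline (assumed tooling, not the authors' code).
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import LatentDirichletAllocation
from transformers import pipeline

# 1) Paper filtering: classify keyword-search hits as relevant or not.
def train_paper_filter(abstracts, labels):
    vec = TfidfVectorizer(max_features=20_000, ngram_range=(1, 2))
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vec.fit_transform(abstracts), labels)
    return vec, clf

# 2) Unsupervised topic modeling: route paragraphs toward topics such as
#    "cathode synthesis" or "cell assembly".
def fit_topic_model(paragraphs, n_topics=10):
    vec = CountVectorizer(max_features=20_000, stop_words="english")
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    doc_topic = lda.fit_transform(vec.fit_transform(paragraphs))
    return vec, lda, doc_topic

# 3) NER: extract recipe entities (precursors, active materials, methods).
#    "dslim/bert-base-NER" is a generic placeholder checkpoint; the paper
#    uses domain-specific models trained for 30 battery-recipe entity types.
ner = pipeline("token-classification",
               model="dslim/bert-base-NER",
               aggregation_strategy="simple")

def extract_entities(paragraph):
    return [(e["word"], e["entity_group"]) for e in ner(paragraph)]

# 4) Recipe assembly: merge per-paper synthesis and assembly entities into
#    one end-to-end record.
def build_recipe(paper_id, synthesis_entities, assembly_entities):
    return {"paper": paper_id,
            "cathode_synthesis": synthesis_entities,
            "cell_assembly": assembly_entities}
```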
Related papers
- Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization [47.97016882216093]
Large language models (LLMs) leverage chain-of-thought (CoT) techniques to tackle complex problems.
We introduce ChatBattery, a novel agentic framework that integrates domain knowledge to steer LLMs toward more effective reasoning in materials design.
We successfully identify, synthesize, and characterize three novel lithium-ion battery cathode materials, which achieve practical capacity improvements of 28.8%, 25.2%, and 18.5%, respectively.
arXiv Detail & Related papers (2025-07-21T23:46:11Z) - ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data [53.78763789036172]
We present ChemActor, a fully fine-tuned large language model (LLM) that acts as a chemical executor, converting between unstructured experimental procedures and structured action sequences.
The framework integrates a data selection module, which selects data based on distribution divergence, with a general-purpose LLM to generate machine-executable actions from a single molecule input.
Experiments on reaction-to-description (R2D) and description-to-action (D2A) tasks demonstrate that ChemActor achieves state-of-the-art performance, outperforming the baseline model by 10%.
arXiv Detail & Related papers (2025-06-30T05:11:19Z) - Zero-Shot Document-Level Biomedical Relation Extraction via Scenario-based Prompt Design in Two-Stage with LLM [7.808231572590279]
We propose a novel approach that achieves the same results from unannotated full documents using general large language models (LLMs) with lower hardware and labor costs.
Our approach combines two major stages: named entity recognition (NER) and relation extraction (RE).
To enhance prompt effectiveness, we propose a five-part template structure and scenario-based prompt design principles.
arXiv Detail & Related papers (2025-05-02T07:33:20Z) - Demonstrating Linked Battery Data To Accelerate Knowledge Flow in Battery Science [0.5804487044220691]
Batteries are pivotal for transitioning to a climate-friendly future, leading to a surge in battery research.
Scopus lists 14,388 papers that mention "lithium-ion battery" in 2023 alone, making it infeasible for individuals to keep up.
This paper discusses strategies based on structured, semantic, and linked data to manage this information overload.
arXiv Detail & Related papers (2024-10-16T14:12:41Z) - BatGPT-Chem: A Foundation Large Model For Retrosynthesis Prediction [65.93303145891628]
BatGPT-Chem is a large language model with 15 billion parameters, tailored for enhanced retrosynthesis prediction.
Our model captures a broad spectrum of chemical knowledge, enabling precise prediction of reaction conditions.
This development empowers chemists to adeptly address novel compounds, potentially expediting the innovation cycle in drug manufacturing and materials science.
arXiv Detail & Related papers (2024-08-19T05:17:40Z) - An Autonomous Large Language Model Agent for Chemical Literature Data Mining [60.85177362167166]
We introduce an end-to-end AI agent framework capable of high-fidelity extraction from extensive chemical literature.
Our framework's efficacy is evaluated using accuracy, recall, and F1 score of reaction condition data.
arXiv Detail & Related papers (2024-02-20T13:21:46Z) - PINN surrogate of Li-ion battery models for parameter inference. Part I: Implementation and multi-fidelity hierarchies for the single-particle model [0.0]
This manuscript is the first of a two-part series that introduces PINN surrogates of Li-ion battery models for parameter inference.
A multi-fidelity hierarchical training, where several neural nets are trained with multiple physics-loss fidelities is shown to significantly improve the surrogate accuracy.
arXiv Detail & Related papers (2023-12-28T19:09:56Z) - BatteryML: An Open-source Platform for Machine Learning on Battery Degradation [15.469939183346467]
We present BatteryML - a one-step, all-encompassing, and open-source platform designed to unify data preprocessing, feature extraction, and the implementation of both traditional and state-of-the-art models.
This streamlined approach promises to enhance the practicality and efficiency of research applications.
arXiv Detail & Related papers (2023-10-23T08:51:05Z) - Extracting Structured Seed-Mediated Gold Nanorod Growth Procedures from Literature with GPT-3 [52.59930033705221]
We present a dataset of 11,644 entities extracted from 1,137 papers, resulting in 268 papers with at least one complete seed-mediated gold nanorod growth procedure and outcome for a total of 332 complete procedures.
arXiv Detail & Related papers (2023-04-26T22:21:33Z) - Flexible, Model-Agnostic Method for Materials Data Extraction from Text Using General Purpose Language Models [5.748877272090607]
Large language models (LLMs) are transforming the way humans interact with text.
We demonstrate a simple and efficient method for extracting materials data from full-text research papers.
This approach requires minimal to no coding or prior knowledge about the extracted property.
It offers high recall and nearly perfect precision in the resulting database.
arXiv Detail & Related papers (2023-02-09T19:56:37Z) - Structured information extraction from complex scientific text with fine-tuned large language models [55.96705756327738]
We present a simple sequence-to-sequence approach to joint named entity recognition and relation extraction.
The approach leverages a pre-trained large language model (LLM), GPT-3, that is fine-tuned on approximately 500 prompt-completion pairs.
This approach represents a simple, accessible, and highly-flexible route to obtaining large databases of structured knowledge extracted from unstructured text.
arXiv Detail & Related papers (2022-12-10T07:51:52Z) - Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets [56.38623317907416]
We use Google Cloud to transcribe podcast episodes of an NPR radio show.
We then build a pipeline for systematically pre-processing the text.
Our model successfully identified that Omeprazole can help treat heartburn.
arXiv Detail & Related papers (2020-10-22T19:52:49Z) - Commonsense Evidence Generation and Injection in Reading Comprehension [57.31927095547153]
We propose a Commonsense Evidence Generation and Injection framework in reading comprehension, named CEGI.
The framework injects two kinds of auxiliary commonsense evidence into comprehensive reading to equip the machine with the ability of rational thinking.
Experiments on the CosmosQA dataset demonstrate that the proposed CEGI model outperforms the current state-of-the-art approaches.
arXiv Detail & Related papers (2020-05-11T16:31:08Z) - Annotating and Extracting Synthesis Process of All-Solid-State Batteries from Scientific Literature [10.443499579567069]
We present a novel corpus of the synthesis process for all-solid-state batteries and an automated machine reading system.
We define the representation of the synthesis processes using flow graphs, and create a corpus from the experimental sections of 243 papers.
The automated machine-reading system combines a deep learning-based sequence tagger with a simple rule-based relation extractor.
arXiv Detail & Related papers (2020-02-18T02:30:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.