Related papers: From Text to Insight: Large Language Models for Materials Science Data Extraction

From Text to Insight: Large Language Models for Materials Science Data Extraction

URL: http://arxiv.org/abs/2407.16867v1
Date: Tue, 23 Jul 2024 22:23:47 GMT
Title: From Text to Insight: Large Language Models for Materials Science Data Extraction
Authors: Mara Schilling-Wilhelmi, Martiño Ríos-García, Sherjeel Shabih, María Victoria Gil, Santiago Miret, Christoph T. Koch, José A. Márquez, Kevin Maik Jablonka,
Abstract summary: The vast majority of materials science knowledge exists in unstructured natural language. Structured data is crucial for innovative and systematic materials design. The advent of large language models (LLMs) represents a significant shift.
Score: 4.08853418443192
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The vast majority of materials science knowledge exists in unstructured natural language, yet structured data is crucial for innovative and systematic materials design. Traditionally, the field has relied on manual curation and partial automation for data extraction for specific use cases. The advent of large language models (LLMs) represents a significant shift, potentially enabling efficient extraction of structured, actionable data from unstructured text by non-experts. While applying LLMs to materials science data extraction presents unique challenges, domain knowledge offers opportunities to guide and validate LLM outputs. This review provides a comprehensive overview of LLM-based structured data extraction in materials science, synthesizing current knowledge and outlining future directions. We address the lack of standardized guidelines and present frameworks for leveraging the synergy between LLMs and materials science expertise. This work serves as a foundational resource for researchers aiming to harness LLMs for data-driven materials research. The insights presented here could significantly enhance how researchers across disciplines access and utilize scientific information, potentially accelerating the development of novel materials for critical societal needs.

Related papers

Context-Aware Scientific Knowledge Extraction on Linked Open Data using Large Language Models [0.0]
This paper introduces WISE (Workflow for Intelligent Scientific Knowledge Extraction), a system to extract, refine, and rank query-specific knowledge.<n>WISE delivers detailed, organized answers by systematically exploring and synthesizing knowledge from diverse sources.
arXiv Detail & Related papers (2025-06-21T04:22:34Z)
Materials Generation in the Era of Artificial Intelligence: A Comprehensive Survey [54.40267149907223]
Materials are the foundation of modern society, underpinning advancements in energy, electronics, healthcare, transportation, and infrastructure.<n>The ability to discover and design new materials with tailored properties is critical to solving some of the most pressing global challenges.<n>Data-driven generative models provide a powerful tool for materials design by directly create novel materials that satisfy predefined property requirements.
arXiv Detail & Related papers (2025-05-22T08:33:21Z)
Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge [6.500470477634259]
Our work aims to support the materials science community by providing a practical, data-driven resource. We have curated a comprehensive dataset of 17K expert-verified synthesis recipes from open-access literature. AlchemicalBench offers an end-to-end framework that supports research in large language models applied to synthesis prediction.
arXiv Detail & Related papers (2025-02-23T06:16:23Z)
Foundational Large Language Models for Materials Research [22.77591279242839]
Large Language Models (LLMs) offer opportunities to accelerate materials research through automated analysis and prediction. Here, we present LLaMat, a family of foundational models for materials science developed through continued pretraining of LLaMA models. We demonstrate that LLaMat excels in materials-specific NLP and structured information extraction while maintaining general linguistic capabilities.
arXiv Detail & Related papers (2024-12-12T18:46:38Z)
HoneyComb: A Flexible LLM-Based Agent System for Materials Science [31.173615509567885]
HoneyComb is the first large language model system specifically designed for materials science. MatSciKB is a curated, structured knowledge collection based on reliable literature. ToolHub employs an Inductive Tool Construction method to generate, decompose, and refine API tools for materials science.
arXiv Detail & Related papers (2024-08-29T15:38:40Z)
Human-artificial intelligence teaming for scientific information extraction from data-driven additive manufacturing research using large language models [3.0061386772253784]
Data-driven research in Additive Manufacturing (AM) has gained significant success in recent years. This has led to a plethora of scientific literature to emerge. It requires substantial effort and time to extract scientific information from these works. We propose a framework that enables collaboration between AM and AI experts to continuously extract scientific information from data-driven AM literature.
arXiv Detail & Related papers (2024-07-26T15:43:52Z)
Retrieval-Enhanced Machine Learning: Synthesis and Opportunities [60.34182805429511]
Retrieval-enhancement can be extended to a broader spectrum of machine learning (ML) This work introduces a formal framework of this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature in various domains in ML with consistent notations which is missing from the current literature. The goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.
arXiv Detail & Related papers (2024-07-17T20:01:21Z)
Systematic Task Exploration with LLMs: A Study in Citation Text Generation [63.50597360948099]
Large language models (LLMs) bring unprecedented flexibility in defining and executing complex, creative natural language generation (NLG) tasks. We propose a three-component research framework that consists of systematic input manipulation, reference data, and output measurement. We use this framework to explore citation text generation -- a popular scholarly NLP task that lacks consensus on the task definition and evaluation metric.
arXiv Detail & Related papers (2024-07-04T16:41:08Z)
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks. SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z)
Large Language Models for Data Annotation and Synthesis: A Survey [49.8318827245266]
This survey focuses on the utility of Large Language Models for data annotation and synthesis. It includes an in-depth taxonomy of data types that LLMs can annotate, a review of learning strategies for models utilizing LLM-generated annotations, and a detailed discussion of the primary challenges and limitations associated with using LLMs for data annotation and synthesis.
arXiv Detail & Related papers (2024-02-21T00:44:04Z)
Quantitative knowledge retrieval from large language models [4.155711233354597]
Large language models (LLMs) have been extensively studied for their abilities to generate convincing natural language sequences. This paper explores the feasibility of LLMs as a mechanism for quantitative knowledge retrieval to aid data analysis tasks.
arXiv Detail & Related papers (2024-02-12T16:32:37Z)
Large Language Models for Generative Information Extraction: A Survey [89.71273968283616]
Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation. We present an extensive overview by categorizing these works in terms of various IE subtasks and techniques. We empirically analyze the most advanced methods and discover the emerging trend of IE tasks with LLMs.
arXiv Detail & Related papers (2023-12-29T14:25:22Z)
Agent-based Learning of Materials Datasets from Scientific Literature [0.0]
We develop a chemist AI agent, powered by large language models (LLMs), to create structured datasets from natural language text. Our chemist AI agent, Eunomia, can plan and execute actions by leveraging the existing knowledge from decades of scientific research articles.
arXiv Detail & Related papers (2023-12-18T20:29:58Z)
DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge. Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities [66.36633042421387]
Large Language Models (LLMs) for Knowledge Graph (KG) construction and reasoning evaluated. We propose AutoKG, a multi-agent-based approach employing LLMs and external sources for KG construction and reasoning.
arXiv Detail & Related papers (2023-05-22T15:56:44Z)
Large Language Models as Master Key: Unlocking the Secrets of Materials Science with GPT [9.33544942080883]
This article presents a new natural language processing (NLP) task called structured information inference (SII) to address the complexities of information extraction at the device level in materials science. We accomplished this task by tuning GPT-3 on an existing perovskite solar cell FAIR dataset with 91.8% F1-score and extended the dataset with data published since its release. We also designed experiments to predict the electrical performance of solar cells and design materials or devices with targeted parameters using large language models (LLMs)
arXiv Detail & Related papers (2023-04-05T04:01:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.