Related papers: Are LLMs Ready for Real-World Materials Discovery?

Are LLMs Ready for Real-World Materials Discovery?

URL: http://arxiv.org/abs/2402.05200v1
Date: Wed, 7 Feb 2024 19:10:36 GMT
Title: Are LLMs Ready for Real-World Materials Discovery?
Authors: Santiago Miret, N M Anoop Krishnan
Abstract summary: Large Language Models (LLMs) create exciting possibilities for powerful language processing tools to accelerate research in materials science. While LLMs have great potential to accelerate materials understanding and discovery, they currently fall short in being practical materials science tools. We show relevant failure cases of LLMs in materials science that reveal current limitations of LLMs related to comprehending and reasoning over complex, interconnected materials science knowledge.
Score: 12.845153238975874
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) create exciting possibilities for powerful language processing tools to accelerate research in materials science. While LLMs have great potential to accelerate materials understanding and discovery, they currently fall short in being practical materials science tools. In this position paper, we show relevant failure cases of LLMs in materials science that reveal current limitations of LLMs related to comprehending and reasoning over complex, interconnected materials science knowledge. Given those shortcomings, we outline a framework for developing Materials Science LLMs (MatSci-LLMs) that are grounded in materials science knowledge and hypothesis generation followed by hypothesis testing. The path to attaining performant MatSci-LLMs rests in large part on building high-quality, multi-modal datasets sourced from scientific literature where various information extraction challenges persist. As such, we describe key materials science information extraction challenges which need to be overcome in order to build large-scale, multi-modal datasets that capture valuable materials science knowledge. Finally, we outline a roadmap for applying future MatSci-LLMs for real-world materials discovery via: 1. Automated Knowledge Base Generation; 2. Automated In-Silico Material Design; and 3. MatSci-LLM Integrated Self-Driving Materials Laboratories.

Related papers

Can Theoretical Physics Research Benefit from Language Agents? [50.57057488167844]
Large Language Models (LLMs) are rapidly advancing across diverse domains, yet their application in theoretical physics research is not yet mature.<n>This position paper argues that LLM agents can potentially help accelerate theoretical, computational, and applied physics when properly integrated with domain knowledge and toolbox.<n>We envision future physics-specialized LLMs that could handle multimodal data, propose testable hypotheses, and design experiments.
arXiv Detail & Related papers (2025-06-06T16:20:06Z)
Materials Generation in the Era of Artificial Intelligence: A Comprehensive Survey [54.40267149907223]
Materials are the foundation of modern society, underpinning advancements in energy, electronics, healthcare, transportation, and infrastructure.<n>The ability to discover and design new materials with tailored properties is critical to solving some of the most pressing global challenges.<n>Data-driven generative models provide a powerful tool for materials design by directly create novel materials that satisfy predefined property requirements.
arXiv Detail & Related papers (2025-05-22T08:33:21Z)
MatTools: Benchmarking Large Language Models for Materials Science Tools [5.876786336423598]
MatTools is built on two complementary components: a materials simulation tool question-answer benchmark and a real-world tool-usage benchmark.<n>The QA benchmark comprises 69, QA225 pairs that assess the ability of an LLM to understand materials science tools.<n>The real-world benchmark contains 49 tasks (138 subtasks) requiring the generation of functional Python code for materials property calculations.
arXiv Detail & Related papers (2025-05-16T04:43:05Z)
MatterChat: A Multi-Modal LLM for Material Science [33.185590479147805]
We introduce MatterChat, a versatile structure-aware multi-modal large language model. We show that MatterChat significantly improves performance in material property prediction and human-AI interaction. We also demonstrate its usefulness in applications such as more advanced scientific reasoning and step-by-step material synthesis.
arXiv Detail & Related papers (2025-02-18T18:19:36Z)
Extract Information from Hybrid Long Documents Leveraging LLMs: A Framework and Dataset [52.286323454512996]
Large Language Models (LLMs) can comprehend and analyze hybrid text, containing textual and tabular data. We propose an Automated Information Extraction framework (AIE) to enable LLMs to process the hybrid long documents (HLDs) and carry out experiments to analyse four important aspects of information extraction from HLDs. To address the issue of dataset scarcity in HLDs and support future work, we also propose the Financial Reports Numerical Extraction (FINE) dataset.
arXiv Detail & Related papers (2024-12-28T07:54:14Z)
DARWIN 1.5: Large Language Models as Materials Science Adapted Learners [46.7259033847682]
We propose DARWIN 1.5, the largest open-source large language model tailored for materials science. DARWIN eliminates the need for task-specific descriptors and enables a flexible, unified approach to material property prediction and discovery. Our approach integrates 6M material domain papers and 21 experimental datasets from 49,256 materials across modalities while enabling cross-task knowledge transfer.
arXiv Detail & Related papers (2024-12-16T16:51:27Z)
Foundation Model for Composite Materials and Microstructural Analysis [49.1574468325115]
We present a foundation model specifically designed for composite materials. Our model is pre-trained on a dataset of short-fiber composites to learn robust latent features. During transfer learning, the MMAE accurately predicts homogenized stiffness, with an R2 score reaching as high as 0.959 and consistently exceeding 0.91, even when trained on limited data.
arXiv Detail & Related papers (2024-11-10T19:06:25Z)
HoneyComb: A Flexible LLM-Based Agent System for Materials Science [31.173615509567885]
HoneyComb is the first large language model system specifically designed for materials science. MatSciKB is a curated, structured knowledge collection based on reliable literature. ToolHub employs an Inductive Tool Construction method to generate, decompose, and refine API tools for materials science.
arXiv Detail & Related papers (2024-08-29T15:38:40Z)
From Text to Insight: Large Language Models for Materials Science Data Extraction [4.08853418443192]
The vast majority of materials science knowledge exists in unstructured natural language. Structured data is crucial for innovative and systematic materials design. The advent of large language models (LLMs) represents a significant shift.
arXiv Detail & Related papers (2024-07-23T22:23:47Z)
LLMatDesign: Autonomous Materials Discovery with Large Language Models [5.481299708562135]
New materials can have significant scientific and technological implications. Recent advances in machine learning have enabled data-driven methods to rapidly screen or generate promising materials. We introduce LLMatDesign, a novel framework for interpretable materials design powered by large language models.
arXiv Detail & Related papers (2024-06-19T02:35:02Z)
A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery [68.48094108571432]
Large language models (LLMs) have revolutionized the way text and other modalities of data are handled. We aim to provide a more holistic view of the research landscape by unveiling cross-field and cross-modal connections between scientific LLMs.
arXiv Detail & Related papers (2024-06-16T08:03:24Z)
LLMs Meet Multimodal Generation and Editing: A Survey [89.76691959033323]
This survey elaborates on multimodal generation and editing across various domains, comprising image, video, 3D, and audio. We summarize the notable advancements with milestone works in these fields and categorize these studies into LLM-based and CLIP/T5-based methods. We dig into tool-augmented multimodal agents that can leverage existing generative models for human-computer interaction.
arXiv Detail & Related papers (2024-05-29T17:59:20Z)
Exploring the Capabilities of Large Multimodal Models on Dense Text [58.82262549456294]
We propose the DT-VQA dataset, with 170k question-answer pairs. In this paper, we conduct a comprehensive evaluation of GPT4V, Gemini, and various open-source LMMs. We find that even with automatically labeled training datasets, significant improvements in model performance can be achieved.
arXiv Detail & Related papers (2024-05-09T07:47:25Z)
Scientific Large Language Models: A Survey on Biological & Chemical Domains [47.97810890521825]
Large Language Models (LLMs) have emerged as a transformative power in enhancing natural language comprehension. The application of LLMs extends beyond conventional linguistic boundaries, encompassing specialized linguistic systems developed within various scientific disciplines. As a burgeoning area in the community of AI for Science, scientific LLMs warrant comprehensive exploration.
arXiv Detail & Related papers (2024-01-26T05:33:34Z)
Multimodal Learning for Materials [7.167520424757711]
We introduce Multimodal Learning for Materials (MultiMat), which enables self-supervised multi-modality training of foundation models for materials. We demonstrate our framework's potential using data from the Materials Project database on multiple axes.
arXiv Detail & Related papers (2023-11-30T18:35:29Z)
Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation [109.8527403904657]
We show that large language models (LLMs) possess unwavering confidence in their knowledge and cannot handle the conflict between internal and external knowledge well. Retrieval augmentation proves to be an effective approach in enhancing LLMs' awareness of knowledge boundaries. We propose a simple method to dynamically utilize supporting documents with our judgement strategy.
arXiv Detail & Related papers (2023-07-20T16:46:10Z)
14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon [30.978561315637307]
Large-language models (LLMs) could be useful in chemistry and materials science. To explore these possibilities, we organized a hackathon. This article chronicles the projects built as part of the hackathon.
arXiv Detail & Related papers (2023-06-09T22:22:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.