What do Large Language Models know about materials?
- URL: http://arxiv.org/abs/2507.14586v1
- Date: Sat, 19 Jul 2025 12:02:08 GMT
- Title: What do Large Language Models know about materials?
- Authors: Adrian Ehrenhofer, Thomas Wallmersperger, Gianaurelio Cuniberti,
- Abstract summary: Large Language Models (LLMs) are increasingly applied in the fields of mechanical engineering and materials science.<n>We highlight the role of vocabulary and tokenization for the uniqueness of material fingerprints.<n>This leads to a material knowledge benchmark for an informed choice, for which steps in the PSPP chain LLMs are applicable.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are increasingly applied in the fields of mechanical engineering and materials science. As models that establish connections through the interface of language, LLMs can be applied for step-wise reasoning through the Processing-Structure-Property-Performance chain of material science and engineering. Current LLMs are built for adequately representing a dataset, which is the most part of the accessible internet. However, the internet mostly contains non-scientific content. If LLMs should be applied for engineering purposes, it is valuable to investigate models for their intrinsic knowledge -- here: the capacity to generate correct information about materials. In the current work, for the example of the Periodic Table of Elements, we highlight the role of vocabulary and tokenization for the uniqueness of material fingerprints, and the LLMs' capabilities of generating factually correct output of different state-of-the-art open models. This leads to a material knowledge benchmark for an informed choice, for which steps in the PSPP chain LLMs are applicable, and where specialized models are required.
Related papers
- MatterChat: A Multi-Modal LLM for Material Science [33.185590479147805]
We introduce MatterChat, a versatile structure-aware multi-modal large language model.<n>We show that MatterChat significantly improves performance in material property prediction and human-AI interaction.<n>We also demonstrate its usefulness in applications such as more advanced scientific reasoning and step-by-step material synthesis.
arXiv Detail & Related papers (2025-02-18T18:19:36Z) - Enhancing Code Generation for Low-Resource Languages: No Silver Bullet [55.39571645315926]
Large Language Models (LLMs) rely on large and diverse datasets to learn syntax, semantics, and usage patterns of programming languages.<n>For low-resource languages, the limited availability of such data hampers the models' ability to generalize effectively.<n>We present an empirical study investigating the effectiveness of several approaches for boosting LLMs' performance on low-resource languages.
arXiv Detail & Related papers (2025-01-31T12:23:28Z) - Enhancing Large Language Models with Domain-Specific Knowledge: The Case in Topological Materials [4.654635844923322]
Large language models (LLMs) have demonstrated impressive performance in the text generation task.<n>We develop a specialized dialogue system for topological materials called TopoChat.<n>TopoChat exhibits superior performance in structural and property querying, material recommendation, and complex relational reasoning.
arXiv Detail & Related papers (2024-09-10T06:01:16Z) - From Text to Insight: Large Language Models for Materials Science Data Extraction [4.08853418443192]
The vast majority of materials science knowledge exists in unstructured natural language.<n>Structured data is crucial for innovative and systematic materials design.<n>The advent of large language models (LLMs) represents a significant shift.
arXiv Detail & Related papers (2024-07-23T22:23:47Z) - LLaMP: Large Language Model Made Powerful for High-fidelity Materials Knowledge Retrieval and Distillation [0.0]
Large Language Models (LLMs) inherently lack long-term memory, making it a nontrivial, ad hoc, and inevitably biased task to fine-tune them on domain-specific literature and data.
Here we introduce LLaMP, a framework of hierarchical reasoning-and-acting (RAG) agents that can interact with computational and experimental data.
Without fine-tuning, LLaMP demonstrates strong tool usage ability to comprehend and integrate various modalities of materials science concepts.
arXiv Detail & Related papers (2024-01-30T18:37:45Z) - Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs)
We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
arXiv Detail & Related papers (2024-01-19T05:02:46Z) - If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code
Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code)
Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z) - Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z) - The Quo Vadis of the Relationship between Language and Large Language
Models [3.10770247120758]
Large Language Models (LLMs) have come to encourage the adoption of LLMs as scientific models of language.
We identify the most important theoretical and empirical risks brought about by the adoption of scientific models that lack transparency.
We conclude that, at their current stage of development, LLMs hardly offer any explanations for language.
arXiv Detail & Related papers (2023-10-17T10:54:24Z) - 14 Examples of How LLMs Can Transform Materials Science and Chemistry: A
Reflection on a Large Language Model Hackathon [30.978561315637307]
Large-language models (LLMs) could be useful in chemistry and materials science.
To explore these possibilities, we organized a hackathon.
This article chronicles the projects built as part of the hackathon.
arXiv Detail & Related papers (2023-06-09T22:22:02Z) - Augmented Large Language Models with Parametric Knowledge Guiding [72.71468058502228]
Large Language Models (LLMs) have significantly advanced natural language processing (NLP) with their impressive language understanding and generation capabilities.
Their performance may be suboptimal for domain-specific tasks that require specialized knowledge due to limited exposure to the related data.
We propose the novel Parametric Knowledge Guiding (PKG) framework, which equips LLMs with a knowledge-guiding module to access relevant knowledge.
arXiv Detail & Related papers (2023-05-08T15:05:16Z) - Empowering Language Models with Knowledge Graph Reasoning for Question
Answering [117.79170629640525]
We propose knOwledge REasOning empowered Language Model (OREO-LM)
OREO-LM consists of a novel Knowledge Interaction Layer that can be flexibly plugged into existing Transformer-based LMs.
We show significant performance gain, achieving state-of-art results in the Closed-Book setting.
arXiv Detail & Related papers (2022-11-15T18:26:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.