From Natural Language to Materials Discovery: The Materials Knowledge Navigation Agent
- URL: http://arxiv.org/abs/2602.11123v1
- Date: Wed, 11 Feb 2026 18:34:24 GMT
- Title: From Natural Language to Materials Discovery: The Materials Knowledge Navigation Agent
- Authors: Genmao Zhuang, Amir Barati Farimani
- Abstract summary: We introduce the Materials Knowledge Navigation Agent (MKNA), a language-driven system that translates scientific intent into executable actions. MKNA autonomously extracts quantitative thresholds and chemically meaningful design motifs from literature and database evidence. It proposes thermodynamically stable, previously unreported Be-C-rich compounds that populate the sparsely explored 1500-1700 K regime.
- Score: 11.478292682955669
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accelerating the discovery of high-performance materials remains a central challenge across energy, electronics, and aerospace technologies, where traditional workflows depend heavily on expert intuition and computationally expensive simulations. Here we introduce the Materials Knowledge Navigation Agent (MKNA), a language-driven system that translates natural-language scientific intent into executable actions for database retrieval, property prediction, structure generation, and stability evaluation. Beyond automating tool invocation, MKNA autonomously extracts quantitative thresholds and chemically meaningful design motifs from literature and database evidence, enabling data-grounded hypothesis formation. Applied to the search for high-Debye-temperature ceramics, the agent identifies a literature-supported screening criterion (Theta_D > 800 K), rediscovers canonical ultra-stiff materials such as diamond, SiC, SiN, and BeO, and proposes thermodynamically stable, previously unreported Be-C-rich compounds that populate the sparsely explored 1500-1700 K regime. These results demonstrate that MKNA not only finds stable candidates but also reconstructs interpretable design heuristics, establishing a generalizable platform for autonomous, language-guided materials exploration.
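The literature-supported screening criterion described in the abstract (Theta_D > 800 K) amounts to a simple threshold filter over candidate materials. The sketch below illustrates that step; the candidate entries and their Debye temperatures are illustrative placeholders, not data or code from the paper.

```python
# Minimal sketch of a Debye-temperature screening step.
# The threshold comes from the abstract; the candidate list and
# theta_D values below are illustrative placeholders only.
THETA_D_THRESHOLD_K = 800

candidates = [
    {"formula": "C (diamond)", "theta_D_K": 2230},
    {"formula": "SiC", "theta_D_K": 1200},
    {"formula": "BeO", "theta_D_K": 1280},
    {"formula": "NaCl", "theta_D_K": 321},
]

def screen_by_debye_temperature(entries, threshold=THETA_D_THRESHOLD_K):
    """Keep only candidates whose Debye temperature exceeds the threshold."""
    return [e for e in entries if e["theta_D_K"] > threshold]

hits = screen_by_debye_temperature(candidates)
print([e["formula"] for e in hits])  # -> ['C (diamond)', 'SiC', 'BeO']
```

In MKNA this filter would sit downstream of database retrieval and property prediction, so the threshold itself is extracted from literature rather than hard-coded.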
Related papers
- Towards Agentic Intelligence for Materials Science [73.4576385477731]
This survey advances a unique pipeline-centric view that spans from corpus curation and pretraining to goal-conditioned agents interfacing with simulation and experimental platforms. To bridge communities and establish a shared frame of reference, we first present an integrated lens that aligns terminology, evaluation, and workflow stages across AI and materials science.
arXiv Detail & Related papers (2026-01-29T23:48:43Z)
- Language Native Lightly Structured Databases for Large Language Model Driven Composite Materials Research [6.31777560888658]
Preparation procedures of materials are often embedded narratively in experimental protocols, research articles, patents, and laboratory notes. We reformulate this challenge into a text-reasoning problem through a framework centered on a text-first, lightly structured materials database. We show how language-native data combined with LLM-based reasoning can significantly accelerate practical material preparation.
arXiv Detail & Related papers (2025-09-07T15:15:55Z)
- "DIVE" into Hydrogen Storage Materials Discovery with AI Agents [8.774584882332526]
Data-driven artificial intelligence (AI) approaches are transforming the discovery of new materials. We present the Descriptive Interpretation of Visual Expression (DIVE) multi-agent workflow, which reads and organizes experimental data from graphical elements in the scientific literature. Building on a curated database of over 30,000 entries from 4,000 publications, we establish a rapid inverse design workflow capable of identifying previously unreported hydrogen storage compositions in two minutes.
arXiv Detail & Related papers (2025-08-18T14:30:18Z)
- Materials Generation in the Era of Artificial Intelligence: A Comprehensive Survey [54.40267149907223]
Materials are the foundation of modern society, underpinning advancements in energy, electronics, healthcare, transportation, and infrastructure. The ability to discover and design new materials with tailored properties is critical to solving some of the most pressing global challenges. Data-driven generative models provide a powerful tool for materials design by directly creating novel materials that satisfy predefined property requirements.
arXiv Detail & Related papers (2025-05-22T08:33:21Z)
- Causal Discovery from Data Assisted by Large Language Models [50.193740129296245]
It is essential to integrate experimental data with prior domain knowledge for knowledge-driven discovery. Here we demonstrate this approach by combining high-resolution scanning transmission electron microscopy (STEM) data with insights derived from large language models (LLMs). By fine-tuning ChatGPT on domain-specific literature, we construct adjacency matrices for Directed Acyclic Graphs (DAGs) that map the causal relationships between structural, chemical, and polarization degrees of freedom in Sm-doped BiFeO3 (SmBFO).
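The adjacency-matrix encoding of a DAG mentioned in this summary can be sketched in a few lines; the variable names and edges below are illustrative stand-ins, not the causal graph inferred in the cited work.

```python
import numpy as np

# Illustrative degrees of freedom; the cited work relates structural,
# chemical, and polarization order in SmBFO, but these edges are invented.
variables = ["structure", "chemistry", "polarization"]

# adjacency[i, j] = 1 means variable i is a direct cause of variable j.
# Zeros on and below the diagonal here keep the graph acyclic.
adjacency = np.array([
    [0, 1, 1],  # structure -> chemistry, structure -> polarization
    [0, 0, 1],  # chemistry -> polarization
    [0, 0, 0],  # polarization has no outgoing edges
])

def edges(adj, names):
    """List the directed edges (cause, effect) encoded by the matrix."""
    return [(names[i], names[j])
            for i in range(adj.shape[0])
            for j in range(adj.shape[1])
            if adj[i, j]]

print(edges(adjacency, variables))
```

The appeal of this representation is that an LLM only has to emit a small binary matrix, which downstream causal-discovery tooling can validate for acyclicity.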
arXiv Detail & Related papers (2025-03-18T02:14:49Z)
- Towards an automated workflow in materials science for combining multi-modal simulative and experimental information using data mining and large language models [0.0]
This manuscript showcases an automated workflow that unravels the encoded information from scientific literature into a machine-readable database. Ultimately, a Retrieval-Augmented Generation (RAG) based Large Language Model (LLM) enables a fast and efficient question-answering chatbot.
arXiv Detail & Related papers (2025-02-18T16:24:46Z)
- DARWIN 1.5: Large Language Models as Materials Science Adapted Learners [46.7259033847682]
We propose DARWIN 1.5, the largest open-source large language model tailored for materials science. DARWIN eliminates the need for task-specific descriptors and enables a flexible, unified approach to material property prediction and discovery. Our approach integrates 6M material domain papers and 21 experimental datasets from 49,256 materials across modalities while enabling cross-task knowledge transfer.
arXiv Detail & Related papers (2024-12-16T16:51:27Z)
- Fine-Tuned Language Models Generate Stable Inorganic Materials as Text [53.81190146434045]
Fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable. We show that our strongest model can generate materials predicted to be metastable at about twice the rate of CDVAE. Because of text prompting's inherent flexibility, our models can simultaneously be used for unconditional generation of stable materials.
arXiv Detail & Related papers (2024-02-06T20:35:28Z)
- Agent-based Learning of Materials Datasets from Scientific Literature [0.0]
We develop a chemist AI agent, powered by large language models (LLMs), to create structured datasets from natural language text.
Our chemist AI agent, Eunomia, can plan and execute actions by leveraging the existing knowledge from decades of scientific research articles.
arXiv Detail & Related papers (2023-12-18T20:29:58Z)
- Scalable Diffusion for Materials Generation [99.71001883652211]
We develop a unified crystal representation (UniMat) that can represent any crystal structure.
UniMat can generate high fidelity crystal structures from larger and more complex chemical systems.
We propose additional metrics for evaluating generative models of materials.
arXiv Detail & Related papers (2023-10-18T15:49:39Z)
- 1.5 million materials narratives generated by chatbots [25.125848842769464]
We have generated a dataset of 1,494,017 natural language-material paragraphs based on combined OQMD, Materials Project, JARVIS, COD and AFLOW2 databases.
The generated text narratives were then polled and scored by both human experts and ChatGPT-4, based on three rubrics: technical accuracy, language and structure, and relevance and depth of content.
arXiv Detail & Related papers (2023-08-25T22:00:53Z)
- Large Language Models as Master Key: Unlocking the Secrets of Materials Science with GPT [9.33544942080883]
This article presents a new natural language processing (NLP) task called structured information inference (SII) to address the complexities of information extraction at the device level in materials science.
We accomplished this task by tuning GPT-3 on an existing perovskite solar cell FAIR dataset, achieving a 91.8% F1-score, and extended the dataset with data published since its release.
We also designed experiments to predict the electrical performance of solar cells and to design materials or devices with targeted parameters using large language models (LLMs).
arXiv Detail & Related papers (2023-04-05T04:01:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.