"DIVE" into Hydrogen Storage Materials Discovery with AI Agents
- URL: http://arxiv.org/abs/2508.13251v2
- Date: Thu, 25 Sep 2025 02:01:31 GMT
- Title: "DIVE" into Hydrogen Storage Materials Discovery with AI Agents
- Authors: Di Zhang, Xue Jia, Tran Ba Hung, Seong Hoon Jang, Linda Zhang, Ryuhei Sato, Yusuke Hashimoto, Toyoto Sato, Kiyoe Konno, Shin-ichi Orimo, Hao Li
- Abstract summary: Data-driven artificial intelligence (AI) approaches are transforming the discovery of new materials. We present the Descriptive Interpretation of Visual Expression (DIVE) multi-agent workflow, which reads and organizes experimental data from graphical elements in the scientific literature. Building on a curated database of over 30,000 entries from 4,000 publications, we establish a rapid inverse design workflow capable of identifying previously unreported hydrogen storage compositions in two minutes.
- Score: 8.774584882332526
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Data-driven artificial intelligence (AI) approaches are fundamentally transforming the discovery of new materials. Despite the unprecedented availability of materials data in the scientific literature, much of this information remains trapped in unstructured figures and tables, hindering the construction of large language model (LLM)-based AI agents for automated materials design. Here, we present the Descriptive Interpretation of Visual Expression (DIVE) multi-agent workflow, which systematically reads and organizes experimental data from graphical elements in the scientific literature. We focus on solid-state hydrogen storage materials, a class of materials central to future clean-energy technologies, and demonstrate that DIVE markedly improves the accuracy and coverage of data extraction compared with direct extraction by multimodal models, with gains of 10-15% over commercial models and over 30% relative to open-source models. Building on a curated database of over 30,000 entries from 4,000 publications, we establish a rapid inverse design workflow capable of identifying previously unreported hydrogen storage compositions in two minutes. The proposed AI workflow and agent design are broadly transferable across diverse materials, providing a paradigm for AI-driven materials discovery.
Related papers
- From Natural Language to Materials Discovery: The Materials Knowledge Navigation Agent [11.478292682955669]
We introduce the Materials Knowledge Navigation Agent (MKNA), a language-driven system that translates scientific intent into executable actions. MKNA autonomously extracts quantitative thresholds and chemically meaningful design motifs from literature and database evidence. It proposes thermodynamically stable, previously unreported Be-C-rich compounds that populate the sparsely explored 1500-1700 K regime.
arXiv Detail & Related papers (2026-02-11T18:34:24Z)
- Towards Agentic Intelligence for Materials Science [73.4576385477731]
This survey advances a unique pipeline-centric view that spans from corpus curation and pretraining to goal-conditioned agents interfacing with simulation and experimental platforms. To bridge communities and establish a shared frame of reference, we first present an integrated lens that aligns terminology, evaluation, and workflow stages across AI and materials science.
arXiv Detail & Related papers (2026-01-29T23:48:43Z)
- OIDA-QA: A Multimodal Benchmark for Analyzing the Opioid Industry Documents Archive [50.468138755368805]
The opioid crisis represents a significant moment in public health. Data and documents disclosed in the UCSF-JHU Opioid Industry Documents Archive (OIDA) In this paper, we tackle this challenge by organizing the original dataset according to document attributes.
arXiv Detail & Related papers (2025-11-13T03:27:32Z)
- LeMat-Synth: a multi-modal toolbox to curate broad synthesis procedure databases from scientific literature [60.879220305044726]
We propose a multi-modal toolbox that employs large language models (LLMs) and vision language models (VLMs) to automatically extract and organize synthesis procedures and performance data. We curated 81k open-access papers, yielding LeMat-Synth (v 1.0): a dataset containing synthesis procedures spanning 35 synthesis methods and 16 material classes. We release a modular, open-source library designed to support community-driven extension to new corpora and synthesis domains.
arXiv Detail & Related papers (2025-10-28T17:58:18Z)
- Artificial Intelligence and Generative Models for Materials Discovery -- A Review [0.0]
This review aims to discuss different principles of AI-driven generative models that are applicable to materials discovery. We will also highlight specific applications of generative models in designing new catalysts, semiconductors, polymers, or crystals.
arXiv Detail & Related papers (2025-08-05T09:56:27Z)
- Materials Generation in the Era of Artificial Intelligence: A Comprehensive Survey [54.40267149907223]
Materials are the foundation of modern society, underpinning advancements in energy, electronics, healthcare, transportation, and infrastructure. The ability to discover and design new materials with tailored properties is critical to solving some of the most pressing global challenges. Data-driven generative models provide a powerful tool for materials design by directly creating novel materials that satisfy predefined property requirements.
arXiv Detail & Related papers (2025-05-22T08:33:21Z)
- A Large-Scale Vision-Language Dataset Derived from Open Scientific Literature to Advance Biomedical Generalist AI [70.06771291117965]
We introduce Biomedica, an open-source dataset derived from the PubMed Central Open Access subset. Biomedica contains over 6 million scientific articles and 24 million image-text pairs. We provide scalable streaming and search APIs through a web server, facilitating seamless integration with AI systems.
arXiv Detail & Related papers (2025-03-26T05:56:46Z)
- Towards an automated workflow in materials science for combining multi-modal simulative and experimental information using data mining and large language models [0.0]
This manuscript showcases an automated workflow that converts the information encoded in scientific literature into a machine-readable database. Ultimately, a Retrieval-Augmented Generation (RAG) based Large Language Model (LLM) enables a fast and efficient question-answering chatbot.
arXiv Detail & Related papers (2025-02-18T16:24:46Z)
- DARWIN 1.5: Large Language Models as Materials Science Adapted Learners [46.7259033847682]
We propose DARWIN 1.5, the largest open-source large language model tailored for materials science. DARWIN eliminates the need for task-specific descriptors and enables a flexible, unified approach to material property prediction and discovery. Our approach integrates 6M material domain papers and 21 experimental datasets from 49,256 materials across modalities while enabling cross-task knowledge transfer.
arXiv Detail & Related papers (2024-12-16T16:51:27Z)
- Leveraging Chemistry Foundation Models to Facilitate Structure Focused Retrieval Augmented Generation in Multi-Agent Workflows for Catalyst and Materials Design [0.0]
We show that chemistry foundation models can serve as a basis for enabling structure-focused, semantic chemistry information retrieval. We also show the use of chemistry foundation models in conjunction with multi-modal models such as OpenCLIP.
arXiv Detail & Related papers (2024-08-21T17:25:45Z)
- Accelerating materials discovery for polymer solar cells: Data-driven insights enabled by natural language processing [5.527358421206627]
We present a simulation of various active learning strategies for the discovery of polymer solar cell donor/acceptor pairs.
Our approach demonstrates a potential reduction in discovery time of approximately 75%, equivalent to a 15-year acceleration in material innovation.
arXiv Detail & Related papers (2024-02-29T18:54:46Z)
- ChemMiner: A Large Language Model Agent System for Chemical Literature Data Mining [56.15126714863963]
ChemMiner is an end-to-end framework for extracting chemical data from literature. ChemMiner incorporates three specialized agents: a text analysis agent for coreference mapping, a multimodal agent for non-textual information extraction, and a synthesis analysis agent for data generation. Experimental results demonstrate reaction identification rates comparable to human chemists while significantly reducing processing time, with high accuracy, recall, and F1 scores.
arXiv Detail & Related papers (2024-02-20T13:21:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.