Materials Map Integrating Experimental and Computational Data through Graph-Based Machine Learning for Enhanced Materials Discovery
- URL: http://arxiv.org/abs/2503.07378v4
- Date: Tue, 18 Mar 2025 04:43:10 GMT
- Title: Materials Map Integrating Experimental and Computational Data through Graph-Based Machine Learning for Enhanced Materials Discovery
- Authors: Yusuke Hashimoto, Xue Jia, Hao Li, Takaaki Tomai
- Abstract summary: Materials informatics (MI) is expected to greatly streamline material discovery and development. Data used for MI are obtained from both computational and experimental studies. In this study, we use the obtained data to construct materials maps, which visualize relationships among the structural features of materials.
- Score: 5.06756291053173
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Materials informatics (MI), which emerges from the integration of materials science and data science, is expected to greatly streamline material discovery and development. The data used for MI are obtained from both computational and experimental studies, but their integration remains challenging. In our previous study, we reported the integration of these datasets by applying a machine learning model that captures trends hidden in the experimental datasets to compositional data stored in the computational database. In this study, we use the obtained data to construct materials maps, which visualize relationships among the structural features of materials, with the aim of supporting experimental researchers. The maps are constructed using the MatDeepLearn (MDL) framework, which implements graph-based representation of material structures, deep learning, and dimensional reduction for map construction. We evaluated the obtained materials maps through statistical analysis and found that MDL using the message passing neural network (MPNN) architecture enables efficient extraction of features that reflect the structural complexity of materials. However, we found that this advantage does not necessarily translate into improved accuracy in the prediction of material properties. We attribute this unexpected outcome to the high learning performance inherent in MPNN, which can contribute to the structuring of data points within the materials map.
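The pipeline named in the abstract (graph representation, learned features, dimensional reduction into a map) can be illustrated with a minimal sketch. This is not the MatDeepLearn API and not the authors' model: the message passing below is untrained neighbour averaging, PCA stands in for whatever dimensional reduction the paper uses, and the four toy "materials", their one-hot node features, and the function names `message_passing_embedding` and `pca_2d` are all hypothetical.

```python
import numpy as np


def message_passing_embedding(node_feats, edges, n_rounds=2):
    """Toy, untrained message passing (assumption: mean aggregation)."""
    h = np.asarray(node_feats, dtype=float)
    n = h.shape[0]
    adj = np.eye(n)  # self-loops keep each node's own features
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0  # undirected bond between atoms i and j
    adj /= adj.sum(axis=1, keepdims=True)  # row-normalise: mean over neighbours
    for _ in range(n_rounds):
        h = adj @ h  # each node averages itself with its neighbours
    return h.mean(axis=0)  # mean-pool nodes into one graph-level vector


def pca_2d(X):
    """Project rows of X onto their first two principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre the data
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T


# Hypothetical toy structures: (one-hot node features, bond list).
materials = {
    "A": ([[1, 0, 0], [0, 1, 0]], [(0, 1)]),
    "B": ([[1, 0, 0], [0, 1, 0], [0, 1, 0]], [(0, 1), (0, 2)]),
    "C": ([[0, 0, 1], [0, 1, 0]], [(0, 1)]),
    "D": ([[0, 0, 1], [0, 0, 1], [1, 0, 0]], [(0, 1), (1, 2)]),
}

names = list(materials)
embeddings = np.array(
    [message_passing_embedding(feats, bonds) for feats, bonds in materials.values()]
)
coords = pca_2d(embeddings)  # one 2D "map" point per structure

for name, (x, y) in zip(names, coords):
    print(f"{name}: ({x:+.3f}, {y:+.3f})")
```

In a real workflow the averaging step would be replaced by a trained graph neural network, and the embeddings fed to the reduction step would come from one of its hidden layers; the sketch only shows how graph-level vectors become points on a two-dimensional map.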
Related papers
- Causal Discovery from Data Assisted by Large Language Models [50.193740129296245]
It is essential to integrate experimental data with prior domain knowledge for knowledge-driven discovery.
Here we demonstrate this approach by combining high-resolution scanning transmission electron microscopy (STEM) data with insights derived from large language models (LLMs).
By fine-tuning ChatGPT on domain-specific literature, we construct adjacency matrices for Directed Acyclic Graphs (DAGs) that map the causal relationships between structural, chemical, and polarization degrees of freedom in Sm-doped BiFeO3 (SmBFO).
arXiv Detail & Related papers (2025-03-18T02:14:49Z) - Emerging Microelectronic Materials by Design: Navigating Combinatorial Design Space with Scarce and Dispersed Data [42.45821602529994]
Computational modeling and machine learning methods are employed for the design of materials. Physical mechanisms, cost of first-principles calculations, and the dispersity of data pose challenges to both physics-based and data-driven materials modeling. We propose a framework that integrates data-driven and physics-based methods to address these challenges and accelerate materials design.
arXiv Detail & Related papers (2024-12-23T05:06:19Z) - DARWIN 1.5: Large Language Models as Materials Science Adapted Learners [46.7259033847682]
We propose DARWIN 1.5, the largest open-source large language model tailored for materials science. DARWIN eliminates the need for task-specific descriptors and enables a flexible, unified approach to material property prediction and discovery. Our approach integrates 6M material domain papers and 21 experimental datasets from 49,256 materials across modalities while enabling cross-task knowledge transfer.
arXiv Detail & Related papers (2024-12-16T16:51:27Z) - Foundation Model for Composite Materials and Microstructural Analysis [0.0]
We present a foundation model specifically designed for composite materials. Our findings validate the feasibility and effectiveness of foundation models in composite materials. This framework enables high-accuracy predictions even when experimental data are scarce.
arXiv Detail & Related papers (2024-11-10T19:06:25Z) - Knowledge-Aware Reasoning over Multimodal Semi-structured Tables [85.24395216111462]
This study investigates whether current AI models can perform knowledge-aware reasoning on multimodal structured data.
We introduce MMTabQA, a new dataset designed for this purpose.
Our experiments highlight substantial challenges for current AI models in effectively integrating and interpreting multiple text and image inputs.
arXiv Detail & Related papers (2024-08-25T15:17:43Z) - Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z) - Images in Discrete Choice Modeling: Addressing Data Isomorphism in
Multi-Modality Inputs [77.54052164713394]
This paper explores the intersection of Discrete Choice Modeling (DCM) and machine learning.
We investigate the consequences of embedding high-dimensional image data that shares isomorphic information with traditional tabular inputs within a DCM framework.
arXiv Detail & Related papers (2023-12-22T14:33:54Z) - Multimodal Foundation Models for Material Property Prediction and Discovery [7.167520424757711]
We introduce Multimodal Learning for Materials (MultiMat), which enables self-supervised multi-modality training of foundation models for materials.
We demonstrate MultiMat's potential using data from the Materials Project database on multiple axes.
arXiv Detail & Related papers (2023-11-30T18:35:29Z) - Multimodal machine learning for materials science: composition-structure
bimodal learning for experimentally measured properties [4.495968252019426]
This paper introduces a novel approach to multimodal machine learning in materials science via composition-structure bimodal learning.
The proposed COmposition-Structure Bimodal Network (COSNet) is designed to enhance learning and predictions of experimentally measured materials properties that have incomplete structure information.
arXiv Detail & Related papers (2023-08-04T02:04:52Z) - Large Language Models as Master Key: Unlocking the Secrets of Materials
Science with GPT [9.33544942080883]
This article presents a new natural language processing (NLP) task called structured information inference (SII) to address the complexities of information extraction at the device level in materials science.
We accomplished this task by tuning GPT-3 on an existing perovskite solar cell FAIR dataset with 91.8% F1-score and extended the dataset with data published since its release.
We also designed experiments to predict the electrical performance of solar cells and design materials or devices with targeted parameters using large language models (LLMs).
arXiv Detail & Related papers (2023-04-05T04:01:52Z) - Deep autoencoders for physics-constrained data-driven nonlinear
materials modeling [0.6445605125467573]
Physics-constrained data-driven computing is an emerging computational paradigm that allows simulation of complex materials directly based on material database.
This paper introduces deep learning techniques under the data-driven framework to address these fundamental issues in nonlinear materials modeling.
The offline trained autoencoder and the discovered embedding space are then incorporated in the online data-driven computation.
arXiv Detail & Related papers (2022-09-03T20:13:47Z) - Interpretable Mixture of Experts [71.55701784196253]
Interpretable Mixture of Experts (IME) is an inherently-interpretable modeling framework.
IME is demonstrated to be more accurate than single interpretable models and perform comparably with existing state-of-the-art Deep Neural Networks (DNNs) in accuracy.
IME's explanations are compared to commonly-used post-hoc explanations methods through a user study.
arXiv Detail & Related papers (2022-06-05T06:40:15Z) - Audacity of huge: overcoming challenges of data scarcity and data quality for machine learning in computational materials discovery [1.0036312061637764]
Machine learning (ML)-accelerated discovery requires large amounts of high-fidelity data to reveal predictive structure-property relationships.
For many properties of interest in materials discovery, the challenging nature and high cost of data generation have resulted in a data landscape that is scarcely populated and of dubious quality.
In the absence of manual curation, increasingly sophisticated natural language processing and automated image analysis are making it possible to learn structure-property relationships from the literature.
arXiv Detail & Related papers (2021-11-02T21:43:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.