Related papers: A New Workflow for Materials Discovery Bridging the Gap Between Experimental Databases and Graph Neural Networks

URL: http://arxiv.org/abs/2602.00756v1
Date: Sat, 31 Jan 2026 14:44:02 GMT
Title: A New Workflow for Materials Discovery Bridging the Gap Between Experimental Databases and Graph Neural Networks
Authors: Brandon Schoener, Yuting Hu, Pasit Wanlapha, Akshay Rengarajan, Ian Moog, Michael Wang, Peihong Zhang, Jinjun Xiong, Hao Zeng
Abstract summary: We propose an alignment process between experimental databases and Crystallographic Information Files (CIF) from the Inorganic Crystal Structure Database (ICSD). Our approach enables the creation of a database that can fully leverage state-of-the-art model architectures for material property prediction. We demonstrate significant improvements in both Mean Absolute Error (MAE) and Correct Classification Rate (CCR) in predicting the ordering temperatures and magnetic ground states of magnetic materials.
Score: 10.116093920635583
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Incorporating Machine Learning (ML) into material property prediction has become a crucial step in accelerating materials discovery. A key challenge is the severe lack of training data, as many properties are too complicated to calculate with high-throughput first principles techniques. To address this, recent research has created experimental databases from information extracted from scientific literature. However, most existing experimental databases do not provide full atomic coordinate information, which prevents them from supporting advanced ML architectures such as Graph Neural Networks (GNNs). In this work, we propose to bridge this gap through an alignment process between experimental databases and Crystallographic Information Files (CIF) from the Inorganic Crystal Structure Database (ICSD). Our approach enables the creation of a database that can fully leverage state-of-the-art model architectures for material property prediction. It also opens the door to utilizing transfer learning to improve prediction accuracy. To validate our approach, we align NEMAD with the ICSD and compare models trained on the resulting database to those trained on NEMAD originally. We demonstrate significant improvements in both Mean Absolute Error (MAE) and Correct Classification Rate (CCR) in predicting the ordering temperatures and magnetic ground states of magnetic materials, respectively.
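
The alignment step and the two metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the matching rule (reduced chemical composition), the record fields (`formula`, `cif_id`), and the helper names are all assumptions made for illustration, and the toy records are placeholders rather than real measurements. CCR is taken here to mean the fraction of correctly classified samples.

```python
import re
from collections import Counter
from functools import reduce
from math import gcd


def composition_key(formula: str) -> tuple:
    """Canonical, order-independent key for a simple formula such as 'Fe2O3'.

    Assumption: integer stoichiometry only, no parentheses or fractions.
    """
    counts = Counter()
    for element, amount in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(amount) if amount else 1
    divisor = reduce(gcd, counts.values())
    return tuple(sorted((el, n // divisor) for el, n in counts.items()))


def align(experimental: list, structures: list) -> list:
    """Attach candidate structure IDs to each experimental record.

    Records with no composition match are dropped (hypothetical policy).
    """
    by_key = {}
    for s in structures:
        by_key.setdefault(composition_key(s["formula"]), []).append(s["cif_id"])
    aligned = []
    for rec in experimental:
        cif_ids = by_key.get(composition_key(rec["formula"]), [])
        if cif_ids:
            aligned.append({**rec, "cif_ids": cif_ids})
    return aligned


def mean_absolute_error(y_true: list, y_pred: list) -> float:
    """MAE: average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def correct_classification_rate(labels: list, predictions: list) -> float:
    """CCR: fraction of samples whose predicted class equals the true class."""
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)


# Toy placeholder records, not real database entries or measurements.
experimental = [
    {"formula": "Fe2O3", "label": "toy-A"},
    {"formula": "MnO", "label": "toy-B"},
]
structures = [
    {"formula": "O3 Fe2", "cif_id": "toy-cif-1"},
    {"formula": "Fe4O6", "cif_id": "toy-cif-2"},
]

aligned = align(experimental, structures)
```

In this toy run only the `Fe2O3` record survives, with both placeholder structures attached. A composition can correspond to several crystal structures, so a practical alignment would need further criteria to choose among candidates; the sketch deliberately leaves that choice open.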

Related papers

A Roadmap for Applying Graph Neural Networks to Numerical Data: Insights from Cementitious Materials [5.565428903960444]
This work is among the first few studies to implement graph neural network (GNN) models to design concrete. A GNN is capable of learning from data structured as graphs, capturing relationships through irregular or topology-dependent connections. The proposed framework establishes a strong foundation for future multi-modal and physics-informed GNN models.
arXiv Detail & Related papers (2025-12-16T19:17:05Z)
Fusing CFD and measurement data using transfer learning [49.1574468325115]
We introduce a non-linear method based on neural networks combining simulation and measurement data via transfer learning. In a first step, the neural network is trained on simulation data to learn spatial features of the distributed quantities. The second step involves transfer learning on the measurement data to correct for systematic errors between simulation and measurement by only re-training a small subset of the entire neural network model.
arXiv Detail & Related papers (2025-07-28T07:21:46Z)
AMBER: Adaptive Mesh Generation by Iterative Mesh Resolution Prediction [48.72179728638418]
We propose Adaptive Meshing By Expert Reconstruction (AMBER), a supervised learning approach to mesh adaptation. AMBER iteratively predicts the sizing field, and uses this prediction to produce a new intermediate mesh using an out-of-the-box mesh generator. We evaluate AMBER on 2D and 3D geometries, datasets including classical physics problems, mechanical components, and real-world industrial designs with human expert meshes.
arXiv Detail & Related papers (2025-05-29T17:10:44Z)
Causal Discovery from Data Assisted by Large Language Models [50.193740129296245]
It is essential to integrate experimental data with prior domain knowledge for knowledge-driven discovery. Here we demonstrate this approach by combining high-resolution scanning transmission electron microscopy (STEM) data with insights derived from large language models (LLMs). By fine-tuning ChatGPT on domain-specific literature, we construct adjacency matrices for Directed Acyclic Graphs (DAGs) that map the causal relationships between structural, chemical, and polarization degrees of freedom in Sm-doped BiFeO3 (SmBFO).
arXiv Detail & Related papers (2025-03-18T02:14:49Z)
A Materials Map Integrating Experimental and Computational Data via Graph-Based Machine Learning for Enhanced Materials Discovery [5.06756291053173]
Materials informatics (MI) is expected to significantly accelerate material development and discovery. Data used in MI are derived from both computational and experimental studies. In this study, we use the obtained datasets to construct materials maps, which visualize the relationships between material properties and structural features.
arXiv Detail & Related papers (2025-03-10T14:31:34Z)
Towards Data-Efficient Pretraining for Atomic Property Prediction [51.660835328611626]
We show that pretraining on a task-relevant dataset can match or surpass large-scale pretraining. We introduce the Chemical Similarity Index (CSI), a novel metric inspired by computer vision's Fréchet Inception Distance.
arXiv Detail & Related papers (2025-02-16T11:46:23Z)
RelGNN: Composite Message Passing for Relational Deep Learning [56.48834369525997]
We introduce RelGNN, a novel GNN framework specifically designed to leverage the unique structural characteristics of the graphs built from relational databases. RelGNN is evaluated on 30 diverse real-world tasks from RelBench (Fey et al., 2024), and achieves state-of-the-art performance on the vast majority of tasks, with improvements of up to 25%.
arXiv Detail & Related papers (2025-02-10T18:58:40Z)
Transfer Learning for Deep Learning-based Prediction of Lattice Thermal Conductivity [0.0]
We study the impact of transfer learning on the precision and generalizability of a deep learning model (ParAIsite). We show that a much greater improvement is obtained when first fine-tuning it on a large dataset of low-quality approximations of lattice thermal conductivity (LTC). The promising results pave the way towards a greater ability to explore large databases in search of low thermal conductivity materials.
arXiv Detail & Related papers (2024-11-27T11:57:58Z)
MaD-Scientist: AI-based Scientist solving Convection-Diffusion-Reaction Equations Using Massive PINN-Based Prior Data [22.262191225577244]
We explore whether a similar approach can be applied to scientific foundation models (SFMs). We collect low-cost physics-informed neural network (PINN)-based approximated prior data in the form of solutions to partial differential equations (PDEs) constructed through an arbitrary linear combination of mathematical dictionaries. We provide experimental evidence on the one-dimensional convection-diffusion-reaction equation, which demonstrates that pre-training remains robust even with approximated prior data.
arXiv Detail & Related papers (2024-10-09T00:52:00Z)
Pre-training via Denoising for Molecular Property Prediction [53.409242538744444]
We describe a pre-training technique that utilizes large datasets of 3D molecular structures at equilibrium. Inspired by recent advances in noise regularization, our pre-training objective is based on denoising.
arXiv Detail & Related papers (2022-05-31T22:28:34Z)
Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters. First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension. We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
arXiv Detail & Related papers (2021-04-11T16:37:32Z)
Machine learning for metal additive manufacturing: Predicting temperature and melt pool fluid dynamics using physics-informed neural networks [0.0]
We propose a physics-informed neural network (PINN) framework that fuses data and first physical principles. This is the first application of PINN to three dimensional AM processes modeling. The PINN can accurately predict the temperature and melt pool dynamics during metal AM processes with only a moderate amount of labeled data-sets.
arXiv Detail & Related papers (2020-07-28T20:34:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.