Related papers: Multimodal machine learning for materials science: composition-structure bimodal learning for experimentally measured properties

Multimodal machine learning for materials science: composition-structure bimodal learning for experimentally measured properties

URL: http://arxiv.org/abs/2309.04478v1
Date: Fri, 4 Aug 2023 02:04:52 GMT
Title: Multimodal machine learning for materials science: composition-structure bimodal learning for experimentally measured properties
Authors: Sheng Gong, Shuo Wang, Taishan Zhu, Yang Shao-Horn, and Jeffrey C. Grossman
Abstract summary: This paper introduces a novel approach to multimodal machine learning in materials science via composition-structure bimodal learning. The proposed COmposition-Structure Bimodal Network (COSNet) is designed to enhance learning and predictions of experimentally measured materials properties that have incomplete structure information.
Score: 4.495968252019426
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The widespread application of multimodal machine learning models like GPT-4 has revolutionized various research fields including computer vision and natural language processing. However, its implementation in materials informatics remains underexplored, despite the presence of materials data across diverse modalities, such as composition and structure. The effectiveness of machine learning models trained on large calculated datasets depends on the accuracy of calculations, while experimental datasets often have limited data availability and incomplete information. This paper introduces a novel approach to multimodal machine learning in materials science via composition-structure bimodal learning. The proposed COmposition-Structure Bimodal Network (COSNet) is designed to enhance learning and predictions of experimentally measured materials properties that have incomplete structure information. Bimodal learning significantly reduces prediction errors across distinct materials properties including Li conductivity in solid electrolyte, band gap, refractive index, dielectric constant, energy, and magnetic moment, surpassing composition-only learning methods. Furthermore, we identified that data augmentation based on modal availability plays a pivotal role in the success of bimodal learning.

Related papers

Causal Discovery from Data Assisted by Large Language Models [50.193740129296245]
It is essential to integrate experimental data with prior domain knowledge for knowledge driven discovery. Here we demonstrate this approach by combining high-resolution scanning transmission electron microscopy (STEM) data with insights derived from large language models (LLMs) By fine-tuning ChatGPT on domain-specific literature, we construct adjacency matrices for Directed Acyclic Graphs (DAGs) that map the causal relationships between structural, chemical, and polarization degrees of freedom in Sm-doped BiFeO3 (SmBFO)
arXiv Detail & Related papers (2025-03-18T02:14:49Z)
A Materials Map Integrating Experimental and Computational Data via Graph-Based Machine Learning for Enhanced Materials Discovery [5.06756291053173]
Materials informatics (MI) is expected to significantly accelerate material development and discovery. Data used in MI are derived from both computational and experimental studies. In this study, we use the obtained datasets to construct materials maps, which visualize the relationships between material properties and structural features.
arXiv Detail & Related papers (2025-03-10T14:31:34Z)
Machine learning of microstructure--property relationships in materials with robust features from foundational vision transformers [0.0]
Machine learning of microstructure--property relationships from data is an emerging approach in computational materials science. We propose utilizing pre-trained foundational vision transformers for the extraction of task-agnostic microstructure features.
arXiv Detail & Related papers (2025-01-28T17:06:47Z)
DARWIN 1.5: Large Language Models as Materials Science Adapted Learners [46.7259033847682]
We propose DARWIN 1.5, the largest open-source large language model tailored for materials science.<n> DARWIN eliminates the need for task-specific descriptors and enables a flexible, unified approach to material property prediction and discovery.<n>Our approach integrates 6M material domain papers and 21 experimental datasets from 49,256 materials across modalities while enabling cross-task knowledge transfer.
arXiv Detail & Related papers (2024-12-16T16:51:27Z)
Foundation Model for Composite Materials and Microstructural Analysis [49.1574468325115]
We present a foundation model specifically designed for composite materials. Our model is pre-trained on a dataset of short-fiber composites to learn robust latent features. During transfer learning, the MMAE accurately predicts homogenized stiffness, with an R2 score reaching as high as 0.959 and consistently exceeding 0.91, even when trained on limited data.
arXiv Detail & Related papers (2024-11-10T19:06:25Z)
Physical Consistency Bridges Heterogeneous Data in Molecular Multi-Task Learning [79.75718786477638]
We exploit the specialty of molecular tasks that there are physical laws connecting them, and design consistency training approaches. We demonstrate that the more accurate energy data can improve the accuracy of structure prediction. We also find that consistency training can directly leverage force and off-equilibrium structure data to improve structure prediction.
arXiv Detail & Related papers (2024-10-14T03:11:33Z)
Multi-Task Multi-Fidelity Learning of Properties for Energetic Materials [34.8008617873679]
We find that multi-task neural networks can learn from multi-modal data and outperform single-task models trained for specific properties. As expected, the improvement is more significant for data-scarce properties. This approach is widely applicable to fields outside energetic materials.
arXiv Detail & Related papers (2024-08-21T12:54:26Z)
MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
This dataset includes figures such as schematic diagrams, simulated images, macroscopic/microscopic photos, and experimental visualizations. We developed benchmarks for scientific figure captioning and multiple-choice questions, evaluating six proprietary and over ten open-source models. The dataset and benchmarks will be released to support further research.
arXiv Detail & Related papers (2024-07-06T00:40:53Z)
Multimodal Learning for Materials [7.167520424757711]
We introduce Multimodal Learning for Materials (MultiMat), which enables self-supervised multi-modality training of foundation models for materials. We demonstrate our framework's potential using data from the Materials Project database on multiple axes.
arXiv Detail & Related papers (2023-11-30T18:35:29Z)
Compositional Representation of Polymorphic Crystalline Materials [56.80318252233511]
We introduce PCRL, a novel approach that employs probabilistic modeling of composition to capture the diverse polymorphs from available structural information. Extensive evaluations on sixteen datasets demonstrate the effectiveness of PCRL in learning compositional representation.
arXiv Detail & Related papers (2023-11-17T20:34:28Z)
Efficient Surrogate Models for Materials Science Simulations: Machine Learning-based Prediction of Microstructure Properties [0.0]
Several machine learning algorithms have been applied in these scientific fields to enhance and accelerate simulation models or as surrogate models. We develop and investigate the applications of six machine learning techniques based on two different datasets from the domain of materials science.
arXiv Detail & Related papers (2023-09-01T07:29:44Z)
Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data. Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models. The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z)
Benchmarking Active Learning Strategies for Materials Optimization and Discovery [17.8738267360992]
We present a reference dataset to benchmark active learning strategies in the form of various acquisition functions. We discuss the relationship between algorithm performance, materials search space, complexity, and the incorporation of prior knowledge.
arXiv Detail & Related papers (2022-04-12T14:27:33Z)
Intelligent multiscale simulation based on process-guided composite database [0.0]
We present an integrated data-driven modeling framework based on process modeling, material homogenization, and machine learning. We are interested in the injection-molded short fiber reinforced composites, which have been identified as key material systems in automotive, aerospace, and electronics industries.
arXiv Detail & Related papers (2020-03-20T20:39:19Z)
Multilinear Compressive Learning with Prior Knowledge [106.12874293597754]
Multilinear Compressive Learning (MCL) framework combines Multilinear Compressive Sensing and Machine Learning into an end-to-end system. Key idea behind MCL is the assumption of the existence of a tensor subspace which can capture the essential features from the signal for the downstream learning task. In this paper, we propose a novel solution to address both of the aforementioned requirements, i.e., How to find those tensor subspaces in which the signals of interest are highly separable?
arXiv Detail & Related papers (2020-02-17T19:06:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.