Related papers: Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models

Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models

URL: http://arxiv.org/abs/2410.12771v1
Date: Wed, 16 Oct 2024 17:48:34 GMT
Title: Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models
Authors: Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi,
Abstract summary: We present a Meta FAIR release of the Open Materials 2024 (OMat24) large-scale open dataset and an accompanying set of pre-trained models. OMat24 contains over 110 million density functional theory (DFT) calculations focused on structural and compositional diversity. Our EquiformerV2 models achieve state-of-the-art performance on the Matbench Discovery leaderboard.
Score: 3.865029260331255
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The ability to discover new materials with desirable properties is critical for numerous applications from helping mitigate climate change to advances in next generation computing hardware. AI has the potential to accelerate materials discovery and design by more effectively exploring the chemical space compared to other computational methods or by trial-and-error. While substantial progress has been made on AI for materials data, benchmarks, and models, a barrier that has emerged is the lack of publicly available training data and open pre-trained models. To address this, we present a Meta FAIR release of the Open Materials 2024 (OMat24) large-scale open dataset and an accompanying set of pre-trained models. OMat24 contains over 110 million density functional theory (DFT) calculations focused on structural and compositional diversity. Our EquiformerV2 models achieve state-of-the-art performance on the Matbench Discovery leaderboard and are capable of predicting ground-state stability and formation energies to an F1 score above 0.9 and an accuracy of 20 meV/atom, respectively. We explore the impact of model size, auxiliary denoising objectives, and fine-tuning on performance across a range of datasets including OMat24, MPtraj, and Alexandria. The open release of the OMat24 dataset and models enables the research community to build upon our efforts and drive further advancements in AI-assisted materials science.

Related papers

Artificial Intelligence and Generative Models for Materials Discovery -- A Review [0.0]
Review aims to discuss different principles of AI-driven generative models that are applicable for materials discovery.<n>We will also highlight specific applications of generative models in designing new catalysts, semiconductors, polymers, or crystals.
arXiv Detail & Related papers (2025-08-05T09:56:27Z)
Machine Learning-Based Prediction of Metal-Organic Framework Materials: A Comparative Analysis of Multiple Models [2.089191490381739]
Metal-organic frameworks (MOFs) have emerged as promising materials for various applications.<n>This study presents a comprehensive investigation of machine learning approaches for predicting MOF material properties.
arXiv Detail & Related papers (2025-07-06T18:10:00Z)
Materials Generation in the Era of Artificial Intelligence: A Comprehensive Survey [54.40267149907223]
Materials are the foundation of modern society, underpinning advancements in energy, electronics, healthcare, transportation, and infrastructure.<n>The ability to discover and design new materials with tailored properties is critical to solving some of the most pressing global challenges.<n>Data-driven generative models provide a powerful tool for materials design by directly create novel materials that satisfy predefined property requirements.
arXiv Detail & Related papers (2025-05-22T08:33:21Z)
MatterTune: An Integrated, User-Friendly Platform for Fine-Tuning Atomistic Foundation Models to Accelerate Materials Simulation and Discovery [7.1240120153291535]
We introduce MatterTune, a framework that provides advanced fine-tuning capabilities and seamless integration of atomistic foundation models into downstream materials informatics and simulation. MatterTune supports a number of state-of-the-art foundation models such as ORB, MatterSim, JMP, and EquformerV2.
arXiv Detail & Related papers (2025-04-14T19:12:43Z)
Multi-modal Data Fusion and Deep Ensemble Learning for Accurate Crop Yield Prediction [0.0]
This study introduces RicEns-Net, a novel Deep Ensemble model designed to predict crop yields. The research focuses on the use of synthetic aperture radar (SAR), optical remote sensing data from Sentinel 1, 2, and 3 satellites, and meteorological measurements such as surface temperature and rainfall. The primary objective is to enhance the precision of crop yield prediction by developing a machine-learning framework capable of handling complex environmental data.
arXiv Detail & Related papers (2025-02-09T22:48:27Z)
DARWIN 1.5: Large Language Models as Materials Science Adapted Learners [46.7259033847682]
We propose DARWIN 1.5, the largest open-source large language model tailored for materials science. DARWIN eliminates the need for task-specific descriptors and enables a flexible, unified approach to material property prediction and discovery. Our approach integrates 6M material domain papers and 21 experimental datasets from 49,256 materials across modalities while enabling cross-task knowledge transfer.
arXiv Detail & Related papers (2024-12-16T16:51:27Z)
Foundation Model for Composite Materials and Microstructural Analysis [49.1574468325115]
We present a foundation model specifically designed for composite materials. Our model is pre-trained on a dataset of short-fiber composites to learn robust latent features. During transfer learning, the MMAE accurately predicts homogenized stiffness, with an R2 score reaching as high as 0.959 and consistently exceeding 0.91, even when trained on limited data.
arXiv Detail & Related papers (2024-11-10T19:06:25Z)
MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
We present a comprehensive dataset compiled from Nature Communications articles covering 72 scientific fields. We evaluated 19 proprietary and open-source models on two benchmark tasks, figure captioning and multiple-choice, and conducted human expert annotation. Fine-tuning Qwen2-VL-7B with our task-specific data achieved better performance than GPT-4o and even human experts in multiple-choice evaluations.
arXiv Detail & Related papers (2024-07-06T00:40:53Z)
Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities [59.02391344178202]
Vision foundation models (VFMs) serve as potent building blocks for a wide range of AI applications. The scarcity of comprehensive training data, the need for multi-sensor integration, and the diverse task-specific architectures pose significant obstacles to the development of VFMs. This paper delves into the critical challenge of forging VFMs tailored specifically for autonomous driving, while also outlining future directions.
arXiv Detail & Related papers (2024-01-16T01:57:24Z)
Multimodal Learning for Materials [7.167520424757711]
We introduce Multimodal Learning for Materials (MultiMat), which enables self-supervised multi-modality training of foundation models for materials. We demonstrate our framework's potential using data from the Materials Project database on multiple axes.
arXiv Detail & Related papers (2023-11-30T18:35:29Z)
Scalable Diffusion for Materials Generation [99.71001883652211]
We develop a unified crystal representation that can represent any crystal structure (UniMat) UniMat can generate high fidelity crystal structures from larger and more complex chemical systems. We propose additional metrics for evaluating generative models of materials.
arXiv Detail & Related papers (2023-10-18T15:49:39Z)
Large Language Models as Master Key: Unlocking the Secrets of Materials Science with GPT [9.33544942080883]
This article presents a new natural language processing (NLP) task called structured information inference (SII) to address the complexities of information extraction at the device level in materials science. We accomplished this task by tuning GPT-3 on an existing perovskite solar cell FAIR dataset with 91.8% F1-score and extended the dataset with data published since its release. We also designed experiments to predict the electrical performance of solar cells and design materials or devices with targeted parameters using large language models (LLMs)
arXiv Detail & Related papers (2023-04-05T04:01:52Z)
DA-VEGAN: Differentiably Augmenting VAE-GAN for microstructure reconstruction from extremely small data sets [110.60233593474796]
DA-VEGAN is a model with two central innovations. A $beta$-variational autoencoder is incorporated into a hybrid GAN architecture. A custom differentiable data augmentation scheme is developed specifically for this architecture.
arXiv Detail & Related papers (2023-02-17T08:49:09Z)
Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models. The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z)
Predicting Mechanically Driven Full-Field Quantities of Interest with Deep Learning-Based Metamodels [0.0]
We extend the Mechanical MNIST dataset to enable the investigation of full field QoI prediction. We establish strong baseline performance for predicting full-field QoI with MultiRes-WNet architecture.
arXiv Detail & Related papers (2021-07-24T00:43:49Z)
BIGDML: Towards Exact Machine Learning Force Fields for Materials [55.944221055171276]
Machine-learning force fields (MLFF) should be accurate, computationally and data efficient, and applicable to molecules, materials, and interfaces thereof. Here, we introduce the Bravais-Inspired Gradient-Domain Machine Learning approach and demonstrate its ability to construct reliable force fields using a training set with just 10-200 atoms.
arXiv Detail & Related papers (2021-06-08T10:14:57Z)
Intelligent multiscale simulation based on process-guided composite database [0.0]
We present an integrated data-driven modeling framework based on process modeling, material homogenization, and machine learning. We are interested in the injection-molded short fiber reinforced composites, which have been identified as key material systems in automotive, aerospace, and electronics industries.
arXiv Detail & Related papers (2020-03-20T20:39:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.