DARWIN 1.5: Large Language Models as Materials Science Adapted Learners
- URL: http://arxiv.org/abs/2412.11970v2
- Date: Thu, 23 Jan 2025 08:07:41 GMT
- Title: DARWIN 1.5: Large Language Models as Materials Science Adapted Learners
- Authors: Tong Xie, Yuwei Wan, Yixuan Liu, Yuchen Zeng, Shaozhou Wang, Wenjie Zhang, Clara Grazian, Chunyu Kit, Wanli Ouyang, Dongzhan Zhou, Bram Hoex,
- Abstract summary: We propose DARWIN 1.5, the largest open-source large language model tailored for materials science.
DARWIN eliminates the need for task-specific descriptors and enables a flexible, unified approach to material property prediction and discovery.
Our approach integrates 6M material domain papers and 21 experimental datasets from 49,256 materials across modalities while enabling cross-task knowledge transfer.
- Score: 46.7259033847682
- License:
- Abstract: Materials discovery and design aim to find compositions and structures with desirable properties over highly complex and diverse physical spaces. Traditional solutions, such as high-throughput simulations or machine learning, often rely on complex descriptors, which hinder generalizability and transferability across different material systems. Moreover, these descriptors may inadequately represent macro-scale material properties, which are influenced by structural imperfections and compositional variations in real-world samples, thus limiting their practical applicability. To address these challenges, we propose DARWIN 1.5, the largest open-source large language model tailored for materials science. By leveraging natural language as input, DARWIN eliminates the need for task-specific descriptors and enables a flexible, unified approach to material property prediction and discovery. Our approach integrates 6M material domain papers and 21 experimental datasets from 49,256 materials across modalities while enabling cross-task knowledge transfer. The enhanced model achieves up to 59.1% improvement in prediction accuracy over the base LLaMA-7B architecture and outperforms SOTA machine learning approaches across 8 materials design tasks. These results establish LLMs as a promising foundation for developing versatile and scalable models in materials science.
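As a rough illustration of the descriptor-free, text-in/text-out interface the abstract describes, here is a minimal sketch using a generic Hugging Face causal LM; the checkpoint path, prompt wording, and output format are placeholders, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): framing a materials property query as
# natural-language text for an instruction-tuned LLM, so no hand-crafted
# descriptors are needed. The checkpoint name below is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "path/to/darwin-1.5-checkpoint"  # hypothetical; substitute a real checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

def predict_property(composition: str, prop: str) -> str:
    """Ask the model for a property value using a plain-text instruction."""
    prompt = (
        f"Given the material {composition}, estimate its {prop}. "
        "Answer with a single number and unit."
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=32)
    # Return only the newly generated tokens, decoded back to text.
    return tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

print(predict_property("CsPbI3", "band gap"))
```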
Related papers
- Foundational Large Language Models for Materials Research [22.77591279242839]
Large Language Models (LLMs) offer opportunities to accelerate materials research through automated analysis and prediction.
Here, we present LLaMat, a family of foundational models for materials science developed through continued pretraining of LLaMA models.
We demonstrate that LLaMat excels in materials-specific NLP and structured information extraction while maintaining general linguistic capabilities.
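A minimal sketch of what continued pretraining of a LLaMA-family model on domain text can look like with the Hugging Face Trainer; the corpus file, base checkpoint, and hyperparameters below are assumptions, not the LLaMat recipe.

```python
# Minimal sketch (assumptions, not the LLaMat recipe): continued pretraining of a
# LLaMA-family causal LM on a plain-text materials-science corpus.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"            # base checkpoint (gated on the Hub)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token     # LLaMA has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base)

# Placeholder corpus: one document per line of raw materials-science text.
corpus = load_dataset("text", data_files={"train": "materials_papers.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal-LM objective

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llamat-cpt", per_device_train_batch_size=1,
                           num_train_epochs=1, learning_rate=2e-5),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```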
arXiv Detail & Related papers (2024-12-12T18:46:38Z)
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale [66.73529246309033]
Multimodal large language models (MLLMs) have shown significant potential in a broad range of multimodal tasks.
Existing instruction-tuning datasets only provide phrase-level answers without any intermediate rationales.
We introduce a scalable and cost-effective method to construct a large-scale multimodal instruction-tuning dataset with rich intermediate rationales.
arXiv Detail & Related papers (2024-12-06T18:14:24Z)
- Foundation Model for Composite Materials and Microstructural Analysis [0.0]
We present a foundation model specifically designed for composite materials.
Our findings validate the feasibility and effectiveness of foundation models in composite materials.
This framework enables high-accuracy predictions even when experimental data are scarce.
arXiv Detail & Related papers (2024-11-10T19:06:25Z)
- OpenMaterial: A Comprehensive Dataset of Complex Materials for 3D Reconstruction [54.706361479680055]
We introduce the OpenMaterial dataset, comprising 1001 objects made of 295 distinct materials.
OpenMaterial provides comprehensive annotations, including 3D shape, material type, camera pose, depth, and object mask.
It stands as the first large-scale dataset enabling quantitative evaluations of existing algorithms on objects with diverse and challenging materials.
arXiv Detail & Related papers (2024-06-13T07:46:17Z)
- Fine-Tuned Language Models Generate Stable Inorganic Materials as Text [57.01994216693825]
Fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable.
We show that our strongest model can generate materials predicted to be metastable at about twice the rate of CDVAE.
Because of text prompting's inherent flexibility, our models can simultaneously be used for unconditional generation of stable materials.
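To make "text-encoded atomistic data" concrete, here is a minimal sketch of turning a crystal structure into a plain string that a causal LM could be fine-tuned on and asked to generate; the exact string format used in the paper may differ.

```python
# Minimal sketch (the paper's exact encoding may differ): serialise a crystal as
# plain text -- lattice lengths/angles plus element symbols with fractional
# coordinates -- so a causal LM can be trained on, and sample, such strings.
def encode_structure(lattice, sites):
    """lattice: (a, b, c, alpha, beta, gamma); sites: list of (element, (x, y, z))."""
    lines = [" ".join(f"{v:.2f}" for v in lattice)]
    for element, frac in sites:
        lines.append(element + " " + " ".join(f"{x:.3f}" for x in frac))
    return "\n".join(lines)

# Example: two sites of rock-salt NaCl (a = 5.64 Angstrom, cubic cell)
text = encode_structure(
    lattice=(5.64, 5.64, 5.64, 90.0, 90.0, 90.0),
    sites=[("Na", (0.0, 0.0, 0.0)), ("Cl", (0.5, 0.5, 0.5))],
)
print(text)  # this string (plus a prompt) becomes one training example
```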
arXiv Detail & Related papers (2024-02-06T20:35:28Z)
- FAENet: Frame Averaging Equivariant GNN for Materials Modeling [123.19473575281357]
We introduce a flexible framework relying on stochastic frame averaging (SFA) to make any model E(3)-equivariant or invariant through data transformations.
We prove the validity of our method theoretically and empirically demonstrate its superior accuracy and computational scalability in materials modeling.
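A simplified sketch of the frame-averaging idea follows: build frames from the PCA of the centred coordinates, evaluate the model in each frame, and average the outputs. This toy version covers only the invariant, scalar-output case and ignores the stochastic sampling and equivariant variants handled in the paper.

```python
# Minimal sketch of frame averaging (simplified, not the authors' implementation):
# average a backbone's scalar output over PCA-derived frames so the result is
# invariant to rotations and translations of the input point cloud.
import itertools
import numpy as np

def frames(pos):
    """Return centred positions and proper-rotation frames from PCA eigenvectors."""
    centred = pos - pos.mean(axis=0)
    _, vecs = np.linalg.eigh(centred.T @ centred)   # 3x3 eigenvector basis
    out = []
    for signs in itertools.product([1.0, -1.0], repeat=3):
        R = vecs * np.array(signs)                  # flip eigenvector signs
        if np.linalg.det(R) > 0:                    # keep proper rotations only
            out.append(R)
    return centred, out

def frame_averaged(model, pos):
    centred, Rs = frames(pos)
    preds = [model(centred @ R) for R in Rs]        # express positions in each frame
    return float(np.mean(preds))

# Toy backbone with no built-in symmetry handling: sum of pairwise distances
# (already invariant; used here only to exercise the code path).
toy_model = lambda x: np.sum(np.linalg.norm(x[:, None] - x[None, :], axis=-1))
print(frame_averaged(toy_model, np.random.rand(5, 3)))
```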
arXiv Detail & Related papers (2023-04-28T21:48:31Z)
- A Comprehensive and Versatile Multimodal Deep Learning Approach for Predicting Diverse Properties of Advanced Materials [0.9517427900627922]
We present a multimodal deep learning framework for predicting physical properties of a 10-dimensional acrylic polymer composite material.
Our approach handles an 18-dimensional problem, with 10 compositional inputs and 8 property outputs, successfully predicting 913,680 property data points across 114,210 composition conditions.
This study advances future research on different materials and the development of more sophisticated models, drawing us closer to the ultimate goal of predicting all properties of all materials.
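As a shape-level illustration of the 10 compositional inputs and 8 property outputs mentioned above, here is a minimal PyTorch regressor; the hidden sizes and any multimodal fusion details are assumptions, not the paper's architecture.

```python
# Minimal sketch (architecture details are assumptions, not the paper's): a plain
# regressor mapping 10 compositional fractions to 8 material properties.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 8),                  # one output per predicted property
)

x = torch.rand(32, 10)                  # batch of 32 candidate compositions
x = x / x.sum(dim=1, keepdim=True)      # normalise fractions to sum to 1
properties = model(x)                   # shape (32, 8)
print(properties.shape)
```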
arXiv Detail & Related papers (2023-03-29T02:42:17Z)
- DA-VEGAN: Differentiably Augmenting VAE-GAN for microstructure reconstruction from extremely small data sets [110.60233593474796]
DA-VEGAN is a model with two central innovations.
A $\beta$-variational autoencoder is incorporated into a hybrid GAN architecture.
A custom differentiable data augmentation scheme is developed specifically for this architecture.
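A minimal sketch of what a differentiable augmentation can look like in PyTorch follows (random flips and circular shifts built from tensor ops only, so gradients still pass through); the custom scheme in DA-VEGAN is more elaborate, and this is not the authors' code.

```python
# Minimal sketch (not the authors' scheme): differentiable augmentation built
# from tensor ops, so gradients still reach the generator when the
# discriminator sees augmented microstructure images.
import torch

def diff_augment(x: torch.Tensor) -> torch.Tensor:
    """x: (batch, channels, H, W) microstructure images."""
    if torch.rand(1) < 0.5:
        x = torch.flip(x, dims=[-1])                             # horizontal flip
    shift_h = int(torch.randint(-4, 5, (1,)))
    shift_w = int(torch.randint(-4, 5, (1,)))
    x = torch.roll(x, shifts=(shift_h, shift_w), dims=(-2, -1))  # small translation
    return x

fake = torch.rand(8, 1, 64, 64, requires_grad=True)  # stand-in generator output
out = diff_augment(fake).sum()
out.backward()                       # gradients flow back through the augmentation
print(fake.grad.shape)
```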
arXiv Detail & Related papers (2023-02-17T08:49:09Z)
- Data-driven multi-scale modeling and robust optimization of composite structure with uncertainty quantification [0.42581756453559755]
This chapter demonstrates advanced data-driven methods and outlines the capabilities that must be developed or added for the multi-scale modeling of advanced composite materials.
It proposes a multi-scale modeling method for composite structures based on a finite element method (FEM) simulation driven by surrogate models/emulators.
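A minimal sketch of the surrogate-driven workflow follows: fit a cheap emulator on a handful of expensive FEM evaluations, then query it inside a design loop with the predictive standard deviation as an uncertainty estimate. The FEM stand-in, kernel, and design variables are illustrative assumptions, not the chapter's implementation.

```python
# Minimal sketch (assumed workflow): a Gaussian-process surrogate trained on a
# few expensive FEM evaluations, then queried cheaply over many candidate designs.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def run_fem(x):
    """Placeholder for an expensive FEM call (analytic stand-in here)."""
    return np.sin(3 * x[0]) + 0.5 * x[1] ** 2

X_train = np.random.rand(20, 2)               # e.g. fibre volume fraction, ply angle
y_train = np.array([run_fem(x) for x in X_train])

surrogate = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), normalize_y=True)
surrogate.fit(X_train, y_train)

X_query = np.random.rand(1000, 2)             # cheap to evaluate en masse
mean, std = surrogate.predict(X_query, return_std=True)
best = X_query[np.argmin(mean)]               # candidate design; std gives the UQ
print(best, std.max())
```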
arXiv Detail & Related papers (2022-10-13T16:40:11Z)
- Intelligent multiscale simulation based on process-guided composite database [0.0]
We present an integrated data-driven modeling framework based on process modeling, material homogenization, and machine learning.
We focus on injection-molded short-fiber-reinforced composites, which have been identified as key material systems in the automotive, aerospace, and electronics industries.
arXiv Detail & Related papers (2020-03-20T20:39:19Z)