Materials Transformers Language Models for Generative Materials Design:
a benchmark study
- URL: http://arxiv.org/abs/2206.13578v1
- Date: Mon, 27 Jun 2022 18:50:05 GMT
- Title: Materials Transformers Language Models for Generative Materials Design:
a benchmark study
- Authors: Nihang Fu, Lai Wei, Yuqi Song, Qinyang Li, Rui Xin, Sadman Sadeed
Omee, Rongzhi Dong, Edirisuriya M. Dilanga Siriwardane, Jianjun Hu
- Abstract summary: We train seven modern transformer language models (GPT, GPT-2, GPT-Neo, GPT-J, BLMM, BART, and RoBERTa) using the expanded formulas of materials deposited in the ICSD, OQMD, and Materials Project databases.
Six different datasets, with or without non-charge-neutral or electronegativity-unbalanced samples, are used to benchmark performance.
Experiments showed that the causal language model-based materials transformers can generate chemically valid materials compositions, of which up to 97.54% are charge neutral and 91.40% are electronegativity balanced.
- Score: 4.047301375093173
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Pre-trained transformer language models on large unlabeled corpora have
produced state-of-the-art results in natural language processing, organic
molecule design, and protein sequence generation. However, no such models have
been applied to learn the composition patterns of inorganic materials. Here we
train a series of seven modern transformer language models (GPT, GPT-2,
GPT-Neo, GPT-J, BLMM, BART, and RoBERTa) using the expanded formulas of
materials deposited in the ICSD, OQMD, and Materials Project databases. Six
different datasets, with or without non-charge-neutral or
electronegativity-unbalanced samples, are used to benchmark performance and
uncover the generation biases of modern transformer models for the generative
design of materials compositions. Our extensive experiments showed that the
causal language model-based materials transformers can generate chemically
valid materials compositions, of which up to 97.54% are charge neutral and
91.40% are electronegativity balanced, more than six times the enrichment
achieved by a baseline pseudo-random sampling algorithm. These models also
generate compositions with high novelty, and their potential for new materials
discovery is demonstrated by their ability to recover held-out materials. We
also find that the properties of the generated samples can be tailored by
training the models on selected training sets, such as high-band-gap
materials. Our experiments also showed that the models differ in the property
distributions of their generated samples and vary considerably in running
time. We have applied our materials transformer models to discover a set of
new materials, validated using DFT calculations.
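To make the abstract's terms concrete, the sketch below (not the authors' code) illustrates two of its ideas: the expanded-formula encoding used to serialize a composition for a language model, assumed here to simply repeat each element symbol by its count, and simple charge-neutrality and electronegativity-balance checks of the kind used to judge whether generated compositions are chemically valid. The element table is a tiny illustrative sample, not the data used in the paper.

```python
# Minimal, self-contained sketch of (1) the expanded-formula encoding and
# (2) charge-neutrality / electronegativity-balance validity checks.
# The element data below is a toy table for illustration only.
from itertools import product

# Illustrative element data: common oxidation states and Pauling electronegativity.
ELEMENTS = {
    "Sr": {"oxi": [2], "chi": 0.95},
    "Ti": {"oxi": [2, 3, 4], "chi": 1.54},
    "O":  {"oxi": [-2], "chi": 3.44},
    "Na": {"oxi": [1], "chi": 0.93},
    "Cl": {"oxi": [-1], "chi": 3.16},
}

def expand_formula(composition: dict[str, int]) -> str:
    """Serialize a composition as an expanded formula, repeating each element
    symbol by its count, e.g. {'Sr': 1, 'Ti': 1, 'O': 3} -> 'Sr Ti O O O'."""
    return " ".join(sym for sym, n in composition.items() for _ in range(n))

def is_charge_neutral(composition: dict[str, int]) -> bool:
    """True if some assignment of common oxidation states sums to zero charge."""
    symbols = list(composition)
    for states in product(*(ELEMENTS[s]["oxi"] for s in symbols)):
        if sum(q * composition[s] for q, s in zip(states, symbols)) == 0:
            return True
    return False

def is_electronegativity_balanced(composition: dict[str, int]) -> bool:
    """Crude Pauling-style test: in some charge-balanced assignment, every
    cation should be less electronegative than every anion."""
    symbols = list(composition)
    for states in product(*(ELEMENTS[s]["oxi"] for s in symbols)):
        if sum(q * composition[s] for q, s in zip(states, symbols)) != 0:
            continue  # only consider charge-balanced assignments
        cations = [ELEMENTS[s]["chi"] for q, s in zip(states, symbols) if q > 0]
        anions = [ELEMENTS[s]["chi"] for q, s in zip(states, symbols) if q < 0]
        if not cations or not anions:
            continue
        if max(cations) < min(anions):
            return True
    return False

if __name__ == "__main__":
    srtio3 = {"Sr": 1, "Ti": 1, "O": 3}
    print(expand_formula(srtio3))                 # Sr Ti O O O
    print(is_charge_neutral(srtio3))              # True (Sr2+, Ti4+, 3 x O2-)
    print(is_electronegativity_balanced(srtio3))  # True
```

In the paper these checks are reported as percentages over large sets of generated compositions; the sketch only shows what "charge neutral" and "electronegativity balanced" mean operationally. A real pipeline would replace the toy element table with full oxidation-state and electronegativity data, e.g. from a materials informatics library.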
Related papers
- Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model [49.64512917330373]
We introduce TSMMG, a multi-constraint molecular generation large language model that plays the role of a student.
To train TSMMG, we construct a large set of text-molecule pairs by extracting molecular knowledge from its 'teachers'.
Experiments show that TSMMG performs remarkably well at generating molecules that meet complex property requirements described in natural language.
arXiv Detail & Related papers (2024-03-20T02:15:55Z) - Fine-Tuned Language Models Generate Stable Inorganic Materials as Text [57.01994216693825]
Fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable.
We show that our strongest model can generate materials predicted to be metastable at about twice the rate of CDVAE.
Because of text prompting's inherent flexibility, our models can simultaneously be used for unconditional generation of stable materials.
arXiv Detail & Related papers (2024-02-06T20:35:28Z) - Scalable Diffusion for Materials Generation [99.71001883652211]
We develop UniMat, a unified crystal representation that can represent any crystal structure.
UniMat can generate high fidelity crystal structures from larger and more complex chemical systems.
We propose additional metrics for evaluating generative models of materials.
arXiv Detail & Related papers (2023-10-18T15:49:39Z) - SIP: Injecting a Structural Inductive Bias into a Seq2Seq Model by Simulation [75.14793516745374]
We show how a structural inductive bias can be efficiently injected into a seq2seq model by pre-training it to simulate structural transformations on synthetic data.
Our experiments show that our method imparts the desired inductive bias, resulting in better few-shot learning for FST-like tasks.
arXiv Detail & Related papers (2023-10-01T21:19:12Z) - Evaluating the diversity and utility of materials proposed by generative
models [38.85523285991743]
We show how one state-of-the-art generative model, the physics-guided crystal generation model, can be used as part of the inverse design process.
Our findings suggest how generative models might be improved to enable better inverse design.
arXiv Detail & Related papers (2023-08-09T14:42:08Z) - Crystal Transformer: Self-learning neural language model for Generative
and Tinkering Design of Materials [4.813020904720316]
BLMM Crystal Transformer is a neural-network-based probabilistic generative model for the generative and tinkering design of inorganic materials.
It can generate chemically valid materials compositions, of which up to 89.7% are charge neutral and 84.8% are electronegativity balanced.
A user-friendly web app for computational materials doping has been developed and can be accessed freely at www.materialsatlas.org/blmtinker.
arXiv Detail & Related papers (2022-04-25T20:20:26Z) - Semi-supervised teacher-student deep neural network for materials
discovery [6.333015476935593]
We propose a semi-supervised deep neural network (TSDNN) model for high-performance formation energy and synthesizability prediction.
For formation energy based stability screening, our model achieves an absolute 10.3% accuracy improvement compared to the baseline CGCNN regression model.
For synthesizability prediction, our model significantly increases the baseline PU learning's true positive rate from 87.9% to 97.9% while using 1/49 of the model parameters.
arXiv Detail & Related papers (2021-12-12T04:00:21Z) - How to See Hidden Patterns in Metamaterials with Interpretable Machine
Learning [82.67551367327634]
We develop a new interpretable, multi-resolution machine learning framework for finding patterns in the unit cells of materials.
Specifically, we propose two new interpretable representations of metamaterials, called shape-frequency features and unit-cell templates.
arXiv Detail & Related papers (2021-11-10T21:19:02Z) - Prediction of liquid fuel properties using machine learning models with
Gaussian processes and probabilistic conditional generative learning [56.67751936864119]
The present work aims to construct cheap-to-compute machine learning (ML) models to act as closure equations for predicting the physical properties of alternative fuels.
Those models can be trained using the database from MD simulations and/or experimental measurements in a data-fusion-fidelity approach.
The results show that the ML models can accurately predict fuel properties across a wide range of pressure and temperature conditions.
arXiv Detail & Related papers (2021-10-18T14:43:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.