Related papers: Generative Language Model for Catalyst Discovery

URL: http://arxiv.org/abs/2407.14040v1
Date: Fri, 19 Jul 2024 05:34:08 GMT
Title: Generative Language Model for Catalyst Discovery
Authors: Dong Hyeon Mok, Seoin Back
Abstract summary: We introduce the Catalyst Generative Pretrained Transformer (CatGPT), trained to generate string representations of inorganic catalyst structures from a vast chemical space. CatGPT not only demonstrates high performance in generating valid and accurate catalyst structures but also serves as a foundation model for generating desired types of catalysts.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Discovery of novel and promising materials is a critical challenge in the field of chemistry and materials science, traditionally approached through methodologies ranging from trial-and-error to machine learning-driven inverse design. Recent studies suggest that transformer-based language models can be utilized as material generative models to expand chemical space and explore materials with desired properties. In this work, we introduce the Catalyst Generative Pretrained Transformer (CatGPT), trained to generate string representations of inorganic catalyst structures from a vast chemical space. CatGPT not only demonstrates high performance in generating valid and accurate catalyst structures but also serves as a foundation model for generating desired types of catalysts by fine-tuning with sparse and specified datasets. As an example, we fine-tuned the pretrained CatGPT using a binary alloy catalyst dataset designed for screening two-electron oxygen reduction reaction (2e-ORR) catalysts and generated catalyst structures specialized for 2e-ORR. Our work demonstrates the potential of language models as generative tools for catalyst discovery.

Related papers

Transition States Energies from Machine Learning: An Application to Reverse Water-Gas Shift on Single-Atom Alloys [0.0]
We propose a machine learning (ML) model for predicting transition state (TS) energies based on Gaussian process regression. Applying the model to predict TS energies for the reverse water-gas shift (RWGS) reaction on single-atom alloy catalysts, we show it can significantly improve the accuracy.
arXiv Detail & Related papers (2025-05-01T15:01:02Z)
Inorganic Catalyst Efficiency Prediction Based on EAPCR Model: A Deep Learning Solution for Multi-Source Heterogeneous Data [9.022023762759641]
This study introduces the Embedding-Attention-Permutated CNN-Residual (EAPCR) deep learning model. EAPCR constructs a feature association matrix using embedding and attention mechanisms and enhances predictive performance. We evaluate EAPCR on datasets from heterogeneous photocatalysis, thermal, and electrocatalysis.
arXiv Detail & Related papers (2025-03-10T15:10:22Z)
BatGPT-Chem: A Foundation Large Model For Retrosynthesis Prediction [65.93303145891628]
BatGPT-Chem is a large language model with 15 billion parameters, tailored for enhanced retrosynthesis prediction. Our model captures a broad spectrum of chemical knowledge, enabling precise prediction of reaction conditions. This development empowers chemists to adeptly address novel compounds, potentially expediting the innovation cycle in drug manufacturing and materials science.
arXiv Detail & Related papers (2024-08-19T05:17:40Z)
A Machine Learning and Explainable AI Framework Tailored for Unbalanced Experimental Catalyst Discovery [10.92613600218535]
We introduce a robust machine learning and explainable AI (XAI) framework to accurately classify the catalytic yield of various compositions. This framework combines a series of ML practices designed to handle the scarcity and imbalance of catalyst data. We believe that such insights can assist chemists in the development and identification of novel catalysts with superior performance.
arXiv Detail & Related papers (2024-07-10T13:09:53Z)
Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms as low-rank computation have impressive performance for learning Transformer-based adaption. We analyze how magnitude-based models affect generalization while improving adaption. We conclude that proper magnitude-based pruning has a slight effect on the testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z)
UAlign: Pushing the Limit of Template-free Retrosynthesis Prediction with Unsupervised SMILES Alignment [51.49238426241974]
This paper introduces UAlign, a template-free graph-to-sequence pipeline for retrosynthesis prediction. By combining graph neural networks and Transformers, our method can more effectively leverage the inherent graph structure of molecules.
arXiv Detail & Related papers (2024-03-25T03:23:03Z)
Molecule Design by Latent Prompt Transformer [76.2112075557233]
This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task. We propose a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt.
arXiv Detail & Related papers (2024-02-27T03:33:23Z)
Turning hazardous volatile matter compounds into fuel by catalytic steam reforming: An evolutionary machine learning approach [2.1026063307327045]
This study is the first to develop a machine-learning-based research framework for modeling, understanding, and optimizing the catalytic steam reforming of volatile matter compounds. Toluene catalytic steam reforming is used as a case study to show how chemical/textural analyses can be used to obtain input features for machine learning models.
arXiv Detail & Related papers (2023-07-25T16:29:07Z)
Catalysis distillation neural network for the few shot open catalyst challenge [1.1878820609988694]
This paper introduces the Few-Shot Open Catalyst Challenge 2023, a competition aimed at advancing the application of machine learning for predicting reactions. We propose a machine learning approach based on a framework called Catalysis Distillation Graph Neural Network (CDGNN). Our results demonstrate that CDGNN effectively learns embeddings from catalytic structures, enabling the capture of structure-adsorption relationships.
arXiv Detail & Related papers (2023-05-31T04:23:56Z)
PhAST: Physics-Aware, Scalable, and Task-specific GNNs for Accelerated Catalyst Design [102.9593507372373]
Catalyst materials play a crucial role in the electrochemical reactions involved in industrial processes. Machine learning holds the potential to efficiently model materials properties from large amounts of data. We propose task-specific innovations applicable to most architectures, enhancing both computational efficiency and accuracy.
arXiv Detail & Related papers (2022-11-22T05:24:30Z)
Multi-Task Mixture Density Graph Neural Networks for Predicting Cu-based Single-Atom Alloy Catalysts for CO2 Reduction Reaction [61.9212585617803]
Graph neural networks (GNNs) have drawn more and more attention from material scientists. We develop a multi-task (MT) architecture based on DimeNet++ and mixture density networks to improve the performance of such tasks.
arXiv Detail & Related papers (2022-09-15T13:52:15Z)
Boosting Heterogeneous Catalyst Discovery by Structurally Constrained Deep Learning Models [0.0]
Deep learning approaches such as graph neural networks (GNNs) open new opportunities to significantly extend the scope for modelling novel high-performance catalysts. Here we present an embedding improvement for GNNs that has been modified by Voronoi tessellation. We show that a sensible choice of data can decrease the error to values above the physically-based 20 meV per atom threshold.
arXiv Detail & Related papers (2022-07-11T17:01:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.