Machine learning with persistent homology and chemical word embeddings
improves prediction accuracy and interpretability in metal-organic frameworks
- URL: http://arxiv.org/abs/2010.00532v2
- Date: Wed, 31 Mar 2021 17:56:48 GMT
- Authors: Aditi S. Krishnapriyan, Joseph Montoya, Maciej Haranczyk, Jens
Hummelshøj, Dmitriy Morozov
- Abstract summary: We introduce an end-to-end machine learning model that automatically generates descriptors that capture a complex representation of a material's structure and chemistry.
It automatically encapsulates geometric and chemical information directly from the material system.
Our results show considerable improvement in both accuracy and transferability across targets compared to models constructed from the commonly-used, manually-curated features.
- Score: 0.07874708385247352
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning has emerged as a powerful approach in materials discovery.
Its major challenge is selecting features that create interpretable
representations of materials, useful across multiple prediction tasks. We
introduce an end-to-end machine learning model that automatically generates
descriptors that capture a complex representation of a material's structure and
chemistry. This approach builds on computational topology techniques (namely,
persistent homology) and word embeddings from natural language processing. It
automatically encapsulates geometric and chemical information directly from the
material system. We demonstrate our approach on multiple nanoporous
metal-organic framework datasets by predicting methane and carbon dioxide
adsorption across different conditions. Our results show considerable
improvement in both accuracy and transferability across targets compared to
models constructed from the commonly-used, manually-curated features,
consistently achieving an average 25-30% decrease in
root-mean-squared-deviation and an average increase of 40-50% in R² scores. A
key advantage of our approach is interpretability: Our model identifies the
pores that correlate best to adsorption at different pressures, which
contributes to understanding atomic-level structure-property relationships for
materials design.
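The accuracy figures quoted in the abstract are relative changes in two standard regression metrics. As a minimal sketch of how such percentages could be computed, the following uses the textbook definitions of root-mean-squared deviation and R². The helper names and the toy numbers are hypothetical and are not taken from the paper; the sketch says nothing about the authors' actual evaluation code.

```python
import math


def rmsd(y_true, y_pred):
    """Root-mean-squared deviation between targets and predictions."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_change(baseline, new):
    """Percentage change of `new` relative to `baseline`."""
    return 100.0 * (new - baseline) / baseline


# Toy data (hypothetical): adsorption targets and two sets of predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
baseline_pred = [1.5, 1.5, 3.5, 3.5]  # stand-in for a hand-curated-feature model
new_pred = [1.0, 2.0, 3.0, 4.5]       # stand-in for an automatically-featurized model

rmsd_change = relative_change(rmsd(y_true, baseline_pred), rmsd(y_true, new_pred))
r2_change = relative_change(r2_score(y_true, baseline_pred), r2_score(y_true, new_pred))

print(f"RMSD change: {rmsd_change:.1f}%")
print(f"R2 change:   {r2_change:.1f}%")
```

A negative RMSD change and a positive R² change both indicate improvement, which is the direction of the results the abstract reports.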
Related papers
- Data-efficient and Interpretable Inverse Materials Design using a Disentangled Variational Autoencoder [2.563209727695243]
Inverse materials design has proven successful in accelerating novel material discovery.
Many inverse materials design methods use unsupervised learning where a latent space is learned to offer a compact description of materials representations.
Here, we present a semi-supervised learning approach based on a disentangled variational autoencoder to learn a probabilistic relationship between features, latent variables and target properties.
arXiv Detail & Related papers (2024-09-10T02:21:13Z)
- Stacked ensemble-based mutagenicity prediction model using multiple modalities with graph attention network [0.9736758288065405]
Mutagenicity is a concern due to its association with genetic mutations which can result in a variety of negative consequences.
In this work, we introduce a novel stacked ensemble based mutagenicity prediction model.
arXiv Detail & Related papers (2024-09-03T09:14:21Z)
- A Large Encoder-Decoder Family of Foundation Models For Chemical Language [1.1073864511426255]
This paper introduces a family of large encoder-decoder chemical foundation models pre-trained on a curated dataset of 91 million SMILES samples sourced from PubChem.
Our experiments across multiple benchmark datasets validate the capacity of the proposed model in providing state-of-the-art results for different tasks.
arXiv Detail & Related papers (2024-07-24T20:30:39Z)
- Fine-Tuned Language Models Generate Stable Inorganic Materials as Text [57.01994216693825]
Fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable.
We show that our strongest model can generate materials predicted to be metastable at about twice the rate of CDVAE.
Because of text prompting's inherent flexibility, our models can simultaneously be used for unconditional generation of stable material.
arXiv Detail & Related papers (2024-02-06T20:35:28Z)
- On the importance of catalyst-adsorbate 3D interactions for relaxed energy predictions [98.70797778496366]
We investigate whether it is possible to predict a system's relaxed energy in the OC20 dataset while ignoring the relative position of the adsorbate.
We find that while removing binding site information impairs accuracy as expected, modified models are able to predict relaxed energies with remarkably decent MAE.
arXiv Detail & Related papers (2023-10-10T14:57:04Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
- A Machine Learning Method for Material Property Prediction: Example Polymer Compatibility [39.364776649251944]
We present a brand-new and general machine learning method for material property prediction.
As a representative example, polymer compatibility is chosen to demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2022-02-28T05:48:05Z)
- Improving VAE based molecular representations for compound property prediction [0.0]
We propose a simple method to improve chemical property prediction performance of machine learning models.
We show the relation between the performance of property prediction models and the distance between the property prediction dataset and the larger unlabeled dataset.
arXiv Detail & Related papers (2022-01-13T12:57:11Z)
- Unassisted Noise Reduction of Chemical Reaction Data Sets [59.127921057012564]
We propose a machine learning-based, unassisted approach to remove chemically wrong entries from data sets.
Our results show an improved prediction quality for models trained on the cleaned and balanced data sets.
arXiv Detail & Related papers (2021-02-02T09:34:34Z)
- Graph Neural Network for Hamiltonian-Based Material Property Prediction [56.94118357003096]
We present and compare several different graph convolution networks that are able to predict the band gap for inorganic materials.
The models are developed to incorporate two different features: the information of each orbital itself and the interactions between orbitals.
The results show that our model can get a promising prediction accuracy with cross-validation.
arXiv Detail & Related papers (2020-05-27T13:32:10Z)
- Explainable Deep Relational Networks for Predicting Compound-Protein Affinities and Contacts [80.69440684790925]
DeepRelations is a physics-inspired deep relational network with intrinsically explainable architecture.
It shows superior interpretability to the state-of-the-art.
It boosts the AUPRC of contact prediction by 9.5-, 16.9-, 19.3- and 5.7-fold for the test, compound-unique, protein-unique, and both-unique sets, respectively.
arXiv Detail & Related papers (2019-12-29T00:14:07Z)