Machine learning with persistent homology and chemical word embeddings
improves prediction accuracy and interpretability in metal-organic frameworks
- URL: http://arxiv.org/abs/2010.00532v2
- Date: Wed, 31 Mar 2021 17:56:48 GMT
- Authors: Aditi S. Krishnapriyan, Joseph Montoya, Maciej Haranczyk, Jens Hummelshøj, Dmitriy Morozov
- Abstract summary: We introduce an end-to-end machine learning model that automatically generates descriptors that capture a complex representation of a material's structure and chemistry.
It automatically encapsulates geometric and chemical information directly from the material system.
Our results show considerable improvement in both accuracy and transferability across targets compared to models constructed from the commonly-used, manually-curated features.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning has emerged as a powerful approach in materials discovery.
Its major challenge is selecting features that create interpretable
representations of materials, useful across multiple prediction tasks. We
introduce an end-to-end machine learning model that automatically generates
descriptors that capture a complex representation of a material's structure and
chemistry. This approach builds on computational topology techniques (namely,
persistent homology) and word embeddings from natural language processing. It
automatically encapsulates geometric and chemical information directly from the
material system. We demonstrate our approach on multiple nanoporous
metal-organic framework datasets by predicting methane and carbon dioxide
adsorption across different conditions. Our results show considerable
improvement in both accuracy and transferability across targets compared to
models constructed from the commonly-used, manually-curated features,
consistently achieving an average 25-30% decrease in
root-mean-squared deviation and an average increase of 40-50% in R² scores. A
key advantage of our approach is interpretability: Our model identifies the
pores that correlate best to adsorption at different pressures, which
contributes to understanding atomic-level structure-property relationships for
materials design.
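To make the persistent-homology idea in the abstract concrete, here is a minimal sketch that computes only the simplest case: dimension-0 persistence (connected components) of a small point cloud. It is an illustration under stated assumptions, not the authors' pipeline. The function name `h0_persistence`, the toy coordinates, and the choice of pairwise distance as the filtration value are all hypothetical choices made for this example.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Death values of connected components in a distance-based filtration.

    Assumption: every point is born at filtration value 0, and two
    components merge (one of them "dies") when the shortest edge
    connecting them enters the filtration. The filtration value of an
    edge is taken to be the Euclidean distance between its endpoints.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by length (the order they enter the filtration).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # merge the two components
            deaths.append(length)    # one component dies at this value
    return deaths  # n - 1 finite deaths; one component never dies


# Toy example: two nearby points and one distant point on a line.
toy_points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
print(h0_persistence(toy_points))  # [1.0, 4.0]
```

The long-lived component (death at 4.0) reflects the large gap in the toy data. A descriptor for a porous material would also need higher-dimensional features, which this sketch does not compute.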
Related papers
- Fine-Tuned Language Models Generate Stable Inorganic Materials as Text [57.01994216693825]
Fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable.
We show that our strongest model can generate materials predicted to be metastable at about twice the rate of CDVAE.
Because of text prompting's inherent flexibility, our models can simultaneously be used for unconditional generation of stable material.
arXiv Detail & Related papers (2024-02-06T20:35:28Z)
- On the importance of catalyst-adsorbate 3D interactions for relaxed energy predictions [98.70797778496366]
We investigate whether it is possible to predict a system's relaxed energy in the OC20 dataset while ignoring the relative position of the adsorbate.
We find that while removing binding site information impairs accuracy as expected, modified models are able to predict relaxed energies with remarkably decent MAE.
arXiv Detail & Related papers (2023-10-10T14:57:04Z)
- Structure to Property: Chemical Element Embeddings and a Deep Learning Approach for Accurate Prediction of Chemical Properties [0.0]
This paper introduces a new machine learning model based on deep learning techniques, such as a multilayer encoder and decoder architecture, for classification tasks.
We demonstrate the opportunities offered by our approach by applying it to various types of input data, including organic and inorganic compounds.
The models used in this work exhibit a high degree of predictive power, underscoring the progress that can be made with refined machine learning.
arXiv Detail & Related papers (2023-09-17T19:41:32Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
- Automatic Identification of Chemical Moieties [11.50343898633327]
We introduce a method to automatically identify chemical moieties from atomic representations using message-passing neural networks.
The versatility of our approach is demonstrated by enabling the selection of representative entries in chemical databases.
arXiv Detail & Related papers (2022-03-30T10:58:23Z)
- A Machine Learning Method for Material Property Prediction: Example Polymer Compatibility [39.364776649251944]
We present a brand-new and general machine learning method for material property prediction.
As a representative example, polymer compatibility is chosen to demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2022-02-28T05:48:05Z)
- Improving VAE based molecular representations for compound property prediction [0.0]
We propose a simple method to improve chemical property prediction performance of machine learning models.
We show the relation between the performance of property prediction models and the distance between property prediction dataset and the larger unlabeled dataset.
arXiv Detail & Related papers (2022-01-13T12:57:11Z)
- Unassisted Noise Reduction of Chemical Reaction Data Sets [59.127921057012564]
We propose a machine learning-based, unassisted approach to remove chemically wrong entries from data sets.
Our results show an improved prediction quality for models trained on the cleaned and balanced data sets.
arXiv Detail & Related papers (2021-02-02T09:34:34Z)
- Graph Neural Network for Hamiltonian-Based Material Property Prediction [56.94118357003096]
We present and compare several different graph convolution networks that are able to predict the band gap for inorganic materials.
The models are developed to incorporate two different features: the information of each orbital itself and the interaction between each other.
The results show that our model can get a promising prediction accuracy with cross-validation.
arXiv Detail & Related papers (2020-05-27T13:32:10Z)
- Machine Learning Enabled Discovery of Application Dependent Design Principles for Two-dimensional Materials [1.1470070927586016]
We train an ensemble of models to predict thermodynamic, mechanical, and electronic properties.
We carry out a screening of nearly 45,000 structures for two largely disjoint applications.
We find that hybrid organic-inorganic perovskites with lead and tin tend to be good candidates for solar cell applications.
arXiv Detail & Related papers (2020-03-19T23:13:50Z)
- Explainable Deep Relational Networks for Predicting Compound-Protein Affinities and Contacts [80.69440684790925]
DeepRelations is a physics-inspired deep relational network with intrinsically explainable architecture.
It shows superior interpretability to the state-of-the-art.
It boosts the AUPRC of contact prediction 9.5, 16.9, 19.3 and 5.7-fold for the test, compound-unique, protein-unique, and both-unique sets.
arXiv Detail & Related papers (2019-12-29T00:14:07Z)