Representations and Strategies for Transferable Machine Learning Models in Chemical Discovery
- URL: http://arxiv.org/abs/2106.10768v1
- Date: Sun, 20 Jun 2021 22:34:04 GMT
- Title: Representations and Strategies for Transferable Machine Learning Models in Chemical Discovery
- Authors: Daniel R. Harper, Aditya Nandy, Naveen Arunachalam, Chenru Duan, Jon Paul Janet, and Heather J. Kulik
- Abstract summary: We introduce a transfer learning approach in which we seed models trained on a large amount of data from one row of the periodic table with a small number of data points from the additional row.
We demonstrate the synergistic value of the eRACs alongside this transfer learning strategy to consistently improve model performance.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Strategies for machine-learning (ML)-accelerated discovery that are general
across materials composition spaces are essential, but demonstrations of ML
have been primarily limited to narrow composition variations. By addressing the
scarcity of data in promising regions of chemical space for challenging targets
like open-shell transition-metal complexes, general representations and
transferable ML models that leverage known relationships in existing data will
accelerate discovery. Over a large set (ca. 1000) of isovalent transition-metal
complexes, we quantify evident relationships for different properties (i.e.,
spin-splitting and ligand dissociation) between rows of the periodic table
(i.e., 3d/4d metals and 2p/3p ligands). We demonstrate an extension to
graph-based revised autocorrelation (RAC) representation (i.e., eRAC) that
incorporates the effective nuclear charge alongside the nuclear charge
heuristic that otherwise overestimates the dissimilarity of isovalent complexes. To
address the common challenge of discovery in a new space where data is limited,
we introduce a transfer learning approach in which we seed models trained on a
large amount of data from one row of the periodic table with a small number of
data points from the additional row. We demonstrate the synergistic value of
the eRACs alongside this transfer learning strategy to consistently improve
model performance. Analysis of these models highlights how the approach
succeeds by reordering the distances between complexes to be more consistent
with the periodic table, a property we expect to be broadly useful for other
materials domains.
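The claim that the nuclear charge alone overestimates the dissimilarity of isovalent species can be illustrated with a product autocorrelation of the form P_d = Σ_i Σ_j P_i P_j δ(d_ij, d) over the molecular graph, the general form behind RAC-style descriptors. The Python below is a minimal sketch under stated assumptions, not the authors' eRAC implementation: it uses two small isovalent molecules (H2O and H2S) in place of transition-metal complexes, computes only whole-molecule (all atom pairs) product autocorrelations, and uses effective nuclear charges worked by hand from Slater's rules, which may differ from the Zeff definition used in the paper. All names (`graph_distances`, `product_autocorrelation`, `autocorrelation_ratios`, `WATER`, `HYDROGEN_SULFIDE`) are hypothetical.

```python
from collections import deque

# Toy per-element properties (illustrative assumption, not the paper's data).
# Nuclear charge Z is simply the atomic number.
NUCLEAR_CHARGE = {"H": 1.0, "O": 8.0, "S": 16.0}

# Effective nuclear charge felt by the outermost electron, worked by hand with
# Slater's rules (assumption: the paper's Zeff scheme may differ):
#   H: 1                               = 1.00
#   O: 8  - (5*0.35 + 2*0.85)          = 4.55
#   S: 16 - (5*0.35 + 8*0.85 + 2*1.0)  = 5.45
EFFECTIVE_NUCLEAR_CHARGE = {"H": 1.00, "O": 4.55, "S": 5.45}

# Hypothetical isovalent toy molecules H2O and H2S; atom 0 is the heavy atom.
# Each molecule is (element symbols, adjacency list of bonded atom indices).
WATER = (["O", "H", "H"], {0: [1, 2], 1: [0], 2: [0]})
HYDROGEN_SULFIDE = (["S", "H", "H"], {0: [1, 2], 1: [0], 2: [0]})


def graph_distances(adjacency, start):
    """Breadth-first search: number of bonds from `start` to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def product_autocorrelation(molecule, prop, max_depth):
    """Whole-molecule product autocorrelation P_d = sum_i sum_j P_i P_j delta(d_ij, d)."""
    elements, adjacency = molecule
    values = [0.0] * (max_depth + 1)
    for i, element_i in enumerate(elements):
        # Sum over ordered atom pairs (i, j), including i == j at depth 0.
        for j, d in graph_distances(adjacency, i).items():
            if d <= max_depth:
                values[d] += prop[element_i] * prop[elements[j]]
    return values


def autocorrelation_ratios(prop, max_depth=2):
    """H2S / H2O ratio of each autocorrelation depth (1.0 means identical)."""
    water = product_autocorrelation(WATER, prop, max_depth)
    sulfide = product_autocorrelation(HYDROGEN_SULFIDE, prop, max_depth)
    return [round(s / w, 3) for w, s in zip(water, sulfide)]


if __name__ == "__main__":
    print("Z   :", autocorrelation_ratios(NUCLEAR_CHARGE))
    print("Zeff:", autocorrelation_ratios(EFFECTIVE_NUCLEAR_CHARGE))
```

On these toy inputs the depth-0 and depth-1 features of H2S and H2O differ by factors of roughly 3.9 and 2.0 when P is the nuclear charge, but only roughly 1.4 and 1.2 when P is the Slater effective nuclear charge, consistent with the abstract's point that Z alone exaggerates how different isovalent species look. Ratios are used here only because they are scale-free; a trained model would typically standardize features before measuring distances between complexes.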
Related papers
- Learning Divergence Fields for Shift-Robust Graph Representations
In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging problem with interdependent data.
We derive a new learning objective through causal inference, which can guide the model to learn generalizable patterns of interdependence that are insensitive across domains.
(2024-06-07T14:29:21Z)
- Enhanced sampling of robust molecular datasets with uncertainty-based collective variables
We propose a method that leverages uncertainty as the collective variable (CV) to guide the acquisition of chemically-relevant data points.
This approach employs a Gaussian Mixture Model-based uncertainty metric from a single model as the CV for biased molecular dynamics simulations.
(2024-02-06T06:42:51Z)
- SC-MAD: Mixtures of Higher-order Networks for Data Augmentation
The simplicial complex has inspired generalizations of graph neural networks (GNNs) to simplicial complex-based models.
We propose data augmentation of simplicial complexes through both linear and nonlinear mixup mechanisms.
We theoretically demonstrate that the resultant synthetic simplicial complexes interpolate among existing data with respect to homomorphism densities.
(2023-09-14T06:25:39Z)
- Learning Multiscale Consistency for Self-supervised Electron Microscopy Instance Segmentation
We propose a pretraining framework that enhances multiscale consistency in EM volumes.
Our approach leverages a Siamese network architecture, integrating strong and weak data augmentations.
It effectively captures voxel and feature consistency, showing promise for learning transferable representations for EM analysis.
(2023-08-19T05:49:13Z)
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
(2023-01-23T15:18:54Z)
- Low-cost machine learning approach to the prediction of transition metal phosphor excited state properties
Photoactive iridium complexes are of broad interest due to their applications ranging from lighting to photocatalysis.
We leverage low-cost machine learning (ML) models to predict the excited state properties of iridium complexes.
(2022-09-18T16:24:07Z)
- Improving Molecular Representation Learning with Metric Learning-enhanced Optimal Transport
We develop a novel optimal transport-based algorithm termed MROT to enhance the generalization capability of molecular representation models for regression problems.
MROT significantly outperforms state-of-the-art models, showing promising potential in accelerating the discovery of new substances.
(2022-02-13T04:56:18Z)
- Deciphering Cryptic Behavior in Bimetallic Transition Metal Complexes with Machine Learning
We train a regression model on a subset of 330 structurally characterized heterobimetallics to predict the degree of metal-metal bonding.
Our work provides guidance for rational bimetallic design, suggesting that properties including the formal ratio should be transferable from one period to another.
(2021-07-29T19:01:56Z)
- Materials Representation and Transfer Learning for Multi-Property Prediction
The adoption of machine learning in materials science has rapidly transformed materials property prediction.
Hurdles limiting full capitalization of recent advancements in machine learning include the limited development of methods to learn the underlying interactions of multiple elements.
We introduce the Hierarchical Correlation Learning for Multi-property Prediction framework that seamlessly integrates (i) prediction using only a material's composition, (ii) learning and exploitation of correlations among target properties in multi-target regression, and (iii) leveraging training data from tangential domains via generative transfer learning.
(2021-06-04T03:00:34Z)
- Graph Neural Network for Hamiltonian-Based Material Property Prediction
We present and compare several different graph convolution networks that are able to predict the band gap for inorganic materials.
The models are developed to incorporate two different features: the information of each orbital itself and the interactions between orbitals.
The results show that the model achieves promising prediction accuracy under cross-validation.
(2020-05-27T13:32:10Z)
- Inverse Learning of Symmetries
We learn the symmetry transformation with a model consisting of two latent subspaces.
Our approach is based on the deep information bottleneck in combination with a continuous mutual information regulariser.
Our model outperforms state-of-the-art methods on artificial and molecular datasets.
(2020-02-07T13:48:52Z)