Structure-Property Maps with Kernel Principal Covariates Regression
- URL: http://arxiv.org/abs/2002.05076v2
- Date: Thu, 21 May 2020 15:58:36 GMT
- Title: Structure-Property Maps with Kernel Principal Covariates Regression
- Authors: Benjamin A. Helfrecht, Rose K. Cersonsky, Guillaume Fraux, and Michele
Ceriotti
- Abstract summary: We introduce a kernelized version of PCovR and a sparsified extension, and demonstrate the performance of this approach in revealing and predicting structure-property relations.
We show examples of elemental carbon, porous silicate frameworks, organic molecules, amino acid conformers, and molecular materials.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data analyses based on linear methods constitute the simplest, most robust,
and transparent approaches to the automatic processing of large amounts of data
for building supervised or unsupervised machine learning models. Principal
covariates regression (PCovR) is an underappreciated method that interpolates
between principal component analysis and linear regression, and can be used to
conveniently reveal structure-property relations in terms of
simple-to-interpret, low-dimensional maps. Here we provide a pedagogic overview
of these data analysis schemes, including the use of the kernel trick to
introduce an element of non-linearity, while maintaining most of the
convenience and the simplicity of linear approaches. We then introduce a
kernelized version of PCovR and a sparsified extension, and demonstrate the
performance of this approach in revealing and predicting structure-property
relations in chemistry and materials science, showing a variety of examples
including elemental carbon, porous silicate frameworks, organic molecules,
amino acid conformers, and molecular materials.
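As a rough illustration of the method described in the abstract, here is a minimal NumPy sketch of sample-space PCovR; the mixing parameter `alpha`, the ridge regularizer `reg`, and the normalization are illustrative choices rather than the paper's exact conventions.

```python
import numpy as np

def pcovr_map(X, Y, alpha=0.5, n_components=2, reg=1e-8):
    """Sample-space PCovR sketch: alpha = 1 recovers PCA, alpha = 0 a pure regression map."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(len(X), -1)
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    X /= np.linalg.norm(X)   # put both terms on a comparable scale
    Y /= np.linalg.norm(Y)   # (an illustrative normalization choice)
    # Ridge-regression approximation of the properties Y from the features X.
    W = np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ Y)
    Yhat = X @ W
    # Modified Gram matrix interpolating between PCA and linear regression.
    G = alpha * (X @ X.T) + (1.0 - alpha) * (Yhat @ Yhat.T)
    evals, evecs = np.linalg.eigh(G)
    top = np.argsort(evals)[::-1][:n_components]
    # Low-dimensional structure-property map: scaled leading eigenvectors of G.
    return evecs[:, top] * np.sqrt(np.clip(evals[top], 0.0, None))
```

The kernelized version follows by replacing the Gram matrix X Xᵀ with a kernel matrix K and taking the kernel ridge predictions Ŷ = K (K + λI)⁻¹ Y; the sparsified extension evaluates the kernel only against a subset of representative samples, in the spirit of a Nyström approximation.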
Related papers
- Pre-trained Molecular Language Models with Random Functional Group Masking
We propose a SMILES-based Molecular Language Model that randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique compels the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
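As a hedged, character-level sketch of the random-masking idea in the entry above: the `MASK` token, span length, and masking rate below are invented for illustration, and the actual model masks chemically meaningful functional groups rather than raw characters.

```python
import random

MASK = "[MASK]"

def mask_smiles(smiles, mask_frac=0.15, span=3, seed=None):
    """Replace random contiguous spans of a SMILES string with a mask token.

    Character-level stand-in for functional-group masking; a real
    implementation would mask chemically meaningful substructures.
    """
    rng = random.Random(seed)
    chars = list(smiles)
    n_spans = max(1, round(len(chars) * mask_frac / span))
    for _ in range(n_spans):
        start = rng.randrange(0, max(1, len(chars) - span + 1))
        end = min(start + span, len(chars))
        chars[start:end] = [MASK]   # collapse the span into one mask token
    return "".join(chars)

print(mask_smiles("CC(=O)Oc1ccccc1C(=O)O", seed=0))  # aspirin, partially masked
```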
- Induced Covariance for Causal Discovery in Linear Sparse Structures
Causal models seek to unravel the cause-effect relationships among variables from observed data.
This paper introduces a novel causal discovery algorithm designed for settings in which variables exhibit linearly sparse relationships.
arXiv Detail & Related papers (2024-10-02T04:01:38Z)
- Graph Structure Inference with BAM: Introducing the Bilinear Attention Mechanism
We propose a novel neural network model for supervised graph structure learning.
The model is trained with variably shaped and coupled input data.
Our method demonstrates robust generalizability across linear dependencies as well as various types of non-linear dependencies.
arXiv Detail & Related papers (2024-02-12T15:48:58Z)
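The BAM entry above does not spell out its scoring function, so the following NumPy sketch shows a generic bilinear attention score over node features; it is a plausible reading of the mechanism rather than the paper's exact layer, and `W` stands in for a learned weight matrix (here random).

```python
import numpy as np

def bilinear_edge_scores(H, W=None, seed=None):
    """Score all node pairs with a bilinear form: S_ij = sigmoid(h_i^T W h_j)."""
    H = np.asarray(H, dtype=float)
    n, d = H.shape
    if W is None:
        rng = np.random.default_rng(seed)
        W = rng.standard_normal((d, d)) / np.sqrt(d)  # stand-in for a learned matrix
    logits = H @ W @ H.T
    logits = 0.5 * (logits + logits.T)      # symmetrize for an undirected graph
    return 1.0 / (1.0 + np.exp(-logits))    # pairwise edge probabilities
```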
- Compositional Representation of Polymorphic Crystalline Materials
We introduce PCRL, a novel approach that employs probabilistic modeling of composition to capture the diverse polymorphs from available structural information.
Extensive evaluations on sixteen datasets demonstrate the effectiveness of PCRL in learning compositional representation.
arXiv Detail & Related papers (2023-11-17T20:34:28Z)
- Accelerated structured matrix factorization
Matrix factorization exploits the idea that, in complex high-dimensional data, the actual signal typically lies in lower-dimensional structures.
By exploiting Bayesian shrinkage priors, we devise a computationally convenient approach for high-dimensional matrix factorization.
The dependence between row and column entities is modeled by inducing flexible sparse patterns within factors.
arXiv Detail & Related papers (2022-12-13T11:35:01Z)
- Towards a mathematical understanding of learning from few examples with nonlinear feature maps
We consider the problem of data classification where the training set consists of just a few data points.
We reveal key relationships between the geometry of an AI model's feature space, the structure of the underlying data distributions, and the model's generalisation capabilities.
arXiv Detail & Related papers (2022-11-07T14:52:58Z)
- Unsupervised Machine Learning for Exploratory Data Analysis of Exoplanet Transmission Spectra
We focus on unsupervised techniques for analyzing spectral data from transiting exoplanets.
We show that there is a high degree of correlation in the spectral data, which calls for appropriate low-dimensional representations.
We uncover interesting structures in the principal component basis, namely, well-defined branches corresponding to different chemical regimes.
arXiv Detail & Related papers (2022-01-07T22:26:33Z)
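A minimal sketch of the kind of principal-component projection the exoplanet-spectra entry above relies on; the array shapes and component count are assumptions.

```python
import numpy as np

def pca_scores(spectra, n_components=3):
    """Project (n_spectra, n_wavelengths) data onto its leading principal components."""
    X = np.asarray(spectra, dtype=float)
    X = X - X.mean(axis=0)                    # center each wavelength bin
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]  # PC scores, one row per spectrum
```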
- BenchML: an extensible pipelining framework for benchmarking representations of materials and molecules at scale
We introduce a machine-learning framework for benchmarking representations of chemical systems against datasets of materials and molecules.
The guiding principle is to evaluate raw descriptor performance by limiting model complexity to simple regression schemes.
The resulting models are intended as baselines that can inform future method development.
arXiv Detail & Related papers (2021-12-04T09:07:16Z)
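The BenchML entry above limits model complexity to simple regression schemes; the scikit-learn sketch below shows that spirit with fixed-hyperparameter baselines. It is not the BenchML API: `descriptors`, the model settings, and the scoring choice are illustrative.

```python
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def baseline_scores(descriptors, y, cv=5):
    """Benchmark descriptor sets with deliberately simple regression baselines.

    descriptors: dict mapping a representation name to an
    (n_samples, n_features) matrix; hyperparameters are fixed for brevity,
    whereas a real benchmark would cross-validate them.
    """
    models = {
        "ridge": Ridge(alpha=1e-3),
        "kernel_ridge": KernelRidge(alpha=1e-3, kernel="rbf"),
    }
    return {
        (rep, name): cross_val_score(
            model, X, y, cv=cv, scoring="neg_root_mean_squared_error"
        ).mean()
        for rep, X in descriptors.items()
        for name, model in models.items()
    }
```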
- Principal Ellipsoid Analysis (PEA): Efficient non-linear dimension reduction & clustering
This article focuses on improving upon PCA and k-means, by allowing nonlinear relations in the data and more flexible cluster shapes.
The key contribution is a new framework for Principal Ellipsoid Analysis (PEA), defining a simple and computationally efficient alternative to PCA.
In a rich variety of real data clustering applications, PEA is shown to do as well as k-means for simple datasets, while dramatically improving performance in more complex settings.
arXiv Detail & Related papers (2020-08-17T06:25:50Z)
- Learning Bijective Feature Maps for Linear ICA
We show that existing probabilistic deep generative models (DGMs) which are tailor-made for image data, underperform on non-linear ICA tasks.
To address this, we propose a DGM which combines bijective feature maps with a linear ICA model to learn interpretable latent structures for high-dimensional data.
We create models that converge quickly, are easy to train, and achieve better unsupervised latent factor discovery than flow-based models, linear ICA, and Variational Autoencoders on images.
arXiv Detail & Related papers (2020-02-18T17:58:07Z)