Deep Kernel Methods Learn Better: From Cards to Process Optimization
- URL: http://arxiv.org/abs/2303.14554v2
- Date: Tue, 19 Sep 2023 13:53:34 GMT
- Title: Deep Kernel Methods Learn Better: From Cards to Process Optimization
- Authors: Mani Valleti, Rama K. Vasudevan, Maxim A. Ziatdinov, Sergei V. Kalinin
- Abstract summary: We show that DKL with active learning can produce a more compact and smooth latent space.
We demonstrate this behavior using a simple cards data set and extend it to the optimization of domain-generated trajectories in physical systems.
- Score: 0.7587345054583298
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability of deep learning methods to perform classification and regression
tasks relies heavily on their capacity to uncover manifolds in high-dimensional
data spaces and project them into low-dimensional representation spaces. In
this study, we investigate the structure and character of the manifolds
generated by classical variational autoencoder (VAE) approaches and deep kernel
learning (DKL). In the former case, the structure of the latent space is
determined by the properties of the input data alone, while in the latter, the
latent manifold forms as a result of an active learning process that balances
the data distribution and target functionalities. We show that DKL with active
learning can produce a more compact and smooth latent space which is more
conducive to optimization compared to previously reported methods, such as the
VAE. We demonstrate this behavior using a simple cards data set and extend it
to the optimization of domain-generated trajectories in physical systems. Our
findings suggest that latent manifolds constructed through active learning have
a more beneficial structure for optimization problems, especially in
feature-rich target-poor scenarios that are common in domain sciences, such as
materials synthesis, energy storage, and molecular discovery. The jupyter
notebooks that encapsulate the complete analysis accompany the article.
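To make the approach concrete, below is a minimal sketch of deep kernel learning with an uncertainty-driven active learning step, written against GPyTorch. It is an illustrative sketch, not the authors' implementation (the accompanying Jupyter notebooks contain the actual analysis); the names `FeatureExtractor`, `DKLModel`, and `active_learning_step`, and the use of maximum predictive variance as the acquisition rule, are assumptions made here for brevity.

```python
import torch
import gpytorch


class FeatureExtractor(torch.nn.Sequential):
    """Small network projecting high-dimensional inputs into a low-dimensional latent space."""
    def __init__(self, in_dim, latent_dim=2):
        super().__init__(
            torch.nn.Linear(in_dim, 64), torch.nn.ReLU(),
            torch.nn.Linear(64, latent_dim),
        )


class DKLModel(gpytorch.models.ExactGP):
    """Deep kernel learning: a GP whose kernel operates on learned features."""
    def __init__(self, train_x, train_y, likelihood, extractor):
        super().__init__(train_x, train_y, likelihood)
        self.extractor = extractor
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        z = self.extractor(x)  # latent embedding, shaped jointly with the GP objective
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(z), self.covar_module(z)
        )


def active_learning_step(train_x, train_y, pool_x, n_epochs=200):
    """Retrain the DKL model on the measured data and return the index of the
    pool point with the largest predictive variance (a purely exploratory rule)."""
    likelihood = gpytorch.likelihoods.GaussianLikelihood()
    model = DKLModel(train_x, train_y, likelihood, FeatureExtractor(train_x.shape[-1]))
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    model.train(); likelihood.train()
    for _ in range(n_epochs):
        optimizer.zero_grad()
        loss = -mll(model(train_x), train_y)
        loss.backward()
        optimizer.step()

    model.eval(); likelihood.eval()
    with torch.no_grad(), gpytorch.settings.fast_pred_var():
        pred = likelihood(model(pool_x))
    return int(pred.variance.argmax())
```

In the setting described in the abstract, the acquisition function would also weigh the target functionality (for example, via expected improvement), which is what lets the active learning process balance the data distribution against the property being optimized.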
Related papers
- Pullback Flow Matching on Data Manifolds [10.187244125099479]
Pullback Flow Matching (PFM) is a framework for generative modeling on data manifolds.
We demonstrate PFM's effectiveness through applications to synthetic data, data dynamics, and protein sequence data, generating novel proteins with specific properties.
This method shows strong potential for drug discovery and materials science, where generating novel samples with specific properties is of great interest.
arXiv Detail & Related papers (2024-10-06T16:41:26Z)
- Understanding active learning of molecular docking and its applications [0.6554326244334868]
We investigate how active learning methodologies effectively predict docking scores using only 2D structures.
Our findings suggest that surrogate models tend to memorize structural patterns prevalent in compounds with high docking scores.
Our comprehensive analysis underscores the reliability and potential applicability of active learning methodologies in virtual screening campaigns.
arXiv Detail & Related papers (2024-06-14T05:43:42Z)
- Scalable manifold learning by uniform landmark sampling and constrained locally linear embedding [0.6144680854063939]
We propose a scalable manifold learning (scML) method that can manipulate large-scale and high-dimensional data in an efficient manner.
We empirically validated the effectiveness of scML on synthetic datasets and real-world benchmarks of different types.
scML scales well with increasing data sizes and embedding dimensions, and exhibits promising performance in preserving the global structure.
arXiv Detail & Related papers (2024-01-02T08:43:06Z)
- A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction [66.21060114843202]
We propose a more general heat kernel based manifold embedding method that we call heat geodesic embeddings.
Results show that our method outperforms existing state of the art in preserving ground truth manifold distances.
We also showcase our method on single cell RNA-sequencing datasets with both continuum and cluster structure.
arXiv Detail & Related papers (2023-05-30T13:58:50Z)
- Optimization of a Hydrodynamic Computational Reservoir through Evolution [58.720142291102135]
We interface with a model of a hydrodynamic system, under development by a startup, as a computational reservoir.
We optimized the readout times and how inputs are mapped to the wave amplitude or frequency using an evolutionary search algorithm.
Applying evolutionary methods to this reservoir system substantially improved separability on an XNOR task, in comparison to implementations with hand-selected parameters.
arXiv Detail & Related papers (2023-04-20T19:15:02Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- Joint Embedding Self-Supervised Learning in the Kernel Regime [21.80241600638596]
Self-supervised learning (SSL) produces useful representations of data without access to any labels for classifying the data.
We extend this framework to incorporate algorithms based on kernel methods where embeddings are constructed by linear maps acting on the feature space of a kernel.
We analyze our kernel model on small datasets to identify common features of self-supervised learning algorithms and gain theoretical insights into their performance on downstream tasks.
arXiv Detail & Related papers (2022-09-29T15:53:19Z)
- Measuring dissimilarity with diffeomorphism invariance [94.02751799024684]
We introduce DID, a pairwise dissimilarity measure applicable to a wide range of data spaces.
We prove that DID enjoys properties which make it relevant for theoretical study and practical use.
arXiv Detail & Related papers (2022-02-11T13:51:30Z)
- High-Dimensional Bayesian Optimisation with Variational Autoencoders and Deep Metric Learning [119.91679702854499]
We introduce a method based on deep metric learning to perform Bayesian optimisation over high-dimensional, structured input spaces.
We achieve such an inductive bias using just 1% of the available labelled data.
As an empirical contribution, we present state-of-the-art results on real-world high-dimensional black-box optimisation problems.
arXiv Detail & Related papers (2021-06-07T13:35:47Z)
- Characterizing the Latent Space of Molecular Deep Generative Models with Persistent Homology Metrics [21.95240820041655]
Variational Autoencoders (VAEs) are generative models in which encoder-decoder network pairs are trained to reconstruct training data distributions.
We propose a method for measuring how well the latent space of deep generative models is able to encode structural and chemical features.
arXiv Detail & Related papers (2020-10-18T13:33:02Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
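As an illustration of the entry immediately above, the following sketch aggregates a variable-size set of features by computing an entropic optimal transport plan against a trainable reference. It is a rough PyTorch toy under stated assumptions, not the paper's implementation; `sinkhorn_plan`, `OTPooling`, and the hyperparameters (regularization `eps`, reference size `n_ref`) are hypothetical choices.

```python
import torch


def sinkhorn_plan(cost, eps=0.1, n_iters=50):
    """Entropic-regularized optimal transport plan between uniform marginals (Sinkhorn iterations)."""
    n, p = cost.shape
    cost = cost / cost.max().clamp_min(1e-8)   # rescale costs for numerical stability
    K = torch.exp(-cost / eps)                 # Gibbs kernel
    a = torch.full((n,), 1.0 / n)              # uniform weights on the set elements
    b = torch.full((p,), 1.0 / p)              # uniform weights on the reference atoms
    v = torch.ones(p)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.t() @ u)
    return u[:, None] * K * v[None, :]         # transport plan of shape (n, p)


class OTPooling(torch.nn.Module):
    """Aggregate a variable-size set of feature vectors into a fixed-size embedding
    by transporting the set onto a trainable reference."""

    def __init__(self, dim, n_ref=8):
        super().__init__()
        self.reference = torch.nn.Parameter(torch.randn(n_ref, dim))

    def forward(self, x):                                 # x: (n, dim); n may differ between sets
        cost = torch.cdist(x, self.reference) ** 2        # squared Euclidean cost matrix
        plan = sinkhorn_plan(cost)                        # differentiable, so the reference trains
        pooled = plan.t() @ x                             # (n_ref, dim): OT-weighted aggregation
        return pooled.flatten()                           # fixed-size vector of length n_ref * dim
```

Because the Sinkhorn iterations are ordinary differentiable tensor operations, the reference receives gradients and can be trained end-to-end with whatever model consumes the pooled embedding; for example, `OTPooling(dim=16)(torch.randn(37, 16))` yields a fixed-length vector of size 128 regardless of the set size.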
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.