Deep Kernel Methods Learn Better: From Cards to Process Optimization
- URL: http://arxiv.org/abs/2303.14554v2
- Date: Tue, 19 Sep 2023 13:53:34 GMT
- Title: Deep Kernel Methods Learn Better: From Cards to Process Optimization
- Authors: Mani Valleti, Rama K. Vasudevan, Maxim A. Ziatdinov, Sergei V. Kalinin
- Abstract summary: We show that DKL with active learning can produce a more compact and smooth latent space.
We demonstrate this behavior using a simple cards data set and extend it to the optimization of domain-generated trajectories in physical systems.
- Score: 0.7587345054583298
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability of deep learning methods to perform classification and regression
tasks relies heavily on their capacity to uncover manifolds in high-dimensional
data spaces and project them into low-dimensional representation spaces. In
this study, we investigate the structure and character of the manifolds
generated by classical variational autoencoder (VAE) approaches and deep kernel
learning (DKL). In the former case, the structure of the latent space is
determined by the properties of the input data alone, while in the latter, the
latent manifold forms as a result of an active learning process that balances
the data distribution and target functionalities. We show that DKL with active
learning can produce a more compact and smooth latent space which is more
conducive to optimization compared to previously reported methods, such as the
VAE. We demonstrate this behavior using a simple cards data set and extend it
to the optimization of domain-generated trajectories in physical systems. Our
findings suggest that latent manifolds constructed through active learning have
a more beneficial structure for optimization problems, especially in
feature-rich target-poor scenarios that are common in domain sciences, such as
materials synthesis, energy storage, and molecular discovery. The Jupyter
notebooks that encapsulate the complete analysis accompany the article.
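The workflow the abstract describes (a deep feature extractor feeding a Gaussian process, retrained inside an active-learning loop that queries the most informative points) can be sketched in a few dozen lines. The snippet below is a minimal illustration built on GPyTorch, not the authors' accompanying notebooks; the 16-dimensional synthetic data, the network sizes, the hidden target function, and the pure uncertainty-based acquisition rule are all illustrative assumptions.

```python
# Minimal sketch of deep kernel learning (DKL) with uncertainty-driven
# active learning, using GPyTorch. Illustrative only: the data, network
# sizes, and acquisition rule are assumptions, not the paper's notebooks.
import torch
import gpytorch


class FeatureExtractor(torch.nn.Sequential):
    """Neural net mapping high-dimensional inputs to a low-dim latent space."""
    def __init__(self, input_dim, latent_dim=2):
        super().__init__(
            torch.nn.Linear(input_dim, 64),
            torch.nn.ReLU(),
            torch.nn.Linear(64, latent_dim),
        )


class DKLRegression(gpytorch.models.ExactGP):
    """Exact GP whose kernel operates on learned features, not raw inputs."""
    def __init__(self, train_x, train_y, likelihood, extractor):
        super().__init__(train_x, train_y, likelihood)
        self.extractor = extractor
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel())

    def forward(self, x):
        z = self.extractor(x)  # coordinates on the learned latent manifold
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(z), self.covar_module(z))


def fit(model, likelihood, x, y, steps=200, lr=0.01):
    """Jointly train kernel hyperparameters and the feature extractor."""
    model.train(); likelihood.train()
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # includes likelihood
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
    for _ in range(steps):
        opt.zero_grad()
        loss = -mll(model(x), y)
        loss.backward()
        opt.step()


# Synthetic stand-in for a feature-rich, target-poor problem.
torch.manual_seed(0)
pool_x = torch.randn(500, 16)                    # unlabeled candidate pool
target = lambda x: torch.sin(x[:, :4].sum(-1))   # hidden scalar functionality
labeled = torch.randperm(500)[:10].tolist()      # small initial labeled set

for step in range(20):  # active-learning loop
    train_x = pool_x[labeled]
    train_y = target(train_x)  # "measurement" of the queried points
    likelihood = gpytorch.likelihoods.GaussianLikelihood()
    model = DKLRegression(train_x, train_y, likelihood,
                          FeatureExtractor(input_dim=16))
    fit(model, likelihood, train_x, train_y)
    model.eval(); likelihood.eval()
    with torch.no_grad(), gpytorch.settings.fast_pred_var():
        pred = likelihood(model(pool_x))
    var = pred.variance.clone()
    var[labeled] = -1.0                          # never re-query labeled points
    labeled.append(int(var.argmax()))            # query the most uncertain point
```

Because the feature extractor is retrained after every query, the latent manifold is progressively reshaped by the target function rather than by the data distribution alone, which is the mechanism the abstract credits for the more compact, optimization-friendly latent space.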
Related papers
- Into the Void: Mapping the Unseen Gaps in High Dimensional Data [23.226089369715016]
We present a comprehensive pipeline, augmented by a visual analytics system named "GapMiner".
It is aimed at exploring and exploiting untapped opportunities within the empty areas of high-dimensional datasets.
arXiv Detail & Related papers (2025-01-25T16:57:21Z)
- Equation discovery framework EPDE: Towards a better equation discovery [50.79602839359522]
We enhance the EPDE algorithm -- an evolutionary optimization-based discovery framework.
Our approach generates terms using fundamental building blocks such as elementary functions and individual differentials.
We validate our algorithm's noise resilience and overall performance by comparing its results with those from the state-of-the-art equation discovery framework SINDy.
arXiv Detail & Related papers (2024-12-28T15:58:44Z)
- Pullback Flow Matching on Data Manifolds [10.187244125099479]
Pullback Flow Matching (PFM) is a framework for generative modeling on data manifolds.
We demonstrate PFM's effectiveness through applications to synthetic data, data dynamics, and protein sequence data, generating novel proteins with specific properties.
This method shows strong potential for drug discovery and materials science, where generating novel samples with specific properties is of great interest.
arXiv Detail & Related papers (2024-10-06T16:41:26Z)
- Understanding active learning of molecular docking and its applications [0.6554326244334868]
We investigate how active learning methodologies effectively predict docking scores using only 2D structures.
Our findings suggest that surrogate models tend to memorize structural patterns prevalent in compounds with high docking scores.
Our comprehensive analysis underscores the reliability and potential applicability of active learning methodologies in virtual screening campaigns.
arXiv Detail & Related papers (2024-06-14T05:43:42Z)
- Scalable manifold learning by uniform landmark sampling and constrained locally linear embedding [0.6144680854063939]
We propose a scalable manifold learning (scML) method that can manipulate large-scale and high-dimensional data in an efficient manner.
We empirically validated the effectiveness of scML on synthetic datasets and real-world benchmarks of different types.
scML scales well with increasing data sizes and embedding dimensions, and exhibits promising performance in preserving the global structure.
arXiv Detail & Related papers (2024-01-02T08:43:06Z)
- A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction [66.21060114843202]
We propose a more general heat kernel based manifold embedding method that we call heat geodesic embeddings.
Results show that our method outperforms the existing state of the art in preserving ground-truth manifold distances.
We also showcase our method on single cell RNA-sequencing datasets with both continuum and cluster structure.
arXiv Detail & Related papers (2023-05-30T13:58:50Z)
- Optimization of a Hydrodynamic Computational Reservoir through Evolution [58.720142291102135]
We interface with a model of a hydrodynamic system, under development by a startup, as a computational reservoir.
We optimized the readout times and how inputs are mapped to the wave amplitude or frequency using an evolutionary search algorithm.
Applying evolutionary methods to this reservoir system substantially improved separability on an XNOR task, in comparison to implementations with hand-selected parameters.
arXiv Detail & Related papers (2023-04-20T19:15:02Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- High-Dimensional Bayesian Optimisation with Variational Autoencoders and Deep Metric Learning [119.91679702854499]
We introduce a method based on deep metric learning to perform Bayesian optimisation over high-dimensional, structured input spaces.
We achieve such an inductive bias using just 1% of the available labelled data.
As an empirical contribution, we present state-of-the-art results on real-world high-dimensional black-box optimisation problems.
arXiv Detail & Related papers (2021-06-07T13:35:47Z)
- Characterizing the Latent Space of Molecular Deep Generative Models with Persistent Homology Metrics [21.95240820041655]
Variational Autoencoders (VAEs) are generative models in which encoder-decoder network pairs are trained to reconstruct training data distributions.
We propose a method for measuring how well the latent space of deep generative models is able to encode structural and chemical features.
arXiv Detail & Related papers (2020-10-18T13:33:02Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content presented here (including all information) and is not responsible for any consequences of its use.