Related papers: Optimizing Training Trajectories in Variational Autoencoders via Latent Bayesian Optimization Approach

Optimizing Training Trajectories in Variational Autoencoders via Latent Bayesian Optimization Approach

URL: http://arxiv.org/abs/2207.00128v1
Date: Thu, 30 Jun 2022 23:41:47 GMT
Title: Optimizing Training Trajectories in Variational Autoencoders via Latent Bayesian Optimization Approach
Authors: Arpan Biswas, Rama Vasudevan, Maxim Ziatdinov, Sergei V. Kalinin
Abstract summary: Unsupervised and semi-supervised ML methods have become widely adopted across multiple areas of physics, chemistry, and materials sciences. We propose a latent Bayesian optimization (zBO) approach for the hyper parameter trajectory optimization for the unsupervised and semi-supervised ML. We demonstrate an application of this method for finding joint discrete and continuous rotationally invariant representations for MNIST and experimental data of a plasmonic nanoparticles material system.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Unsupervised and semi-supervised ML methods such as variational autoencoders (VAE) have become widely adopted across multiple areas of physics, chemistry, and materials sciences due to their capability in disentangling representations and ability to find latent manifolds for classification and regression of complex experimental data. Like other ML problems, VAEs require hyperparameter tuning, e.g., balancing the Kullback Leibler (KL) and reconstruction terms. However, the training process and resulting manifold topology and connectivity depend not only on hyperparameters, but also their evolution during training. Because of the inefficiency of exhaustive search in a high-dimensional hyperparameter space for the expensive to train models, here we explored a latent Bayesian optimization (zBO) approach for the hyperparameter trajectory optimization for the unsupervised and semi-supervised ML and demonstrate for joint-VAE with rotational invariances. We demonstrate an application of this method for finding joint discrete and continuous rotationally invariant representations for MNIST and experimental data of a plasmonic nanoparticles material system. The performance of the proposed approach has been discussed extensively, where it allows for any high dimensional hyperparameter tuning or trajectory optimization of other ML models.

Related papers

Transferring linearly fixed QAOA angles: performance and real device results [0.0]
We investigate a simplified approach that combines linear parameterization with parameter transferring, reducing the parameter space to just 4 dimensions regardless of the number of layers. We compare this combined approach with standard QAOA and other parameter setting strategies such as INTERP and FOURIER, which require computationally demanding incremental layer-by-layer optimization. Our experiments extend from classical simulation to actual quantum hardware implementation on IBM's Eagle processor, demonstrating the approach's viability on current NISQ devices.
arXiv Detail & Related papers (2025-04-17T04:17:51Z)
A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning. These problems are often formalized as Bi-Level optimizations (BLO) We introduce a novel perspective by turning a given BLO problem into a ii optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z)
SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction. SMILE allows for the upscaling of source models into an MoE model without extra data or further training. We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
Variational Learning of Gaussian Process Latent Variable Models through Stochastic Gradient Annealed Importance Sampling [22.256068524699472]
In this work, we propose an Annealed Importance Sampling (AIS) approach to address these issues. We combine the strengths of Sequential Monte Carlo samplers and VI to explore a wider range of posterior distributions and gradually approach the target distribution. Experimental results on both toy and image datasets demonstrate that our method outperforms state-of-the-art methods in terms of tighter variational bounds, higher log-likelihoods, and more robust convergence.
arXiv Detail & Related papers (2024-08-13T08:09:05Z)
Fully Bayesian Differential Gaussian Processes through Stochastic Differential Equations [7.439555720106548]
We propose a fully Bayesian approach that treats the kernel hyper parameters as random variables and constructs coupled differential equations (SDEs) to learn their posterior distribution and that of inducing points. Our approach provides a time-varying, comprehensive, and realistic posterior approximation through coupling variables using SDE methods. Our work opens up exciting research avenues for advancing Bayesian inference and offers a powerful modeling tool for continuous-time Gaussian processes.
arXiv Detail & Related papers (2024-08-12T11:41:07Z)
Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models [73.88009808326387]
We propose a novel spectrum-aware adaptation framework for generative models. Our method adjusts both singular values and their basis vectors of pretrained weights. We introduce Spectral Ortho Decomposition Adaptation (SODA), which balances computational efficiency and representation capacity.
arXiv Detail & Related papers (2024-05-31T17:43:35Z)
Active-Learning-Driven Surrogate Modeling for Efficient Simulation of Parametric Nonlinear Systems [0.0]
In absence of governing equations, we need to construct the parametric reduced-order surrogate model in a non-intrusive fashion. Our work provides a non-intrusive optimality criterion to efficiently populate the parameter snapshots. We propose an active-learning-driven surrogate model using kernel-based shallow neural networks.
arXiv Detail & Related papers (2023-06-09T18:01:14Z)
VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables. The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning. We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z)
Equivariant vector field network for many-body system modeling [65.22203086172019]
Equivariant Vector Field Network (EVFN) is built on a novel equivariant basis and the associated scalarization and vectorization layers. We evaluate our method on predicting trajectories of simulated Newton mechanics systems with both full and partially observed data.
arXiv Detail & Related papers (2021-10-26T14:26:25Z)
Spectral Tensor Train Parameterization of Deep Learning Layers [136.4761580842396]
We study low-rank parameterizations of weight matrices with embedded spectral properties in the Deep Learning context. We show the effects of neural network compression in the classification setting and both compression and improved stability training in the generative adversarial training setting.
arXiv Detail & Related papers (2021-03-07T00:15:44Z)
Bayesian multiscale deep generative model for the solution of high-dimensional inverse problems [0.0]
A novel multiscale Bayesian inference approach is introduced based on deep probabilistic generative models. The method allows high-dimensional parameter estimation while exhibiting stability, efficiency and accuracy.
arXiv Detail & Related papers (2021-02-04T11:47:21Z)
VisEvol: Visual Analytics to Support Hyperparameter Search through Evolutionary Optimization [4.237343083490243]
During the training phase of machine learning (ML) models, it is usually necessary to configure several hyper parameters. We present VisEvol, a visual analytics tool that supports interactive exploration of hyper parameters and intervention in this evolutionary procedure. The utility and applicability of VisEvol are demonstrated with two use cases and interviews with ML experts who evaluated the effectiveness of the tool.
arXiv Detail & Related papers (2020-12-02T13:43:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.