Multi-Rate VAE: Train Once, Get the Full Rate-Distortion Curve
- URL: http://arxiv.org/abs/2212.03905v2
- Date: Wed, 16 Aug 2023 19:43:20 GMT
- Title: Multi-Rate VAE: Train Once, Get the Full Rate-Distortion Curve
- Authors: Juhan Bae, Michael R. Zhang, Michael Ruan, Eric Wang, So Hasegawa,
Jimmy Ba, Roger Grosse
- Abstract summary: Variational autoencoders (VAEs) are powerful tools for learning latent representations of data used in a wide range of applications.
In this paper, we introduce Multi-Rate VAE, a computationally efficient framework for learning optimal parameters corresponding to various $\beta$ in a single training run.
- Score: 29.86440019821837
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Variational autoencoders (VAEs) are powerful tools for learning latent
representations of data used in a wide range of applications. In practice, VAEs
usually require multiple training rounds to choose the amount of information
the latent variable should retain. This trade-off between the reconstruction
error (distortion) and the KL divergence (rate) is typically parameterized by a
hyperparameter $\beta$. In this paper, we introduce Multi-Rate VAE (MR-VAE), a
computationally efficient framework for learning optimal parameters
corresponding to various $\beta$ in a single training run. The key idea is to
explicitly formulate a response function that maps $\beta$ to the optimal
parameters using hypernetworks. MR-VAEs construct a compact response
hypernetwork where the pre-activations are conditionally gated based on
$\beta$. We justify the proposed architecture by analyzing linear VAEs and
showing that it can represent response functions exactly for linear VAEs. With
the learned hypernetwork, MR-VAEs can construct the rate-distortion curve
without additional training and can be deployed with significantly less
hyperparameter tuning. Empirically, our approach is competitive with, and often
exceeds, the performance of training multiple $\beta$-VAEs separately, while adding
minimal computation and memory overhead.
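To make the gating idea concrete, here is a minimal PyTorch sketch of a $\beta$-conditioned encoder in the spirit described above. It is an illustration under stated assumptions, not the authors' exact MR-VAE architecture: the per-unit sigmoid gate on pre-activations as a function of $\log\beta$, the layer sizes, and the module names (BetaGatedLinear, BetaGatedEncoder) are hypothetical choices.

```python
import torch
import torch.nn as nn

class BetaGatedLinear(nn.Module):
    """Linear layer whose pre-activations are scaled by gates computed from log(beta).

    Illustrative sketch only: MR-VAE's exact gating parameterization may differ.
    """
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.gate_scale = nn.Parameter(torch.zeros(out_dim))  # per-unit gate slope (assumed form)
        self.gate_bias = nn.Parameter(torch.zeros(out_dim))   # per-unit gate offset (assumed form)

    def forward(self, x, log_beta):
        pre = self.linear(x)  # pre-activations
        gate = torch.sigmoid(self.gate_scale * log_beta + self.gate_bias)
        return pre * gate     # conditional gating on beta

class BetaGatedEncoder(nn.Module):
    """Toy encoder built from beta-gated layers (hypothetical layout and sizes)."""
    def __init__(self, x_dim=784, h_dim=256, z_dim=32):
        super().__init__()
        self.hidden = BetaGatedLinear(x_dim, h_dim)
        self.mu = BetaGatedLinear(h_dim, z_dim)
        self.logvar = BetaGatedLinear(h_dim, z_dim)

    def forward(self, x, beta):
        log_beta = torch.log(torch.as_tensor(beta, dtype=x.dtype, device=x.device))
        h = torch.relu(self.hidden(x, log_beta))
        return self.mu(h, log_beta), self.logvar(h, log_beta)

# During training, beta would be sampled per batch (e.g. log-uniformly over a
# chosen range) so that a single run covers a family of rate-distortion
# trade-offs; at test time any beta in that range can be queried directly.
encoder = BetaGatedEncoder()
x = torch.randn(8, 784)
mu, logvar = encoder(x, beta=0.5)
```

Sweeping $\beta$ at evaluation time and recording the resulting (rate, distortion) pairs then traces out the curve from a single training run, which is the benefit the abstract claims.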
Related papers
- Theoretical Convergence Guarantees for Variational Autoencoders [2.8167997311962942]
Variational Autoencoders (VAE) are popular generative models used to sample from complex data distributions.
This paper aims to bridge that gap by providing non-asymptotic convergence guarantees for VAE trained using both Gradient Descent and Adam algorithms.
Our theoretical analysis applies to both Linear VAE and Deep Gaussian VAE, as well as several VAE variants, including $\beta$-VAE and IWAE.
arXiv Detail & Related papers (2024-10-22T07:12:38Z)
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find the solutions reachable via our training procedure, including the gradient-based optimizer and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators, called WTA-CRS, for estimating matrix products with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
- Efficient Parametric Approximations of Neural Network Function Space Distance [6.117371161379209]
It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset.
We consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks.
We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks.
arXiv Detail & Related papers (2023-02-07T15:09:23Z)
- Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z)
- Machine-Learning Compression for Particle Physics Discoveries [4.432585853295899]
In collider-based particle and nuclear physics experiments, data are produced at such extreme rates that only a subset can be recorded for later analysis.
We propose a strategy that bridges these paradigms by compressing entire events for generic offline analysis but at a lower fidelity.
arXiv Detail & Related papers (2022-10-20T18:00:04Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation [0.0]
We develop an approximate hypergradient-based hyperparameter optimiser.
It requires only one training episode, with no restarts.
We also provide a motivating argument for convergence to the true hypergradient.
arXiv Detail & Related papers (2021-10-20T09:57:57Z)
- [Reproducibility Report] Rigging the Lottery: Making All Tickets Winners [1.6884611234933766]
$\textit{RigL}$, a sparse training algorithm, claims to directly train sparse networks that match or exceed the performance of existing dense-to-sparse training techniques.
We implement $\textit{RigL}$ from scratch in PyTorch and reproduce its performance on CIFAR-10 within 0.1% of the reported value.
arXiv Detail & Related papers (2021-03-29T17:01:11Z)
- Simple and Effective VAE Training with Calibrated Decoders [123.08908889310258]
Variational autoencoders (VAEs) provide an effective and simple method for modeling complex distributions.
We study the impact of calibrated decoders, which learn the uncertainty of the decoding distribution.
We propose a simple but novel modification to the commonly used Gaussian decoder, which computes the prediction variance analytically.
arXiv Detail & Related papers (2020-06-23T17:57:47Z)
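The calibrated-decoders entry above rests on a concrete observation: for a Gaussian decoder with a shared variance, the maximum-likelihood variance has a closed form, namely the mean squared reconstruction error. The snippet below is a minimal PyTorch sketch of that idea under stated assumptions; the per-batch variance estimate, the clamping constant, and the stop-gradient through the variance are illustrative choices, not necessarily the paper's exact treatment.

```python
import torch

def gaussian_nll_calibrated(x, x_hat, eps=1e-6):
    """Gaussian reconstruction NLL with the shared decoder variance set to its
    maximum-likelihood value, the mean squared error over the batch.

    Sketch of the calibrated-decoder idea; constants and the detach are assumptions.
    """
    sigma2 = ((x - x_hat) ** 2).mean().clamp(min=eps).detach()  # analytic optimum of sigma^2
    # Per-example negative log-likelihood under N(x_hat, sigma2 * I), dropping 2*pi constants.
    nll = 0.5 * (((x - x_hat) ** 2) / sigma2 + torch.log(sigma2))
    return nll.flatten(start_dim=1).sum(dim=1).mean()

# Usage: swap a fixed-variance (pure MSE) reconstruction term for this loss,
# then add the KL term as usual.
x = torch.rand(16, 1, 28, 28)
x_hat = torch.rand(16, 1, 28, 28)
loss = gaussian_nll_calibrated(x, x_hat)
```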