Generalization Gap in Amortized Inference
- URL: http://arxiv.org/abs/2205.11640v1
- Date: Mon, 23 May 2022 21:28:47 GMT
- Title: Generalization Gap in Amortized Inference
- Authors: Mingtian Zhang and Peter Hayes and David Barber
- Abstract summary: We study the generalization of a popular class of probabilistic models, the Variational Auto-Encoder (VAE).
We show that the over-fitting phenomenon is usually dominated by the amortized inference network.
We propose a new training objective, inspired by the classic wake-sleep algorithm, to improve the generalization properties of amortized inference.
- Score: 17.951010274427187
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability of likelihood-based probabilistic models to generalize to unseen
data is central to many machine learning applications such as lossless
compression. In this work, we study the generalization of a popular class of
probabilistic models - the Variational Auto-Encoder (VAE). We point out the two
generalization gaps that can affect the generalization ability of VAEs and show
that the over-fitting phenomenon is usually dominated by the amortized
inference network. Based on this observation we propose a new training
objective, inspired by the classic wake-sleep algorithm, to improve the
generalization properties of amortized inference. We also demonstrate how it
can improve generalization performance in the context of image modeling and
lossless compression.
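For context, the role of the inference network in the abstract's argument can be illustrated with the standard evidence lower bound (ELBO) identity for latent-variable models. This is the textbook decomposition, not a formula quoted from the paper; the symbols p_\theta (decoder), q_\phi (amortized inference network), and z (latent variable) are notation assumed here for illustration.

```latex
\begin{align}
\log p_\theta(x)
  &= \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x, z) - \log q_\phi(z \mid x)\big]}_{\mathrm{ELBO}(x;\,\theta,\phi)}
   + \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p_\theta(z \mid x)\big)
\end{align}
```

Because the KL term is non-negative, the ELBO lower-bounds \log p_\theta(x). If q_\phi approximates the posterior well on training inputs but poorly on unseen inputs, the KL term is larger on test data, so the bound is looser there even when the decoder p_\theta itself is unchanged.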
Related papers
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
- Sparsity-aware generalization theory for deep neural networks [12.525959293825318]
We present a new approach to analyzing generalization for deep feed-forward ReLU networks.
We show fundamental trade-offs between sparsity and generalization.
arXiv Detail & Related papers (2023-07-01T20:59:05Z)
- Modeling Uncertain Feature Representation for Domain Generalization [49.129544670700525]
We show that our method consistently improves the network generalization ability on multiple vision tasks.
Our methods are simple yet effective and can be readily integrated into networks without additional trainable parameters or loss constraints.
arXiv Detail & Related papers (2023-01-16T14:25:02Z)
- When Neural Networks Fail to Generalize? A Model Sensitivity Perspective [82.36758565781153]
Domain generalization (DG) aims to train a model to perform well in unseen domains under different distributions.
This paper considers a more realistic yet more challenging scenario, namely Single Domain Generalization (Single-DG).
We empirically ascertain a property of a model that correlates strongly with its generalization that we coin as "model sensitivity".
We propose a novel strategy of Spectral Adversarial Data Augmentation (SADA) to generate augmented images targeted at the highly sensitive frequencies.
arXiv Detail & Related papers (2022-12-01T20:15:15Z)
- Revisiting the Compositional Generalization Abilities of Neural Sequence Models [23.665350744415004]
We focus on one-shot primitive generalization as introduced by the popular SCAN benchmark.
We demonstrate that modifying the training distribution in simple and intuitive ways enables standard seq-to-seq models to achieve near-perfect generalization performance.
arXiv Detail & Related papers (2022-03-14T18:03:21Z)
- Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z)
- Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
arXiv Detail & Related papers (2021-06-15T18:34:41Z)
- More Is More -- Narrowing the Generalization Gap by Adding Classification Heads [8.883733362171032]
We introduce an architecture enhancement for existing neural network models based on input transformations, termed 'TransNet'.
Our model can be employed during training time only and then pruned for prediction, resulting in an equivalent architecture to the base model.
arXiv Detail & Related papers (2021-02-09T16:30:33Z)
- Robustness to Augmentations as a Generalization metric [0.0]
Generalization is the ability of a model to predict on unseen domains.
We propose a method to predict the generalization performance of a model by using the concept that models that are robust to augmentations are more generalizable than those which are not.
The proposed method was the first runner up solution for the NeurIPS competition on Predicting Generalization in Deep Learning.
arXiv Detail & Related papers (2021-01-16T15:36:38Z)
- Generalization and Memorization: The Bias Potential Model [9.975163460952045]
Generative models and density estimators behave quite differently from models for learning functions.
For the bias potential model, we show that dimension-independent generalization accuracy is achievable if early stopping is adopted.
In the long term, the model either memorizes the samples or diverges.
arXiv Detail & Related papers (2020-11-29T04:04:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.