Characterizing and Avoiding Problematic Global Optima of Variational
Autoencoders
- URL: http://arxiv.org/abs/2003.07756v1
- Date: Tue, 17 Mar 2020 15:14:25 GMT
- Title: Characterizing and Avoiding Problematic Global Optima of Variational
Autoencoders
- Authors: Yaniv Yacoby, Weiwei Pan, Finale Doshi-Velez
- Abstract summary: Variational Auto-encoders (VAEs) are deep generative latent variable models.
Recent work shows that traditional training methods tend to yield solutions that violate modeling desiderata.
We show that both issues stem from the fact that the global optima of the VAE training objective often correspond to undesirable solutions.
- Score: 28.36260646471421
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Variational Auto-encoders (VAEs) are deep generative latent variable models
consisting of two components: a generative model that captures a data
distribution p(x) by transforming a distribution p(z) over latent space, and an
inference model that infers likely latent codes for each data point (Kingma and
Welling, 2013). Recent work shows that traditional training methods tend to
yield solutions that violate modeling desiderata: (1) the learned generative
model captures the observed data distribution but does so while ignoring the
latent codes, resulting in codes that do not represent the data (e.g. van den
Oord et al. (2017); Kim et al. (2018)); (2) the aggregate of the learned latent
codes does not match the prior p(z). This mismatch means that the learned
generative model will be unable to generate realistic data with samples from
p(z) (e.g. Makhzani et al. (2015); Tomczak and Welling (2017)). In this paper,
we demonstrate that both issues stem from the fact that the global optima of
the VAE training objective often correspond to undesirable solutions. Our
analysis builds on two observations: (1) the generative model is unidentifiable
- there exist many generative models that explain the data equally well, each
with different (and potentially unwanted) properties and (2) bias in the VAE
objective - the VAE objective may prefer generative models that explain the
data poorly but have posteriors that are easy to approximate. We present a
novel inference method, LiBI, mitigating the problems identified in our
analysis. On synthetic datasets, we show that LiBI can learn generative models
that capture the data distribution and inference models that better satisfy
modeling assumptions when traditional methods struggle to do so.
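Both failure modes are visible directly in the ELBO that VAE training maximizes. Below is a minimal Gaussian VAE sketch in PyTorch with comments marking where each issue enters; the architecture is a hypothetical stand-in, and LiBI itself is not implemented because the abstract does not specify its algorithm.

```python
# Minimal Gaussian VAE (PyTorch). Hypothetical architecture; LiBI is NOT
# implemented here -- the abstract does not describe its algorithm.
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, x_dim=2, z_dim=1, h=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h), nn.Tanh(),
                                 nn.Linear(h, 2 * z_dim))  # q(z|x) params
        self.dec = nn.Sequential(nn.Linear(z_dim, h), nn.Tanh(),
                                 nn.Linear(h, x_dim))      # p(x|z) mean

    def elbo(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparam.
        # Reconstruction log p(x|z): issue (1) appears when the decoder
        # explains x while ignoring z, so codes stop representing the data.
        rec = -0.5 * ((x - self.dec(z)) ** 2).sum(-1)
        # KL(q(z|x) || p(z)): this term biases training toward generative
        # models with easy-to-approximate posteriors; issue (2) concerns
        # the aggregate of q(z|x) across the data failing to match p(z),
        # which the per-point KL does not directly control.
        kl = 0.5 * (mu ** 2 + log_var.exp() - log_var - 1).sum(-1)
        return (rec - kl).mean()

vae = VAE()
x = torch.randn(128, 2)      # toy data stand-in
loss = -vae.elbo(x)          # maximizing the ELBO = minimizing -ELBO
loss.backward()
```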
Related papers
- Sub-graph Based Diffusion Model for Link Prediction [43.15741675617231]
Denoising Diffusion Probabilistic Models (DDPMs) represent a contemporary class of generative models with exceptional sample quality.
We build a novel generative model for link prediction using a dedicated design to decompose the likelihood estimation process via the Bayesian formula.
Our proposed method presents numerous advantages: (1) transferability across datasets without retraining, (2) promising generalization on limited training data, and (3) robustness against graph adversarial attacks.
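The abstract does not spell out the decomposition; one plausible reading, with A_{uv} denoting the indicator of link (u, v) and G the observed sub-graph, is sketched below as an assumption, not the paper's stated formula.

```latex
% Hypothetical reading of "decompose the likelihood via the Bayesian
% formula": score a candidate link by how well the generative model
% explains the sub-graph given that link, reweighted by a link prior.
p(A_{uv} = 1 \mid G)
  = \frac{p(G \mid A_{uv} = 1)\, p(A_{uv} = 1)}{p(G)}
  \propto p(G \mid A_{uv} = 1)\, p(A_{uv} = 1)
```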
arXiv Detail & Related papers (2024-09-13T02:23:55Z)
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
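A sketch of the sampling scheme just described, assuming a Gibbs-style chain: repeatedly mask a position and resample it from the model's conditional. The toy `conditional` below is a stand-in for a trained GMLM, and the parallel decoding mentioned above refreshes many positions per step rather than one.

```python
# Gibbs-style sampler sketch for a masked language model: refreshing
# positions from the learned conditionals makes the chain target the
# model's joint distribution. `conditional` is a toy stand-in.
import random

VOCAB, MASK, LENGTH, STEPS = list(range(10)), -1, 8, 200

def conditional(seq, pos):
    """Stand-in for p(x_pos | x_rest); a trained GMLM would predict this
    from the masked sequence instead of sampling uniformly."""
    return random.choice(VOCAB)

seq = [MASK] * LENGTH                 # start from the all-masked state
for _ in range(STEPS):
    pos = random.randrange(LENGTH)    # pick a position to refresh
    seq[pos] = MASK                   # re-mask it ...
    seq[pos] = conditional(seq, pos)  # ... and resample from the model
print(seq)                            # approximate sample from the joint
```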
arXiv Detail & Related papers (2024-07-22T18:00:00Z)
- Heat Death of Generative Models in Closed-Loop Learning [63.83608300361159]
We study the learning dynamics of generative models that are fed back their own produced content in addition to their original training dataset.
We show that, unless a sufficient amount of external data is introduced at each iteration, any non-trivial temperature leads the model to degenerate.
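A minimal illustration of the feedback effect, not the paper's setup: refit a Gaussian to its own samples drawn at temperature t < 1, optionally mixing fresh external data back in each generation.

```python
# Closed-loop collapse demo: a Gaussian refit on its own temperature-
# scaled samples degenerates unless external data is mixed back in.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0        # model fit to the original data
t, mix = 0.9, 0.0           # sampling temperature; external-data fraction

for generation in range(30):
    synthetic = rng.normal(mu, t * sigma, 1000)    # the model's own output
    fresh = rng.normal(0.0, 1.0, int(mix * 1000))  # new external data
    data = np.concatenate([synthetic, fresh])
    mu, sigma = data.mean(), data.std()            # "retrain" on the mix
print(f"std after 30 generations: {sigma:.3f}")    # ~0.04 with mix=0.0
```

Raising `mix` to, say, 0.5 keeps the standard deviation near 1, matching the condition above that sufficient external data be introduced at each iteration.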
arXiv Detail & Related papers (2024-04-02T21:51:39Z)
- Towards Model-Agnostic Posterior Approximation for Fast and Accurate Variational Autoencoders [22.77397537980102]
We show that we can compute a deterministic, model-agnostic posterior approximation (MAPA) of the true model's posterior.
We present preliminary results on low-dimensional synthetic data showing that (1) MAPA captures the trend of the true posterior, and (2) our MAPA-based inference performs better density estimation with less computation than baselines.
arXiv Detail & Related papers (2024-03-13T20:16:21Z)
- Secrets of RLHF in Large Language Models Part II: Reward Modeling [134.97964938009588]
We introduce a series of novel methods to mitigate the influence of incorrect and ambiguous preferences in the dataset.
We also introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses.
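A common form of such a contrastive objective is the pairwise Bradley-Terry loss; the sketch below assumes a linear reward head over pooled hidden states as a stand-in for a full LM-based reward model, and does not reproduce the paper's specific methods.

```python
# Pairwise (Bradley-Terry style) reward-model loss sketch: push the
# reward of the chosen response above that of the rejected one.
import torch
import torch.nn.functional as F

hidden_dim = 16
reward_head = torch.nn.Linear(hidden_dim, 1)

# Stand-ins for pooled encoder states of (prompt, response) pairs.
h_chosen = torch.randn(32, hidden_dim)
h_rejected = torch.randn(32, hidden_dim)

r_chosen = reward_head(h_chosen).squeeze(-1)
r_rejected = reward_head(h_rejected).squeeze(-1)

# -log sigmoid(r_chosen - r_rejected): the contrastive preference loss.
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
```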
arXiv Detail & Related papers (2024-01-11T17:56:59Z)
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- Upgrading VAE Training With Unlimited Data Plans Provided by Diffusion Models [12.542073306638988]
We show that overfitting encoders in VAEs can be effectively mitigated by training on samples from a pre-trained diffusion model.
We analyze generalization performance, amortization gap, and robustness of VAEs trained with our proposed method on three different data sets.
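A sketch of this training scheme under the assumption that each VAE update consumes a fresh batch from the pre-trained diffusion model, so the encoder never revisits a finite training set; `diffusion_sample` is a hypothetical stand-in for a real sampler.

```python
# VAE steps on endless synthetic batches; toy one-layer encoder/decoder.
import torch
import torch.nn as nn

def diffusion_sample(batch_size, dim=2):
    """Hypothetical stand-in; a real sampler runs the reverse diffusion."""
    return torch.randn(batch_size, dim)

enc, dec = nn.Linear(2, 2), nn.Linear(1, 2)  # toy stand-in for a full VAE
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()),
                       lr=1e-3)

for step in range(100):
    x = diffusion_sample(128)              # fresh batch every step, so the
    mu, log_var = enc(x).chunk(2, dim=-1)  # encoder cannot overfit to a
    z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # fixed dataset
    rec = ((x - dec(z)) ** 2).sum(-1)
    kl = 0.5 * (mu**2 + log_var.exp() - log_var - 1).sum(-1)
    opt.zero_grad(); (rec + kl).mean().backward(); opt.step()
```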
arXiv Detail & Related papers (2023-10-30T15:38:39Z)
- Gaussian Process Probes (GPP) for Uncertainty-Aware Probing [61.91898698128994]
We introduce a unified and simple framework for probing and measuring uncertainty about concepts represented by models.
Our experiments show it can (1) probe a model's representations of concepts even with a very small number of examples, (2) accurately measure both epistemic uncertainty (how confident the probe is) and aleatory uncertainty (how fuzzy the concepts are to the model), and (3) detect out-of-distribution data using those uncertainty measures as well as classic methods do.
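A simplified probe in this spirit, assuming scikit-learn's GP classifier over stand-in activations; GPP proper places a Gaussian process over probes and separates epistemic from aleatory uncertainty, which this sketch does not attempt.

```python
# GP probe sketch: fit a Gaussian process classifier on a model's hidden
# representations and read off predictive uncertainty for a concept.
# Random features stand in for real model activations.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
reps = rng.normal(size=(20, 8))        # stand-in hidden representations
labels = (reps[:, 0] > 0).astype(int)  # stand-in concept labels

probe = GaussianProcessClassifier(kernel=RBF(length_scale=1.0))
probe.fit(reps, labels)

query = rng.normal(size=(5, 8))
p = probe.predict_proba(query)[:, 1]   # probabilities near 0.5 signal
print(p)                               # uncertainty about the concept
```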
arXiv Detail & Related papers (2023-05-29T17:00:16Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data alone by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performance comparable to that of a logistic model trained with the full unaggregated data.
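A crude stand-in, not the paper's maximum entropy estimator: given only group-level aggregates (mean features, group sizes, positive rates), one can at least fit a weighted logistic model on the group means, which conveys what "learning from aggregated data alone" means; all numbers below are made up.

```python
# NOT the paper's estimator -- a loose illustration of fitting a logistic
# model from aggregates only. The paper instead reconstructs the feature
# distribution under a maximum entropy hypothesis.
import numpy as np
from sklearn.linear_model import LogisticRegression

group_means = np.array([[0.1, 1.2], [0.9, -0.3], [1.5, 0.4]])
group_sizes = np.array([500, 300, 200])
pos_rate = np.array([0.2, 0.6, 0.9])   # fraction of positives per group

# Expand each group into one positive and one negative pseudo-example,
# weighted by the implied counts.
X = np.repeat(group_means, 2, axis=0)
y = np.tile([1, 0], len(group_means))
w = np.column_stack([group_sizes * pos_rate,
                     group_sizes * (1 - pos_rate)]).ravel()

model = LogisticRegression().fit(X, y, sample_weight=w)
print(model.coef_, model.intercept_)
```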
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- VAEM: a Deep Generative Model for Heterogeneous Mixed Type Data [16.00692074660383]
VAEM is a deep generative model that is trained in a two-stage manner.
We show that VAEM broadens the range of real-world applications where deep generative models can be successfully deployed.
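A sketch of a two-stage scheme in VAEM's spirit: stage one fits an independent marginal VAE to each feature, stage two fits a dependency VAE on the concatenated stage-one latents; the architecture details are illustrative, not the paper's exact design.

```python
# Two-stage sketch in the spirit of VAEM: stage one fits a marginal VAE
# per feature; stage two fits a "dependency" VAE on the stage-one latents.
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, x_dim, z_dim):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)
        self.dec = nn.Linear(z_dim, x_dim)

    def forward(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()
        rec = ((x - self.dec(z)) ** 2).sum(-1)
        kl = 0.5 * (mu**2 + log_var.exp() - log_var - 1).sum(-1)
        return z, (rec + kl).mean()

x = torch.randn(64, 3)                        # 3 heterogeneous features
marginals = [TinyVAE(1, 1) for _ in range(3)]

# Stage 1: each marginal VAE sees only its own feature column.
zs, losses = zip(*(m(x[:, i:i + 1]) for i, m in enumerate(marginals)))
stage1_loss = sum(losses)                     # train marginals, then freeze

# Stage 2: a dependency VAE models correlations between marginal latents.
dependency = TinyVAE(3, 2)
_, stage2_loss = dependency(torch.cat(zs, dim=-1).detach())
```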
arXiv Detail & Related papers (2020-06-21T23:47:32Z)