Characterizing and Avoiding Problematic Global Optima of Variational
Autoencoders
- URL: http://arxiv.org/abs/2003.07756v1
- Date: Tue, 17 Mar 2020 15:14:25 GMT
- Title: Characterizing and Avoiding Problematic Global Optima of Variational
Autoencoders
- Authors: Yaniv Yacoby, Weiwei Pan, Finale Doshi-Velez
- Abstract summary: Variational Auto-encoders (VAEs) are deep generative latent variable models.
Recent work shows that traditional training methods tend to yield solutions that violate modeling desiderata.
We show that both issues stem from the fact that the global optima of the VAE training objective often correspond to undesirable solutions.
- Score: 28.36260646471421
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Variational Auto-encoders (VAEs) are deep generative latent variable models
consisting of two components: a generative model that captures a data
distribution p(x) by transforming a distribution p(z) over latent space, and an
inference model that infers likely latent codes for each data point (Kingma and
Welling, 2013). Recent work shows that traditional training methods tend to
yield solutions that violate modeling desiderata: (1) the learned generative
model captures the observed data distribution but does so while ignoring the
latent codes, resulting in codes that do not represent the data (e.g. van den
Oord et al. (2017); Kim et al. (2018)); (2) the aggregate of the learned latent
codes does not match the prior p(z). This mismatch means that the learned
generative model will be unable to generate realistic data with samples from
p(z) (e.g. Makhzani et al. (2015); Tomczak and Welling (2017)). In this paper,
we demonstrate that both issues stem from the fact that the global optima of
the VAE training objective often correspond to undesirable solutions. Our
analysis builds on two observations: (1) the generative model is unidentifiable
- there exist many generative models that explain the data equally well, each
with different (and potentially unwanted) properties and (2) bias in the VAE
objective - the VAE objective may prefer generative models that explain the
data poorly but have posteriors that are easy to approximate. We present a
novel inference method, LiBI, mitigating the problems identified in our
analysis. On synthetic datasets, we show that LiBI can learn generative models
that capture the data distribution and inference models that better satisfy
modeling assumptions when traditional methods struggle to do so.
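Both failure modes are visible directly in the ELBO that VAE training maximizes. Below is a minimal Gaussian VAE sketch in PyTorch with comments marking where each issue enters; the architecture is a hypothetical stand-in, and LiBI itself is not implemented because the abstract does not specify its algorithm.

```python
# Minimal Gaussian VAE (PyTorch). Hypothetical architecture; LiBI is NOT
# implemented here -- the abstract does not describe its algorithm.
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, x_dim=2, z_dim=1, h=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h), nn.Tanh(),
                                 nn.Linear(h, 2 * z_dim))  # q(z|x) params
        self.dec = nn.Sequential(nn.Linear(z_dim, h), nn.Tanh(),
                                 nn.Linear(h, x_dim))      # p(x|z) mean

    def elbo(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparam.
        # Reconstruction log p(x|z): issue (1) appears when the decoder
        # explains x while ignoring z, so codes stop representing the data.
        rec = -0.5 * ((x - self.dec(z)) ** 2).sum(-1)
        # KL(q(z|x) || p(z)): this term biases training toward generative
        # models with easy-to-approximate posteriors; issue (2) concerns
        # the aggregate of q(z|x) across the data failing to match p(z),
        # which the per-point KL does not directly control.
        kl = 0.5 * (mu ** 2 + log_var.exp() - log_var - 1).sum(-1)
        return (rec - kl).mean()

vae = VAE()
x = torch.randn(128, 2)      # toy data stand-in
loss = -vae.elbo(x)          # maximizing the ELBO = minimizing -ELBO
loss.backward()
```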
Related papers
- Sub-graph Based Diffusion Model for Link Prediction [43.15741675617231]
Denoising Diffusion Probabilistic Models (DDPMs) represent a contemporary class of generative models with exceptional sample quality.
We build a novel generative model for link prediction using a dedicated design to decompose the likelihood estimation process via the Bayesian formula.
Our proposed method presents numerous advantages: (1) transferability across datasets without retraining, (2) promising generalization on limited training data, and (3) robustness against graph adversarial attacks.
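The abstract does not spell out the decomposition; one plausible reading, with A_{uv} denoting the indicator of link (u, v) and G the observed sub-graph, is sketched below as an assumption, not the paper's stated formula.

```latex
% Hypothetical reading of "decompose the likelihood via the Bayesian
% formula": score a candidate link by how well the generative model
% explains the sub-graph given that link, reweighted by a link prior.
p(A_{uv} = 1 \mid G)
  = \frac{p(G \mid A_{uv} = 1)\, p(A_{uv} = 1)}{p(G)}
  \propto p(G \mid A_{uv} = 1)\, p(A_{uv} = 1)
```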
arXiv Detail & Related papers (2024-09-13T02:23:55Z)
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
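A sketch of the sampling scheme just described, assuming a Gibbs-style chain: repeatedly mask a position and resample it from the model's conditional. The toy `conditional` below is a stand-in for a trained GMLM, and the parallel decoding mentioned above refreshes many positions per step rather than one.

```python
# Gibbs-style sampler sketch for a masked language model: refreshing
# positions from the learned conditionals makes the chain target the
# model's joint distribution. `conditional` is a toy stand-in.
import random

VOCAB, MASK, LENGTH, STEPS = list(range(10)), -1, 8, 200

def conditional(seq, pos):
    """Stand-in for p(x_pos | x_rest); a trained GMLM would predict this
    from the masked sequence instead of sampling uniformly."""
    return random.choice(VOCAB)

seq = [MASK] * LENGTH                 # start from the all-masked state
for _ in range(STEPS):
    pos = random.randrange(LENGTH)    # pick a position to refresh
    seq[pos] = MASK                   # re-mask it ...
    seq[pos] = conditional(seq, pos)  # ... and resample from the model
print(seq)                            # approximate sample from the joint
```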
arXiv Detail & Related papers (2024-07-22T18:00:00Z)
- Heat Death of Generative Models in Closed-Loop Learning [63.83608300361159]
We study the learning dynamics of generative models that are fed back their own produced content in addition to their original training dataset.
We show that, unless a sufficient amount of external data is introduced at each iteration, any non-trivial temperature leads the model to degenerate.
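A minimal illustration of the feedback effect, not the paper's setup: refit a Gaussian to its own samples drawn at temperature t < 1, optionally mixing fresh external data back in each generation.

```python
# Closed-loop collapse demo: a Gaussian refit on its own temperature-
# scaled samples degenerates unless external data is mixed back in.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0        # model fit to the original data
t, mix = 0.9, 0.0           # sampling temperature; external-data fraction

for generation in range(30):
    synthetic = rng.normal(mu, t * sigma, 1000)    # the model's own output
    fresh = rng.normal(0.0, 1.0, int(mix * 1000))  # new external data
    data = np.concatenate([synthetic, fresh])
    mu, sigma = data.mean(), data.std()            # "retrain" on the mix
print(f"std after 30 generations: {sigma:.3f}")    # ~0.04 with mix=0.0
```

Raising `mix` to, say, 0.5 keeps the standard deviation near 1, matching the condition above that sufficient external data be introduced at each iteration.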
arXiv Detail & Related papers (2024-04-02T21:51:39Z)
- Towards Model-Agnostic Posterior Approximation for Fast and Accurate Variational Autoencoders [22.77397537980102]
We show that we can compute a deterministic, model-agnostic posterior approximation (MAPA) of the true model's posterior.
We present preliminary results on low-dimensional synthetic data showing that (1) MAPA captures the trend of the true posterior, and (2) our MAPA-based inference performs better density estimation with less computation than baselines.
arXiv Detail & Related papers (2024-03-13T20:16:21Z)
- Secrets of RLHF in Large Language Models Part II: Reward Modeling [134.97964938009588]
We introduce a series of novel methods to mitigate the influence of incorrect and ambiguous preferences in the dataset.
We also introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses.
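A common form of such a contrastive objective is the pairwise Bradley-Terry loss; the sketch below assumes a linear reward head over pooled hidden states as a stand-in for a full LM-based reward model, and does not reproduce the paper's specific methods.

```python
# Pairwise (Bradley-Terry style) reward-model loss sketch: push the
# reward of the chosen response above that of the rejected one.
import torch
import torch.nn.functional as F

hidden_dim = 16
reward_head = torch.nn.Linear(hidden_dim, 1)

# Stand-ins for pooled encoder states of (prompt, response) pairs.
h_chosen = torch.randn(32, hidden_dim)
h_rejected = torch.randn(32, hidden_dim)

r_chosen = reward_head(h_chosen).squeeze(-1)
r_rejected = reward_head(h_rejected).squeeze(-1)

# -log sigmoid(r_chosen - r_rejected): the contrastive preference loss.
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
```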
arXiv Detail & Related papers (2024-01-11T17:56:59Z)
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- Upgrading VAE Training With Unlimited Data Plans Provided by Diffusion Models [12.542073306638988]
We show that overfitting encoders in VAEs can be effectively mitigated by training on samples from a pre-trained diffusion model.
We analyze generalization performance, amortization gap, and robustness of VAEs trained with our proposed method on three different data sets.
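A sketch of this training scheme under the assumption that each VAE update consumes a fresh batch from the pre-trained diffusion model, so the encoder never revisits a finite training set; `diffusion_sample` is a hypothetical stand-in for a real sampler.

```python
# VAE steps on endless synthetic batches; toy one-layer encoder/decoder.
import torch
import torch.nn as nn

def diffusion_sample(batch_size, dim=2):
    """Hypothetical stand-in; a real sampler runs the reverse diffusion."""
    return torch.randn(batch_size, dim)

enc, dec = nn.Linear(2, 2), nn.Linear(1, 2)  # toy stand-in for a full VAE
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()),
                       lr=1e-3)

for step in range(100):
    x = diffusion_sample(128)              # fresh batch every step, so the
    mu, log_var = enc(x).chunk(2, dim=-1)  # encoder cannot overfit to a
    z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # fixed dataset
    rec = ((x - dec(z)) ** 2).sum(-1)
    kl = 0.5 * (mu**2 + log_var.exp() - log_var - 1).sum(-1)
    opt.zero_grad(); (rec + kl).mean().backward(); opt.step()
```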
arXiv Detail & Related papers (2023-10-30T15:38:39Z)
- Gaussian Process Probes (GPP) for Uncertainty-Aware Probing [61.91898698128994]
We introduce a unified and simple framework for probing and measuring uncertainty about concepts represented by models.
Our experiments show it can (1) probe a model's representations of concepts even with a very small number of examples, (2) accurately measure both epistemic uncertainty (how confident the probe is) and aleatory uncertainty (how fuzzy the concepts are to the model), and (3) detect out-of-distribution data using those uncertainty measures as well as classic methods do.
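A simplified probe in this spirit, assuming scikit-learn's GP classifier over stand-in activations; GPP proper places a Gaussian process over probes and separates epistemic from aleatory uncertainty, which this sketch does not attempt.

```python
# GP probe sketch: fit a Gaussian process classifier on a model's hidden
# representations and read off predictive uncertainty for a concept.
# Random features stand in for real model activations.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
reps = rng.normal(size=(20, 8))        # stand-in hidden representations
labels = (reps[:, 0] > 0).astype(int)  # stand-in concept labels

probe = GaussianProcessClassifier(kernel=RBF(length_scale=1.0))
probe.fit(reps, labels)

query = rng.normal(size=(5, 8))
p = probe.predict_proba(query)[:, 1]   # probabilities near 0.5 signal
print(p)                               # uncertainty about the concept
```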
arXiv Detail & Related papers (2023-05-29T17:00:16Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data alone by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performance comparable to that of a logistic model trained with the full unaggregated data.
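A crude stand-in, not the paper's maximum entropy estimator: given only group-level aggregates (mean features, group sizes, positive rates), one can at least fit a weighted logistic model on the group means, which conveys what "learning from aggregated data alone" means; all numbers below are made up.

```python
# NOT the paper's estimator -- a loose illustration of fitting a logistic
# model from aggregates only. The paper instead reconstructs the feature
# distribution under a maximum entropy hypothesis.
import numpy as np
from sklearn.linear_model import LogisticRegression

group_means = np.array([[0.1, 1.2], [0.9, -0.3], [1.5, 0.4]])
group_sizes = np.array([500, 300, 200])
pos_rate = np.array([0.2, 0.6, 0.9])   # fraction of positives per group

# Expand each group into one positive and one negative pseudo-example,
# weighted by the implied counts.
X = np.repeat(group_means, 2, axis=0)
y = np.tile([1, 0], len(group_means))
w = np.column_stack([group_sizes * pos_rate,
                     group_sizes * (1 - pos_rate)]).ravel()

model = LogisticRegression().fit(X, y, sample_weight=w)
print(model.coef_, model.intercept_)
```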
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- VAEM: a Deep Generative Model for Heterogeneous Mixed Type Data [16.00692074660383]
VAEM is a deep generative model that is trained in a two-stage manner.
We show that VAEM broadens the range of real-world applications where deep generative models can be successfully deployed.
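A sketch of a two-stage scheme in VAEM's spirit: stage one fits an independent marginal VAE to each feature, stage two fits a dependency VAE on the concatenated stage-one latents; the architecture details are illustrative, not the paper's exact design.

```python
# Two-stage sketch in the spirit of VAEM: stage one fits a marginal VAE
# per feature; stage two fits a "dependency" VAE on the stage-one latents.
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, x_dim, z_dim):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)
        self.dec = nn.Linear(z_dim, x_dim)

    def forward(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()
        rec = ((x - self.dec(z)) ** 2).sum(-1)
        kl = 0.5 * (mu**2 + log_var.exp() - log_var - 1).sum(-1)
        return z, (rec + kl).mean()

x = torch.randn(64, 3)                        # 3 heterogeneous features
marginals = [TinyVAE(1, 1) for _ in range(3)]

# Stage 1: each marginal VAE sees only its own feature column.
zs, losses = zip(*(m(x[:, i:i + 1]) for i, m in enumerate(marginals)))
stage1_loss = sum(losses)                     # train marginals, then freeze

# Stage 2: a dependency VAE models correlations between marginal latents.
dependency = TinyVAE(3, 2)
_, stage2_loss = dependency(torch.cat(zs, dim=-1).detach())
```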
arXiv Detail & Related papers (2020-06-21T23:47:32Z)