Identifiability of deep generative models under mixture priors without
auxiliary information
- URL: http://arxiv.org/abs/2206.10044v1
- Date: Mon, 20 Jun 2022 23:24:48 GMT
- Authors: Bohdan Kivva, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam
- Abstract summary: We prove identifiability of a class of deep latent variable models with universal approximation capabilities.
Our analysis does not require weak supervision, auxiliary information, or conditioning in the latent space.
- Score: 34.191553176662325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove identifiability of a broad class of deep latent variable models that
(a) have universal approximation capabilities and (b) are the decoders of
variational autoencoders that are commonly used in practice. Unlike existing
work, our analysis does not require weak supervision, auxiliary information, or
conditioning in the latent space. Recently, there has been a surge of works
studying identifiability of such models. In these works, the main assumption is
that along with the data, an auxiliary variable $u$ (also known as side
information) is observed as well. At the same time, several works have
empirically observed that this doesn't seem to be necessary in practice. In
this work, we explain this behavior by showing that for a broad class of
generative (i.e. unsupervised) models with universal approximation
capabilities, the side information $u$ is not necessary: We prove
identifiability of the entire generative model where we do not observe $u$ and
only observe the data $x$. The models we consider are tightly connected with
autoencoder architectures used in practice that leverage mixture priors in the
latent space and ReLU/leaky-ReLU activations in the encoder. Our main result is
an identifiability hierarchy that significantly generalizes previous work and
exposes how different assumptions lead to different "strengths" of
identifiability. For example, our weakest result establishes (unsupervised)
identifiability up to an affine transformation, which already improves upon
existing work. It is well known that these models have universal approximation
capabilities and moreover, they have been extensively used in practice to learn
representations of data.
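The model class described above — a mixture-of-Gaussians prior in the latent space followed by a piecewise-affine (leaky-ReLU) network — can be sketched as a generative sampler. This is a minimal illustrative sketch, not the authors' code: the number of mixture components, the dimensions, and all weights below are arbitrary assumptions chosen for the example.

```python
import random

def sample_mixture_latent(weights, means, stds, rng):
    """Draw a latent z from a mixture of axis-aligned Gaussians in R^d."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(means[k], stds[k])]
    return z, k

def leaky_relu(x, alpha=0.1):
    return x if x >= 0.0 else alpha * x

def decoder(z, layers, alpha=0.1):
    """Piecewise-affine decoder: affine maps interleaved with leaky-ReLU,
    with no nonlinearity after the final layer."""
    h = z
    for i, (W, b) in enumerate(layers):
        h = [sum(w * x for w, x in zip(row, h)) + bi
             for row, bi in zip(W, b)]
        if i < len(layers) - 1:
            h = [leaky_relu(v, alpha) for v in h]
    return h

rng = random.Random(0)
# Hypothetical 2-component mixture prior in R^2
weights = [0.3, 0.7]
means = [[-2.0, 0.0], [2.0, 0.0]]
stds = [[1.0, 1.0], [0.5, 0.5]]
# Hypothetical decoder mapping R^2 -> R^3 -> R^3
layers = [
    ([[1.0, -0.5], [0.3, 0.8], [-0.2, 1.1]], [0.0, 0.1, -0.1]),
    ([[0.9, 0.0, 0.2], [0.0, 1.0, -0.3], [0.4, 0.4, 0.4]], [0.5, 0.0, 0.0]),
]
z, k = sample_mixture_latent(weights, means, stds, rng)
x = decoder(z, layers)
```

In this setting only `x` is observed; the paper's claim is that the mixture structure of the prior, together with the piecewise-affine decoder, suffices to identify the generative model without ever observing the component index `k` or any side information `u`.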
Related papers
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of
General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z) - It Ain't That Bad: Understanding the Mysterious Performance Drop in OOD Generalization for Generative Transformer Models [6.065846799248359]
Large language models (LLMs) have achieved remarkable proficiency on solving diverse problems.
However, their generalization ability is not always satisfactory, and the generalization problem is common for generative transformer models in general.
We show that when training models on n-digit operations, models generalize successfully on unseen n-digit inputs, but fail miserably on longer, unseen cases.
arXiv Detail & Related papers (2023-08-16T10:09:42Z) - Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z) - Enhancing Multiple Reliability Measures via Nuisance-extended
Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many of the predictive signals in the data can instead stem from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z) - Leveraging variational autoencoders for multiple data imputation [0.5156484100374059]
We investigate the ability of deep models, namely variational autoencoders (VAEs), to account for uncertainty in missing data through multiple imputation strategies.
We find that VAEs provide poor empirical coverage of missing data, underestimating uncertainty and producing overconfident imputations.
To overcome this, we employ $\beta$-VAEs, which, viewed from a generalized Bayes framework, provide robustness to model misspecification.
arXiv Detail & Related papers (2022-09-30T08:58:43Z) - Entropy optimized semi-supervised decomposed vector-quantized
variational autoencoder model based on transfer learning for multiclass text
classification and generation [3.9318191265352196]
We propose a semisupervised discrete latent variable model for multi-class text classification and text generation.
The proposed model employs the concept of transfer learning for training a quantized transformer model.
Experimental results indicate that the proposed model substantially surpasses state-of-the-art models.
arXiv Detail & Related papers (2021-11-10T07:07:54Z) - Combining Diverse Feature Priors [90.74601233745047]
We show that models trained with diverse sets of feature priors have less overlapping failure modes.
We also demonstrate that jointly training such models on additional (unlabeled) data allows them to correct each other's mistakes.
arXiv Detail & Related papers (2021-10-15T17:31:10Z) - Nonlinear Invariant Risk Minimization: A Causal Approach [5.63479133344366]
We propose a learning paradigm that enables out-of-distribution generalization in the nonlinear setting.
We show identifiability of the data representation up to very simple transformations.
Extensive experiments on both synthetic and real-world datasets show that our approach significantly outperforms a variety of baseline methods.
arXiv Detail & Related papers (2021-02-24T15:38:41Z) - Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning.
We propose a novel method of using data augmentations when training autoencoders.
We train a Variational Autoencoder in such a way, that it makes transformation outcome predictable by auxiliary network.
arXiv Detail & Related papers (2020-10-10T14:04:44Z) - Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.