Identifiability of deep generative models under mixture priors without
auxiliary information
- URL: http://arxiv.org/abs/2206.10044v1
- Date: Mon, 20 Jun 2022 23:24:48 GMT
- Authors: Bohdan Kivva, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam
- Abstract summary: We prove identifiability of a class of deep latent variable models with universal approximation capabilities.
Our analysis does not require weak supervision, auxiliary information, or conditioning in the latent space.
- Score: 34.191553176662325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove identifiability of a broad class of deep latent variable models that
(a) have universal approximation capabilities and (b) are the decoders of
variational autoencoders that are commonly used in practice. Unlike existing
work, our analysis does not require weak supervision, auxiliary information, or
conditioning in the latent space. Recently, there has been a surge of works
studying identifiability of such models. In these works, the main assumption is
that along with the data, an auxiliary variable $u$ (also known as side
information) is observed as well. At the same time, several works have
empirically observed that this doesn't seem to be necessary in practice. In
this work, we explain this behavior by showing that for a broad class of
generative (i.e. unsupervised) models with universal approximation
capabilities, the side information $u$ is not necessary: We prove
identifiability of the entire generative model where we do not observe $u$ and
only observe the data $x$. The models we consider are tightly connected with
autoencoder architectures used in practice that leverage mixture priors in the
latent space and ReLU/leaky-ReLU activations in the encoder. Our main result is
an identifiability hierarchy that significantly generalizes previous work and
exposes how different assumptions lead to different "strengths" of
identifiability. For example, our weakest result establishes (unsupervised)
identifiability up to an affine transformation, which already improves upon
existing work. It is well known that these models have universal approximation
capabilities and moreover, they have been extensively used in practice to learn
representations of data.
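The model class described above — a mixture-of-Gaussians prior in the latent space followed by a piecewise-affine (leaky-ReLU) network — can be sketched as a generative sampler. This is a minimal illustrative sketch, not the authors' code: the number of mixture components, the dimensions, and all weights below are arbitrary assumptions chosen for the example.

```python
import random

def sample_mixture_latent(weights, means, stds, rng):
    """Draw a latent z from a mixture of axis-aligned Gaussians in R^d."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(means[k], stds[k])]
    return z, k

def leaky_relu(x, alpha=0.1):
    return x if x >= 0.0 else alpha * x

def decoder(z, layers, alpha=0.1):
    """Piecewise-affine decoder: affine maps interleaved with leaky-ReLU,
    with no nonlinearity after the final layer."""
    h = z
    for i, (W, b) in enumerate(layers):
        h = [sum(w * x for w, x in zip(row, h)) + bi
             for row, bi in zip(W, b)]
        if i < len(layers) - 1:
            h = [leaky_relu(v, alpha) for v in h]
    return h

rng = random.Random(0)
# Hypothetical 2-component mixture prior in R^2
weights = [0.3, 0.7]
means = [[-2.0, 0.0], [2.0, 0.0]]
stds = [[1.0, 1.0], [0.5, 0.5]]
# Hypothetical decoder mapping R^2 -> R^3 -> R^3
layers = [
    ([[1.0, -0.5], [0.3, 0.8], [-0.2, 1.1]], [0.0, 0.1, -0.1]),
    ([[0.9, 0.0, 0.2], [0.0, 1.0, -0.3], [0.4, 0.4, 0.4]], [0.5, 0.0, 0.0]),
]
z, k = sample_mixture_latent(weights, means, stds, rng)
x = decoder(z, layers)
```

In this setting only `x` is observed; the paper's claim is that the mixture structure of the prior, together with the piecewise-affine decoder, suffices to identify the generative model without ever observing the component index `k` or any side information `u`.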
Related papers
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of
General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z) - It Ain't That Bad: Understanding the Mysterious Performance Drop in OOD Generalization for Generative Transformer Models [6.065846799248359]
Large language models (LLMs) have achieved remarkable proficiency on solving diverse problems.
However, their generalization ability is not always satisfactory, and the generalization problem is common for generative transformer models in general.
We show that when training models on n-digit operations, models generalize successfully on unseen n-digit inputs, but fail miserably on longer, unseen cases.
arXiv Detail & Related papers (2023-08-16T10:09:42Z) - Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z) - Enhancing Multiple Reliability Measures via Nuisance-extended
Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many of the predictive signals in the data can instead stem from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z) - Leveraging variational autoencoders for multiple data imputation [0.5156484100374059]
We investigate the ability of deep models, namely variational autoencoders (VAEs), to account for uncertainty in missing data through multiple imputation strategies.
We find that VAEs provide poor empirical coverage of missing data, underestimating uncertainty and producing overconfident imputations.
To overcome this, we employ $\beta$-VAEs, which, viewed from a generalized Bayes framework, provide robustness to model misspecification.
arXiv Detail & Related papers (2022-09-30T08:58:43Z) - Entropy optimized semi-supervised decomposed vector-quantized
variational autoencoder model based on transfer learning for multiclass text
classification and generation [3.9318191265352196]
We propose a semisupervised discrete latent variable model for multi-class text classification and text generation.
The proposed model employs the concept of transfer learning for training a quantized transformer model.
Experimental results indicate that the proposed model substantially surpasses state-of-the-art models.
arXiv Detail & Related papers (2021-11-10T07:07:54Z) - Combining Diverse Feature Priors [90.74601233745047]
We show that models trained with diverse sets of feature priors have less overlapping failure modes.
We also demonstrate that jointly training such models on additional (unlabeled) data allows them to correct each other's mistakes.
arXiv Detail & Related papers (2021-10-15T17:31:10Z) - Nonlinear Invariant Risk Minimization: A Causal Approach [5.63479133344366]
We propose a learning paradigm that enables out-of-distribution generalization in the nonlinear setting.
We show identifiability of the data representation up to very simple transformations.
Extensive experiments on both synthetic and real-world datasets show that our approach significantly outperforms a variety of baseline methods.
arXiv Detail & Related papers (2021-02-24T15:38:41Z) - Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning.
We propose a novel method of using data augmentations when training autoencoders.
We train a Variational Autoencoder in such a way, that it makes transformation outcome predictable by auxiliary network.
arXiv Detail & Related papers (2020-10-10T14:04:44Z) - Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.