Learning Disentangled Discrete Representations
        - URL: http://arxiv.org/abs/2307.14151v1
- Date: Wed, 26 Jul 2023 12:29:58 GMT
- Title: Learning Disentangled Discrete Representations
- Authors: David Friede, Christian Reimers, Heiner Stuckenschmidt and Mathias Niepert
- Abstract summary: We show the relationship between discrete latent spaces and disentangled representations by replacing the standard Gaussian variational autoencoder with a tailored categorical variational autoencoder.
We provide both analytical and empirical findings that demonstrate the advantages of discrete VAEs for learning disentangled representations.
- Score: 22.5004558029479
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent successes in image generation, model-based reinforcement learning, and
text-to-image generation have demonstrated the empirical advantages of discrete
latent representations, although the reasons behind their benefits remain
unclear. We explore the relationship between discrete latent spaces and
disentangled representations by replacing the standard Gaussian variational
autoencoder (VAE) with a tailored categorical variational autoencoder. We show
that the underlying grid structure of categorical distributions mitigates the
problem of rotational invariance associated with multivariate Gaussian
distributions, acting as an efficient inductive prior for disentangled
representations. We provide both analytical and empirical findings that
demonstrate the advantages of discrete VAEs for learning disentangled
representations. Furthermore, we introduce the first unsupervised model
selection strategy that favors disentangled representations.
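
The rotational-invariance argument in the abstract can be illustrated with a toy sketch. This is not the paper's model: the 2D setup, the 3x3 integer lattice standing in for a categorical grid, and the helper names `std_normal_logpdf`, `rotation_2d` and `on_grid` are all assumptions made for illustration. The sketch only checks that the density of a standard Gaussian is unchanged by a rotation, whereas a rotated grid generally no longer coincides with the original grid.

```python
import numpy as np

def std_normal_logpdf(z):
    """Log-density of an isotropic standard Gaussian N(0, I) at each row of z."""
    d = z.shape[-1]
    return -0.5 * (np.sum(z**2, axis=-1) + d * np.log(2.0 * np.pi))

def rotation_2d(theta):
    """2x2 rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def on_grid(points, grid, tol=1e-9):
    """For each point, report whether it coincides with some grid point."""
    dists = np.linalg.norm(points[:, None, :] - grid[None, :, :], axis=-1)
    return dists.min(axis=1) < tol

rng = np.random.default_rng(0)
z = rng.normal(size=(5, 2))      # arbitrary latent codes
R = rotation_2d(np.pi / 4)       # rotate the latent space by 45 degrees
z_rot = z @ R.T

# The Gaussian prior assigns the same density before and after rotation,
# so it cannot distinguish an axis-aligned code from a rotated one.
gaussian_invariant = np.allclose(std_normal_logpdf(z), std_normal_logpdf(z_rot))

# Hypothetical stand-in for a categorical latent grid: a 3x3 integer lattice.
grid = np.array([[i, j] for i in range(-1, 2) for j in range(-1, 2)], dtype=float)
grid_preserved = bool(on_grid(grid @ R.T, grid).all())

print("Gaussian log-density invariant under rotation:", gaussian_invariant)
print("Grid mapped onto itself by the same rotation:", grid_preserved)
```

In this toy setting only axis-aligned symmetries (for example a 90-degree rotation) map the lattice onto itself, which is one informal way to read the claim that a grid structure breaks the rotational symmetry of the Gaussian case.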
 
      
Related papers
- G4Seg: Generation for Inexact Segmentation Refinement with Diffusion Models [38.44872934965588]
 This paper considers the problem of utilizing a large-scale text-to-image model to tackle the Inexact Segmentation (IS) task.
We exploit the pattern discrepancies between original images and mask-conditional generated images to facilitate a coarse-to-fine segmentation refinement.
 arXiv  Detail & Related papers  (2025-06-02T11:05:28Z)
- Disentanglement with Factor Quantized Variational Autoencoders [11.086500036180222]
 We propose a discrete variational autoencoder (VAE) based model where the ground truth information about the generative factors is not provided to the model.
We demonstrate the advantages of learning discrete representations over learning continuous representations in facilitating disentanglement.
Our method, FactorQVAE, is the first to combine optimization-based disentanglement approaches with discrete representation learning.
 arXiv  Detail & Related papers  (2024-09-23T09:33:53Z)
- The Benefits of Balance: From Information Projections to Variance Reduction [7.082773426322819]
 We show that an iterative algorithm, usually used to avoid representation collapse, enjoys an unsuspected benefit.
We provide non-asymptotic bounds quantifying this variance reduction effect and relate them to the eigendecays of appropriately defined Markov operators.
We explain how various forms of data balancing in contrastive multimodal learning and self-supervised clustering can be interpreted as instances of this variance reduction scheme.
 arXiv  Detail & Related papers  (2024-08-27T13:48:15Z)
- Identifiable Latent Neural Causal Models [82.14087963690561]
 Causal representation learning seeks to uncover latent, high-level causal representations from low-level observed data.
We determine the types of distribution shifts that do contribute to the identifiability of causal representations.
We translate our findings into a practical algorithm, allowing for the acquisition of reliable latent causal representations.
 arXiv  Detail & Related papers  (2024-03-23T04:13:55Z)
- Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models [59.331993845831946]
 Diffusion models benefit from instillation of task-specific information into the score function to steer the sample generation towards desired properties.
This paper provides the first theoretical study towards understanding the influence of guidance on diffusion models in the context of Gaussian mixture models.
 arXiv  Detail & Related papers  (2024-03-03T23:15:48Z)
- Revealing Multimodal Contrastive Representation Learning through Latent Partial Causal Models [85.67870425656368]
 We introduce a unified causal model specifically designed for multimodal data.
We show that multimodal contrastive representation learning excels at identifying latent coupled variables.
Experiments demonstrate the robustness of our findings, even when the assumptions are violated.
 arXiv  Detail & Related papers  (2024-02-09T07:18:06Z)
- Interpreting Equivariant Representations [5.325297567945828]
 In this paper, we demonstrate that the inductive bias imposed by an equivariant model must also be taken into account when using latent representations.
We show how not accounting for the inductive biases leads to decreased performance on downstream tasks, and vice versa.
 arXiv  Detail & Related papers  (2024-01-23T09:43:30Z)
- C$^2$VAE: Gaussian Copula-based VAE Differing Disentangled from Coupled Representations with Contrastive Posterior [36.2531431458649]
 We present a self-supervised variational autoencoder (VAE) to jointly learn disentangled and dependent hidden factors.
We then enhance disentangled representation learning by a self-supervised classifier to eliminate coupled representations in a contrastive manner.
 arXiv  Detail & Related papers  (2023-09-23T08:33:48Z)
- Supervised Contrastive Learning with Heterogeneous Similarity for Distribution Shifts [3.7819322027528113]
 We propose a new regularization using supervised contrastive learning to prevent such overfitting and to train models that do not degrade their performance under distribution shifts.
Experiments on benchmark datasets that emulate distribution shifts, including subpopulation shift and domain generalization, demonstrate the advantage of the proposed method.
 arXiv  Detail & Related papers  (2023-04-07T01:45:09Z)
- Modelling nonlinear dependencies in the latent space of inverse scattering [1.5990720051907859]
 In inverse scattering proposed by Angles and Mallat, a deep neural network is trained to invert the scattering transform applied to an image.
After such a network is trained, it can be used as a generative model given that we can sample from the distribution of principal components of scattering coefficients.
Within this paper, two such models are explored, namely a Variational AutoEncoder and a Generative Adversarial Network.
 arXiv  Detail & Related papers  (2022-03-19T12:07:43Z)
- Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
 Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
 arXiv  Detail & Related papers  (2021-10-24T07:58:13Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
 Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
 arXiv  Detail & Related papers  (2020-10-25T18:51:15Z)
- Learning Disentangled Representations with Latent Variation Predictability [102.4163768995288]
 This paper defines the variation predictability of latent disentangled representations.
Within an adversarial generation process, we encourage variation predictability by maximizing the mutual information between latent variations and corresponding image pairs.
We develop an evaluation metric that does not rely on the ground-truth generative factors to measure the disentanglement of latent representations.
 arXiv  Detail & Related papers  (2020-07-25T08:54:26Z)
- When Relation Networks meet GANs: Relation GANs with Triplet Loss [110.7572918636599]
 Training stability is still a lingering concern of generative adversarial networks (GANs).
In this paper, we explore a relation network architecture for the discriminator and design a triplet loss which achieves better generalization and stability.
Experiments on benchmark datasets show that the proposed relation discriminator and new loss can provide significant improvement on various vision tasks.
 arXiv  Detail & Related papers  (2020-02-24T11:35:28Z)