Disentangling Variational Autoencoders
- URL: http://arxiv.org/abs/2211.07700v1
- Date: Mon, 14 Nov 2022 19:22:41 GMT
- Title: Disentangling Variational Autoencoders
- Authors: Rafael Pastrana
- Abstract summary: A variational autoencoder (VAE) projects an input set of high-dimensional data to a lower-dimensional, latent space.
We implement three different VAE models from the literature and train them on a dataset of 60,000 images of hand-written digits.
We investigate the trade-offs between the quality of the reconstruction of the decoded images and the level of disentanglement of the latent space.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A variational autoencoder (VAE) is a probabilistic machine learning framework
for posterior inference that projects an input set of high-dimensional data to
a lower-dimensional, latent space. The latent space learned with a VAE offers
exciting opportunities to develop new data-driven design processes in creative
disciplines, in particular, to automate the generation of multiple novel
designs that are aesthetically reminiscent of the input data but that were
unseen during training. However, the learned latent space is typically
disorganized and entangled: traversing the latent space along a single
dimension does not result in changes to single visual attributes of the data.
The lack of latent structure impedes designers from deliberately controlling
the visual attributes of new designs generated from the latent space. This
paper presents an experimental study that investigates latent space
disentanglement. We implement three different VAE models from the literature
and train them on a publicly available dataset of 60,000 images of hand-written
digits. We perform a sensitivity analysis to find a small number of latent
dimensions necessary to maximize a lower bound to the log marginal likelihood
of the data. Furthermore, we investigate the trade-offs between the quality of
the reconstruction of the decoded images and the level of disentanglement of
the latent space. We are able to automatically align three latent dimensions
with three interpretable visual properties of the digits: line weight, tilt and
width. Our experiments suggest that i) increasing the contribution of the
Kullback-Leibler divergence between the prior over the latents and the
variational distribution to the evidence lower bound, and ii) conditioning on
the input image class, both enhance the learning of a disentangled latent
space with a VAE.
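Finding i) above corresponds to a β-weighted negative ELBO, as in β-VAE. A minimal sketch of that objective is shown below; the function name, the mean-squared-error reconstruction term, and the default β value are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Negative ELBO with a weighted KL term (beta-VAE style objective).

    x, x_recon : arrays of shape (batch, data_dim)
    mu, log_var: parameters of the diagonal-Gaussian posterior q(z|x),
                 each of shape (batch, latent_dim)
    beta       : weight on the KL term; beta > 1 increases the pressure
                 toward the factorized prior N(0, I).
    """
    # Reconstruction term: squared error summed over data dimensions
    recon = np.sum((x - x_recon) ** 2, axis=1)
    # Analytic KL( q(z|x) || N(0, I) ) for a diagonal Gaussian posterior
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)
    # Average the weighted sum over the batch
    return np.mean(recon + beta * kl)
```

Increasing beta trades reconstruction quality for a latent space that matches the factorized prior more closely, which is the mechanism the paper associates with improved disentanglement.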
Related papers
- Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection
We present a novel Dual-Perspective Knowledge Enrichment approach named DPKE for semi-supervised 3D object detection.
Our DPKE enriches the knowledge of limited training data, particularly unlabeled data, from two perspectives: data-perspective and feature-perspective.
arXiv Detail & Related papers (2024-01-10T08:56:07Z)
- FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models
Few-shot class-incremental learning aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data.
We introduce two novel components: the Redundant Feature Eliminator (RFE) and the Spatial Noise Compensator (SNC).
Considering the imbalance in existing 3D datasets, we also propose new evaluation metrics that offer a more nuanced assessment of a 3D FSCIL model.
arXiv Detail & Related papers (2023-12-28T14:52:07Z)
- Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
Comprehensive experiments underscore our framework's superior generalization capabilities.
Our innovative loss functions empower the model to autonomously recover domain-specific scale-and-shift coefficients.
arXiv Detail & Related papers (2023-09-18T12:36:39Z)
- ProtoVAE: Prototypical Networks for Unsupervised Disentanglement
We introduce a novel deep generative VAE-based model, ProtoVAE, that leverages a deep metric learning Prototypical network trained using self-supervision.
Our model is completely unsupervised and requires no a priori knowledge of the dataset, including the number of factors.
We evaluate our proposed model on the benchmark dSprites, 3DShapes, and MPI3D disentanglement datasets.
arXiv Detail & Related papers (2023-05-16T01:29:26Z)
- VTAE: Variational Transformer Autoencoder with Manifolds Learning
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
The nonlinearity of the generator implies that the latent space is an unsatisfactory projection of the data space, which results in poor representation learning.
We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z)
- Holographic-(V)AE: an end-to-end SO(3)-Equivariant (Variational) Autoencoder in Fourier Space
Group-equivariant neural networks have emerged as a data-efficient approach to solving classification and regression tasks.
Here, we present the Holographic-(Variational) Autoencoder in Fourier space, suitable for unsupervised learning and generation of data distributed around a specified origin in 3D.
We show that the learned latent space efficiently encodes the categorical features of spherical images.
arXiv Detail & Related papers (2022-09-30T16:25:20Z)
- Discrete Auto-regressive Variational Attention Models for Text Modeling
Variational autoencoders (VAEs) have been widely applied to text modeling.
They are troubled by two challenges: information underrepresentation and posterior collapse.
We propose the Discrete Auto-regressive Variational Attention Model (DAVAM) to address these challenges.
arXiv Detail & Related papers (2021-06-16T06:36:26Z)
- Evidential Sparsification of Multimodal Latent Spaces in Conditional Variational Autoencoders
We consider the problem of sparsifying the discrete latent space of a trained conditional variational autoencoder.
We use evidential theory to identify the latent classes that receive direct evidence from a particular input condition and filter out those that do not.
Experiments on diverse tasks, such as image generation and human behavior prediction, demonstrate the effectiveness of our proposed technique.
arXiv Detail & Related papers (2020-10-19T01:27:21Z)
- IntroVAC: Introspective Variational Classifiers for Learning Interpretable Latent Subspaces
IntroVAC learns interpretable latent subspaces by exploiting information from an additional label.
We show that IntroVAC is able to learn meaningful directions in the latent space, enabling fine manipulation of image attributes.
arXiv Detail & Related papers (2020-08-03T10:21:41Z)
- PCAAE: Principal Component Analysis Autoencoder for organising the latent space of generative networks
We propose a novel autoencoder whose latent space verifies two properties, including that the components of the latent space are statistically independent.
We show results on both synthetic examples of shapes and on a state-of-the-art GAN.
arXiv Detail & Related papers (2020-06-14T07:40:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.