Disentangling Variational Autoencoders
- URL: http://arxiv.org/abs/2211.07700v1
- Date: Mon, 14 Nov 2022 19:22:41 GMT
- Title: Disentangling Variational Autoencoders
- Authors: Rafael Pastrana
- Abstract summary: A variational autoencoder (VAE) projects an input set of high-dimensional data to a lower-dimensional, latent space.
We implement three different VAE models from the literature and train them on a dataset of 60,000 images of hand-written digits.
We investigate the trade-offs between the quality of the reconstruction of the decoded images and the level of disentanglement of the latent space.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A variational autoencoder (VAE) is a probabilistic machine learning framework
for posterior inference that projects an input set of high-dimensional data to
a lower-dimensional, latent space. The latent space learned with a VAE offers
exciting opportunities to develop new data-driven design processes in creative
disciplines, in particular, to automate the generation of multiple novel
designs that are aesthetically reminiscent of the input data but that were
unseen during training. However, the learned latent space is typically
disorganized and entangled: traversing the latent space along a single
dimension does not result in changes to single visual attributes of the data.
The lack of latent structure impedes designers from deliberately controlling
the visual attributes of new designs generated from the latent space. This
paper presents an experimental study that investigates latent space
disentanglement. We implement three different VAE models from the literature
and train them on a publicly available dataset of 60,000 images of hand-written
digits. We perform a sensitivity analysis to find a small number of latent
dimensions necessary to maximize a lower bound to the log marginal likelihood
of the data. Furthermore, we investigate the trade-offs between the quality of
the reconstruction of the decoded images and the level of disentanglement of
the latent space. We are able to automatically align three latent dimensions
with three interpretable visual properties of the digits: line weight, tilt and
width. Our experiments suggest that i) increasing the contribution of the
Kullback-Leibler divergence between the prior over the latents and the
variational distribution to the evidence lower bound, and ii) conditioning on
the input image class, both enhance the learning of a disentangled latent
space with a VAE.
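Finding i) above corresponds to a β-weighted negative ELBO, as in β-VAE. A minimal sketch of that objective is shown below; the function name, the mean-squared-error reconstruction term, and the default β value are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Negative ELBO with a weighted KL term (beta-VAE style objective).

    x, x_recon : arrays of shape (batch, data_dim)
    mu, log_var: parameters of the diagonal-Gaussian posterior q(z|x),
                 each of shape (batch, latent_dim)
    beta       : weight on the KL term; beta > 1 increases the pressure
                 toward the factorized prior N(0, I).
    """
    # Reconstruction term: squared error summed over data dimensions
    recon = np.sum((x - x_recon) ** 2, axis=1)
    # Analytic KL( q(z|x) || N(0, I) ) for a diagonal Gaussian posterior
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)
    # Average the weighted sum over the batch
    return np.mean(recon + beta * kl)
```

Increasing beta trades reconstruction quality for a latent space that matches the factorized prior more closely, which is the mechanism the paper associates with improved disentanglement.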
Related papers
- Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection
We present a novel Dual-Perspective Knowledge Enrichment approach named DPKE for semi-supervised 3D object detection.
Our DPKE enriches the knowledge of limited training data, particularly unlabeled data, from two perspectives: data-perspective and feature-perspective.
arXiv Detail & Related papers (2024-01-10T08:56:07Z)
- FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models
Few-shot class-incremental learning aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data.
We introduce two novel components: the Redundant Feature Eliminator (RFE) and the Spatial Noise Compensator (SNC).
Considering the imbalance in existing 3D datasets, we also propose new evaluation metrics that offer a more nuanced assessment of a 3D FSCIL model.
arXiv Detail & Related papers (2023-12-28T14:52:07Z)
- Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
Comprehensive experiments underscore our framework's superior generalization capabilities.
Our innovative loss functions empower the model to autonomously recover domain-specific scale-and-shift coefficients.
arXiv Detail & Related papers (2023-09-18T12:36:39Z)
- ProtoVAE: Prototypical Networks for Unsupervised Disentanglement
We introduce a novel deep generative VAE-based model, ProtoVAE, that leverages a deep metric learning Prototypical network trained using self-supervision.
Our model is completely unsupervised and requires no a priori knowledge of the dataset, including the number of factors.
We evaluate our proposed model on the benchmark dSprites, 3DShapes, and MPI3D disentanglement datasets.
arXiv Detail & Related papers (2023-05-16T01:29:26Z)
- VTAE: Variational Transformer Autoencoder with Manifolds Learning
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
The nonlinearity of the generator implies that the latent space is an unsatisfactory projection of the data space, which results in poor representation learning.
We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z)
- Holographic-(V)AE: an end-to-end SO(3)-Equivariant (Variational) Autoencoder in Fourier Space
Group-equivariant neural networks have emerged as a data-efficient approach to solving classification and regression tasks.
Here, we present the Holographic-(Variational) Autoencoder in Fourier space, suitable for unsupervised learning and generation of data distributed around a specified origin in 3D.
We show that the learned latent space efficiently encodes the categorical features of spherical images.
arXiv Detail & Related papers (2022-09-30T16:25:20Z)
- Discrete Auto-regressive Variational Attention Models for Text Modeling
Variational autoencoders (VAEs) have been widely applied to text modeling.
They are troubled by two challenges: information underrepresentation and posterior collapse.
We propose the Discrete Auto-regressive Variational Attention Model (DAVAM) to address these challenges.
arXiv Detail & Related papers (2021-06-16T06:36:26Z)
- Evidential Sparsification of Multimodal Latent Spaces in Conditional Variational Autoencoders
We consider the problem of sparsifying the discrete latent space of a trained conditional variational autoencoder.
We use evidential theory to identify the latent classes that receive direct evidence from a particular input condition and filter out those that do not.
Experiments on diverse tasks, such as image generation and human behavior prediction, demonstrate the effectiveness of our proposed technique.
arXiv Detail & Related papers (2020-10-19T01:27:21Z)
- IntroVAC: Introspective Variational Classifiers for Learning Interpretable Latent Subspaces
IntroVAC learns interpretable latent subspaces by exploiting information from an additional label.
We show that IntroVAC is able to learn meaningful directions in the latent space, enabling fine manipulation of image attributes.
arXiv Detail & Related papers (2020-08-03T10:21:41Z)
- PCAAE: Principal Component Analysis Autoencoder for organising the latent space of generative networks
We propose a novel autoencoder whose latent space verifies two properties, including that the components of the latent space are statistically independent.
We show results on both synthetic examples of shapes and on a state-of-the-art GAN.
arXiv Detail & Related papers (2020-06-14T07:40:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.