IntroVAC: Introspective Variational Classifiers for Learning
Interpretable Latent Subspaces
- URL: http://arxiv.org/abs/2008.00760v2
- Date: Mon, 14 Sep 2020 07:23:10 GMT
- Title: IntroVAC: Introspective Variational Classifiers for Learning
Interpretable Latent Subspaces
- Authors: Marco Maggipinto and Matteo Terzi and Gian Antonio Susto
- Abstract summary: IntroVAC learns interpretable latent subspaces by exploiting information from an additional label.
We show that IntroVAC learns meaningful directions in the latent space, enabling fine-grained manipulation of image attributes.
- Score: 6.574517227976925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning useful representations of complex data has been the subject of
extensive research for many years. With the widespread adoption of Deep Neural Networks,
Variational Autoencoders have gained considerable attention since they provide an
explicit model of the data distribution based on an encoder/decoder
architecture that can both generate images and encode them in a
low-dimensional latent space. However, the latent space is not easily interpretable,
and generation quality is limited: images typically look blurry and lack detail.
In this paper, we propose the Introspective Variational Classifier (IntroVAC),
a model that learns interpretable latent subspaces by exploiting information
from an additional label and provides improved image quality thanks to an
adversarial training strategy. We show that IntroVAC is able to learn meaningful
directions in the latent space, enabling fine-grained manipulation of image
attributes. We validate our approach on the CelebA dataset.
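For intuition, here is a minimal sketch (PyTorch) of the core mechanism the abstract describes: a VAE whose latent code also feeds a linear attribute classifier, so that the classifier's weight vector defines a direction along which an attribute can be manipulated. Layer sizes and loss weights are arbitrary placeholders, and the paper's adversarial image-quality term is omitted; this is not the authors' implementation.

```python
# Minimal IntroVAC-style sketch: a VAE whose latent code also feeds a linear
# attribute classifier; the classifier's weight vector gives an editable
# latent direction. Sizes/weights are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntroVACSketch(nn.Module):
    def __init__(self, x_dim=784, z_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, z_dim)
        self.logvar = nn.Linear(256, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                 nn.Linear(256, x_dim), nn.Sigmoid())
        self.clf = nn.Linear(z_dim, 1)  # linear attribute classifier on z

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(z), self.clf(z), mu, logvar

def loss(model, x, y, beta=1.0, gamma=1.0):
    x_hat, logit, mu, logvar = model(x)
    rec = F.binary_cross_entropy(x_hat, x, reduction="sum")       # reconstruction
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL to prior
    cls = F.binary_cross_entropy_with_logits(logit.squeeze(1), y) # attribute label
    return rec + beta * kl + gamma * cls

def edit_attribute(model, x, alpha=3.0):
    # Move the latent code along the classifier's weight direction to
    # strengthen (alpha > 0) or weaken (alpha < 0) the attribute.
    z = model.mu(model.enc(x))
    w = model.clf.weight.squeeze(0)
    return model.dec(z + alpha * w / w.norm())
```

In the paper, an adversarial training strategy additionally sharpens reconstructions; the sketch keeps only the latent-classifier mechanism that yields the editable direction.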
Related papers
- Decoding Diffusion: A Scalable Framework for Unsupervised Analysis of Latent Space Biases and Representations Using Natural Language Prompts [68.48103545146127]
This paper proposes a novel framework for unsupervised exploration of diffusion latent spaces.
We directly leverage natural language prompts and image captions to map latent directions.
Our method provides a more scalable and interpretable understanding of the semantic knowledge encoded within diffusion models.
arXiv Detail & Related papers (2024-10-25T21:44:51Z)
- Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL).
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
- SODA: Bottleneck Diffusion Models for Representation Learning [75.7331354734152]
We introduce SODA, a self-supervised diffusion model, designed for representation learning.
The model incorporates an image encoder that distills a source view into a compact representation, which guides the generation of related novel views.
We show that by imposing a tight bottleneck between the encoder and a denoising decoder, we can turn diffusion models into strong representation learners.
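As a toy illustration of the bottleneck idea (shapes and modules are illustrative assumptions, not the paper's architecture):

```python
import torch
import torch.nn as nn

# A tight bottleneck: the encoder compresses a source view into a small code z,
# and a denoising decoder is conditioned on z when generating a related view.
enc = nn.Sequential(nn.Flatten(), nn.Linear(784, 16))  # 16-dim bottleneck
den = nn.Sequential(nn.Linear(784 + 16, 256), nn.ReLU(), nn.Linear(256, 784))

x_src = torch.rand(8, 1, 28, 28)                # source view
x_noisy = torch.rand(8, 784)                    # noisy target view (flattened)
z = enc(x_src)                                  # compact representation
eps_hat = den(torch.cat([x_noisy, z], dim=1))   # denoiser guided by z
```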
arXiv Detail & Related papers (2023-11-29T18:53:34Z)
- SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation [68.42476385214785]
We propose a novel Spatial-Semantic Map Guided (SSMG) diffusion model that adopts the feature map, derived from the layout, as guidance.
SSMG achieves superior generation quality with sufficient spatial and semantic controllability compared to previous works.
We also propose the Relation-Sensitive Attention (RSA) and Location-Sensitive Attention (LSA) mechanisms.
arXiv Detail & Related papers (2023-08-20T04:09:12Z)
- InfoDiffusion: Representation Learning Using Information Maximizing Diffusion Models [35.566528358691336]
InfoDiffusion is an algorithm that augments diffusion models with low-dimensional latent variables.
InfoDiffusion relies on a learning objective regularized with the mutual information between observed and hidden variables.
We find that InfoDiffusion learns disentangled and human-interpretable latent representations that are competitive with state-of-the-art generative and contrastive methods.
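Schematically, the mutual-information-regularized objective the entry describes has the form (notation ours):

\mathcal{L}(\theta) = \mathcal{L}_{\text{diffusion}}(\theta) + \lambda \, I(\mathbf{x}; \mathbf{z})

where \mathbf{z} is the low-dimensional latent variable, I denotes mutual information, and \lambda controls how much information about \mathbf{x} the latent must carry.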
arXiv Detail & Related papers (2023-06-14T21:48:38Z)
- Disentangling Variational Autoencoders [0.0]
A variational autoencoder (VAE) projects an input set of high-dimensional data to a lower-dimensional, latent space.
We implement three different VAE models from the literature and train them on a dataset of 60,000 images of hand-written digits.
We investigate the trade-offs between the quality of the reconstruction of the decoded images and the level of disentanglement of the latent space.
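The reconstruction/disentanglement trade-off the entry investigates is commonly exposed through a weight on the KL term, as in the \beta-VAE objective (a representative example from this literature, not necessarily the exact models studied):

\mathcal{L}(\theta, \phi; \mathbf{x}) = \mathbb{E}_{q_\phi(\mathbf{z} \mid \mathbf{x})}\big[\log p_\theta(\mathbf{x} \mid \mathbf{z})\big] - \beta \, D_{\mathrm{KL}}\big(q_\phi(\mathbf{z} \mid \mathbf{x}) \,\|\, p(\mathbf{z})\big)

Setting \beta = 1 recovers the standard VAE; \beta > 1 encourages disentanglement at the cost of reconstruction quality.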
arXiv Detail & Related papers (2022-11-14T19:22:41Z)
- Toward a Geometrical Understanding of Self-supervised Contrastive Learning [55.83778629498769]
Self-supervised learning (SSL) is one of the premier techniques to create data representations that are actionable for transfer learning in the absence of human annotations.
Mainstream SSL techniques rely on a specific deep neural network architecture with two cascaded neural networks: the encoder and the projector.
In this paper, we investigate how the strength of the data augmentation policies affects the data embedding.
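A toy sketch of the encoder/projector cascade the entry refers to (SimCLR-style convention; sizes are illustrative):

```python
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU())  # backbone
projector = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
# The contrastive loss is computed on projector outputs; for transfer learning,
# the projector is typically discarded and the encoder's representation reused.
```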
arXiv Detail & Related papers (2022-05-13T23:24:48Z)
- Unsupervised Representation Learning from Pathology Images with Multi-directional Contrastive Predictive Coding [0.33148826359547523]
We present a modification to the CPC framework for use with digital pathology patches.
This is achieved by introducing an alternative mask for building the latent context.
We show that our proposed modification can yield improved deep classification of histology patches.
arXiv Detail & Related papers (2021-05-11T21:17:13Z)
- Unsupervised Discovery of Disentangled Manifolds in GANs [74.24771216154105]
An interpretable generation process is beneficial to various image editing applications.
We propose a framework to discover interpretable directions in the latent space given arbitrary pre-trained generative adversarial networks.
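Once such a direction d is found, editing amounts to shifting a latent code along it (a generic illustration, not the paper's discovery procedure):

```python
import numpy as np

z = np.random.randn(512)            # latent code for a pretrained GAN
d = np.random.randn(512)            # stand-in for a discovered direction
d /= np.linalg.norm(d)
z_edit = z + 2.0 * d                # decode z_edit to obtain the edited image
```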
arXiv Detail & Related papers (2020-11-24T02:18:08Z)
- Generating Annotated High-Fidelity Images Containing Multiple Coherent Objects [10.783993190686132]
We propose a multi-object generation framework that can synthesize images with multiple objects without explicitly requiring contextual information.
We demonstrate how coherency and fidelity are preserved with our method through experiments on the Multi-MNIST and CLEVR datasets.
arXiv Detail & Related papers (2020-06-22T11:33:55Z)