Denoising Multi-Beta VAE: Representation Learning for Disentanglement and Generation
- URL: http://arxiv.org/abs/2507.06613v1
- Date: Wed, 09 Jul 2025 07:29:41 GMT
- Title: Denoising Multi-Beta VAE: Representation Learning for Disentanglement and Generation
- Authors: Anshuk Uppal, Yuhta Takida, Chieh-Hsin Lai, Yuki Mitsufuji
- Abstract summary: We propose a novel generative modeling framework that leverages a range of $\beta$ values to learn multiple corresponding latent representations. We introduce a non-linear diffusion model that smoothly transitions between latent representations corresponding to different $\beta$ values. We evaluate our framework in terms of both disentanglement and generation quality.
- Score: 15.957980475573365
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Disentangled and interpretable latent representations in generative models typically come at the cost of generation quality. The $\beta$-VAE framework introduces a hyperparameter $\beta$ to balance disentanglement and reconstruction quality, where setting $\beta > 1$ introduces an information bottleneck that favors disentanglement over sharp, accurate reconstructions. To address this trade-off, we propose a novel generative modeling framework that leverages a range of $\beta$ values to learn multiple corresponding latent representations. First, we obtain a spectrum of representations by training a single variational autoencoder (VAE) with a new loss function that controls the information retained in each latent representation, such that higher $\beta$ values prioritize disentanglement over reconstruction fidelity. We then introduce a non-linear diffusion model that smoothly transitions between latent representations corresponding to different $\beta$ values. This model denoises towards less disentangled and more informative representations, ultimately leading to (almost) lossless representations that enable sharp reconstructions. Furthermore, our model supports sample generation without input images, functioning as a standalone generative model. We evaluate our framework in terms of both disentanglement and generation quality. Additionally, we observe smooth transitions in the latent spaces with respect to changes in $\beta$, facilitating consistent manipulation of generated outputs.
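For reference, the standard $\beta$-VAE objective that this trade-off stems from; the paper's multi-$\beta$ loss generalizes it to a range of $\beta$ values, and its exact form is given in the paper:

```latex
% Standard beta-VAE objective: beta > 1 tightens the information
% bottleneck, trading reconstruction fidelity for disentanglement.
\mathcal{L}_{\beta}(\theta, \phi; x) =
    \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) \right]
    - \beta \, D_{\mathrm{KL}}\!\left( q_\phi(z \mid x) \,\|\, p(z) \right)
```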
Related papers
- Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think [56.539823627694304]
REPA and its variants effectively mitigate training challenges in diffusion models by incorporating external visual representations from pretrained models. We argue that the external alignment, which is absent during the entire denoising inference process, falls short of fully harnessing the potential of discriminative representations. We propose Representation Entanglement for Generation (REG), which entangles low-level image latents with a single high-level class token from pretrained foundation models for denoising.
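A minimal sketch of the entanglement step as described, assuming a token-sequence diffusion transformer; the module names and shapes below are hypothetical, not the authors' code:

```python
import torch

# Hypothetical shapes: B = batch, N = latent tokens, D = model width.
# REG-style idea per the summary: prepend one high-level class token from a
# pretrained foundation model to the low-level latent tokens, then denoise
# the entangled sequence jointly.
def entangled_denoise(diffusion_transformer, foundation_encoder,
                      x_clean, noisy_latents, t):
    cls_token = foundation_encoder(x_clean)[:, :1, :]      # [B, 1, D]
    tokens = torch.cat([cls_token, noisy_latents], dim=1)  # [B, N + 1, D]
    return diffusion_transformer(tokens, t)                # denoised sequence
```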
arXiv Detail & Related papers (2025-07-02T08:29:18Z)
- D2C: Unlocking the Potential of Continuous Autoregressive Image Generation with Discrete Tokens [80.75893450536577]
We propose D2C, a novel two-stage method to enhance model generation capacity. In the first stage, the discrete-valued tokens representing coarse-grained image features are sampled by employing a small discrete-valued generator. In the second stage, the continuous-valued tokens representing fine-grained image features are learned conditioned on the discrete token sequence.
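A minimal sketch of the two-stage sampling described, with hypothetical interfaces for the two generators:

```python
def d2c_style_sample(discrete_gen, continuous_gen, num_tokens=256):
    # Stage 1: a small discrete-valued generator samples coarse-grained tokens.
    coarse_tokens = discrete_gen.sample(num_tokens)
    # Stage 2: continuous-valued tokens carrying fine-grained features are
    # sampled conditioned on the discrete token sequence.
    fine_tokens = continuous_gen.sample(condition=coarse_tokens)
    return coarse_tokens, fine_tokens
```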
arXiv Detail & Related papers (2025-03-21T13:58:49Z)
- $\alpha$-TCVAE: On the relationship between Disentanglement and Diversity [21.811889512977924]
In this work, we introduce $\alpha$-TCVAE, a variational autoencoder optimized using a novel total correlation (TC) lower bound.
We present quantitative analyses that support the idea that disentangled representations lead to better generative capabilities and diversity.
Our results demonstrate that $alpha$-TCVAE consistently learns more disentangled representations than baselines and generates more diverse observations.
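For context, the total correlation that such objectives bound, which measures statistical dependence among the latent dimensions:

```latex
% Total correlation of the aggregate posterior q(z): zero iff the
% latent dimensions z_j are mutually independent.
\mathrm{TC}(z) = D_{\mathrm{KL}}\!\left( q(z) \,\Big\|\, \prod_{j} q(z_j) \right)
```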
arXiv Detail & Related papers (2024-11-01T13:50:06Z)
- How to train your VAE [0.0]
Variational Autoencoders (VAEs) have become a cornerstone in generative modeling and representation learning within machine learning.
This paper explores the interpretation of the Kullback-Leibler (KL) divergence, a critical component of the Evidence Lower Bound (ELBO).
The proposed method redefines the ELBO with a mixture of Gaussians for the posterior probability, introduces a regularization term, and employs a PatchGAN discriminator to enhance texture realism.
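A sketch of the mixture-of-Gaussians posterior family described; the mixture weights, regularization term, and adversarial weighting are the paper's own, so only the generic form is shown:

```latex
% Generic K-component Gaussian mixture posterior for the ELBO.
q_\phi(z \mid x) = \sum_{k=1}^{K} \pi_k(x) \,
    \mathcal{N}\!\left( z;\ \mu_k(x),\ \operatorname{diag}\sigma_k^2(x) \right)
```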
arXiv Detail & Related papers (2023-09-22T19:52:28Z)
- DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior [70.46245698746874]
We present DiffBIR, a general restoration pipeline that could handle different blind image restoration tasks.
DiffBIR decouples the blind image restoration problem into two stages: 1) degradation removal: removing image-independent content; 2) information regeneration: generating the lost image content.
In the first stage, we use restoration modules to remove degradations and obtain high-fidelity restored results.
For the second stage, we propose IRControlNet that leverages the generative ability of latent diffusion models to generate realistic details.
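A hedged sketch of the two-stage pipeline as summarized; the module names are hypothetical placeholders, not DiffBIR's API:

```python
def diffbir_style_restore(lq_image, restoration_module, ir_controlnet):
    # Stage 1 (degradation removal): strip image-independent degradations
    # to obtain a high-fidelity but detail-poor intermediate result.
    clean = restoration_module(lq_image)
    # Stage 2 (information regeneration): a latent-diffusion generator,
    # conditioned on the stage-1 output, regenerates the lost detail.
    return ir_controlnet.generate(condition=clean)
```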
arXiv Detail & Related papers (2023-08-29T07:11:52Z)
- Precision-Recall Divergence Optimization for Generative Modeling with GANs and Normalizing Flows [54.050498411883495]
We develop a novel training method for generative models, such as Generative Adversarial Networks and Normalizing Flows.
We show that achieving a specified precision-recall trade-off corresponds to minimizing a unique $f$-divergence from a family we call the PR-divergences.
Our approach improves the performance of existing state-of-the-art models like BigGAN in terms of either precision or recall when tested on datasets such as ImageNet.
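For context, the generic $f$-divergence that the PR-divergences instantiate; the specific choices of $f$ encoding a precision-recall trade-off are defined in the paper:

```latex
% f-divergence between data distribution P and model distribution Q,
% for a convex f with f(1) = 0.
D_f(P \,\|\, Q) = \mathbb{E}_{x \sim Q}\!\left[ f\!\left( \frac{p(x)}{q(x)} \right) \right]
```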
arXiv Detail & Related papers (2023-05-30T10:07:17Z)
- The Transitive Information Theory and its Application to Deep Generative Models [0.0]
A Variational Autoencoder (VAE) can be pushed in two opposite directions: toward more disentangled representations or toward higher-quality reconstructions. Existing methods reduce the issue to the rate-distortion trade-off between compression and reconstruction.
We develop a system that learns a hierarchy of disentangled representation together with a mechanism for recombining the learned representation for generalization.
arXiv Detail & Related papers (2022-03-09T22:35:02Z)
- Generative Temporal Difference Learning for Infinite-Horizon Prediction [101.59882753763888]
We introduce the $\gamma$-model, a predictive model of environment dynamics with an infinite probabilistic horizon.
We discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors.
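A sketch of the TD-style bootstrap such a model satisfies (notation approximate; see the paper for the exact training objective): the $\gamma$-model mixes the single-step dynamics with its own prediction from the next state,

```latex
% Geometric mixture over horizons: with probability (1 - gamma) predict the
% one-step next state, otherwise bootstrap the model from the next state.
\mu_\gamma(s_e \mid s, a) = (1 - \gamma)\, p(s_e \mid s, a)
    + \gamma\, \mathbb{E}_{s' \sim p(\cdot \mid s, a)} \left[ \mu_\gamma(s_e \mid s') \right]
```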
arXiv Detail & Related papers (2020-10-27T17:54:12Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
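A minimal sketch of the multi-stage idea, with hypothetical interfaces:

```python
def multi_stage_reconstruct(x, disentangling_vae, detail_model):
    # Stage 1: penalty-based disentangled representation learning yields
    # independent factors z but a blurry, low-quality reconstruction.
    z = disentangling_vae.encode(x)
    # Stage 2: a second deep generative model, conditioned on z, models the
    # missing correlated latent variables to sharpen the reconstruction.
    return detail_model.sample(condition=z)
```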
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
- High-Fidelity Synthesis with Disentangled Representation [60.19657080953252]
We propose an Information-Distillation Generative Adversarial Network (ID-GAN) for disentanglement learning and high-fidelity synthesis.
Our method learns a disentangled representation using VAE-based models, and distills the learned representation, together with an additional nuisance variable, to a separate GAN-based generator for high-fidelity synthesis.
Despite its simplicity, we show that the proposed method is highly effective, achieving image generation quality comparable to state-of-the-art methods while using the disentangled representation.
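A hedged sketch of the distillation setup as summarized (hypothetical interfaces): the GAN generator consumes the VAE's disentangled code plus a nuisance variable that carries the detail the VAE discards.

```python
import torch

def id_gan_style_synthesize(x, vae_encoder, gan_generator, nuisance_dim=128):
    c = vae_encoder(x)                                          # disentangled code
    s = torch.randn(c.shape[0], nuisance_dim, device=c.device)  # nuisance variable
    return gan_generator(torch.cat([c, s], dim=1))              # high-fidelity output
```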
arXiv Detail & Related papers (2020-01-13T14:39:40Z)
- On the Difference Between the Information Bottleneck and the Deep Information Bottleneck [81.89141311906552]
We revisit the Deep Variational Information Bottleneck and the Markov chain assumptions needed for its derivation.
We show how to circumvent the resulting limitation by optimising a lower bound on $I(T;Y)$ for which only the latter Markov chain has to be satisfied.
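For context, the information bottleneck objective underlying both formulations, which compresses $X$ into $T$ while preserving information about $Y$:

```latex
% Information bottleneck Lagrangian: minimize over encoders p(t | x).
\min_{p(t \mid x)} \; I(X; T) - \beta \, I(T; Y)
```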
arXiv Detail & Related papers (2019-12-31T18:31:42Z)