L-VAE: Variational Auto-Encoder with Learnable Beta for Disentangled Representation
- URL: http://arxiv.org/abs/2507.02619v1
- Date: Thu, 03 Jul 2025 13:45:42 GMT
- Title: L-VAE: Variational Auto-Encoder with Learnable Beta for Disentangled Representation
- Authors: Hazal Mogultay Ozcan, Sinan Kalkan, Fatos T. Yarman-Vural
- Abstract summary: L-VAE mitigates the limitations of beta-VAE by learning the relative weights of the terms in the loss function. L-VAE consistently provides the best or the second best performances measured by a set of disentanglement metrics.
- Score: 9.117340902796647
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel model called Learnable VAE (L-VAE), which learns a disentangled representation together with the hyperparameters of the cost function. L-VAE can be considered an extension of $\beta$-VAE, in which the hyperparameter $\beta$ is adjusted empirically. L-VAE mitigates the limitations of $\beta$-VAE by learning the relative weights of the terms in the loss function to control the dynamic trade-off between disentanglement and reconstruction losses. In the proposed model, the weights of the loss terms and the parameters of the model architecture are learned concurrently. An additional regularization term is added to the loss function to prevent bias towards either the reconstruction or the disentanglement loss. Experimental analyses show that the proposed L-VAE finds an effective balance between reconstruction fidelity and disentangling the latent dimensions. Comparisons of the proposed L-VAE against $\beta$-VAE, VAE, ControlVAE, DynamicVAE, and $\sigma$-VAE on datasets such as dSprites, MPI3D-complex, Falcor3D, and Isaac3D reveal that L-VAE consistently provides the best or the second-best performance, as measured by a set of disentanglement metrics. Moreover, qualitative experiments on the CelebA dataset confirm the success of the L-VAE model in disentangling facial attributes.
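The abstract does not spell out the loss, but its core idea, learning the weight of the KL term jointly with the network while a regularizer keeps it from collapsing toward either objective, can be illustrated with a short sketch. The class name, the softplus parameterization, and the `lambda_reg` penalty below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a VAE objective whose KL weight is a
# learnable parameter optimized together with the encoder/decoder, plus an
# assumed regularizer that keeps the weight from collapsing toward pure
# reconstruction or pure disentanglement.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableWeightVAELoss(nn.Module):
    def __init__(self, lambda_reg: float = 0.1):
        super().__init__()
        self.raw_beta = nn.Parameter(torch.tensor(0.0))  # unconstrained; softplus keeps beta > 0
        self.lambda_reg = lambda_reg                      # regularizer strength (assumed)

    def forward(self, x, x_recon, mu, logvar):
        beta = F.softplus(self.raw_beta)
        recon = F.mse_loss(x_recon, x, reduction="mean")
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        # Without a regularizer, minimizing beta * kl over beta would drive beta
        # toward 0; this penalty (pulling log(beta) toward 0, i.e. beta toward 1)
        # plays the role of the anti-bias term described in the abstract.
        reg = self.lambda_reg * torch.log(beta) ** 2
        return recon + beta * kl + reg
```

Registering `raw_beta` as an `nn.Parameter` means the same optimizer that updates the encoder and decoder also updates the loss weight, matching the abstract's description of concurrent learning.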
Related papers
- $α$-TCVAE: On the relationship between Disentanglement and Diversity [21.811889512977924]
In this work, we introduce $\alpha$-TCVAE, a variational autoencoder optimized using a novel total correlation (TC) lower bound.
We present quantitative analyses that support the idea that disentangled representations lead to better generative capabilities and diversity.
Our results demonstrate that $\alpha$-TCVAE consistently learns more disentangled representations than baselines and generates more diverse observations.
arXiv Detail & Related papers (2024-11-01T13:50:06Z)
- Matching aggregate posteriors in the variational autoencoder [0.5759862457142761]
The variational autoencoder (VAE) is a well-studied, deep, latent-variable model (DLVM).
This paper addresses shortcomings in VAEs by reformulating the VAE objective function so that the aggregate/marginal posterior distribution matches the prior.
The proposed method is named the aggregate variational autoencoder (AVAE) and is built on the theoretical framework of the VAE.
arXiv Detail & Related papers (2023-11-13T19:22:37Z)
- Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching [77.133400999703]
Correlation-based stereo matching has achieved outstanding performance.
Current methods with a fixed model do not work uniformly well across various datasets.
This paper proposes a new perspective to dynamically calculate correlation for robust stereo matching.
arXiv Detail & Related papers (2023-07-26T09:47:37Z)
- A Model for Multi-View Residual Covariances based on Perspective Deformation [88.21738020902411]
We derive a model for the covariance of the visual residuals in multi-view SfM, odometry and SLAM setups.
We validate our model with synthetic and real data and integrate it into photometric and feature-based Bundle Adjustment.
arXiv Detail & Related papers (2022-02-01T21:21:56Z)
- The KFIoU Loss for Rotated Object Detection [115.334070064346]
In this paper, we argue that one effective alternative is to devise an approximate loss that can achieve trend-level alignment with the SkewIoU loss.
Specifically, we model the objects as Gaussian distributions and adopt a Kalman filter to inherently mimic the mechanism of SkewIoU.
The resulting new loss, called KFIoU, is easier to implement and works better than the exact SkewIoU loss.
arXiv Detail & Related papers (2022-01-29T10:54:57Z)
- Auto-Weighted Layer Representation Based View Synthesis Distortion Estimation for 3-D Video Coding [78.53837757673597]
In this paper, an auto-weighted layer representation based view synthesis distortion estimation model is developed.
The proposed method outperforms the relevant state-of-the-art methods in both accuracy and efficiency.
arXiv Detail & Related papers (2022-01-07T12:12:41Z)
- Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
The Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z)
- ControlVAE: Tuning, Analytical Properties, and Performance Analysis [14.272917020105147]
ControlVAE is a new variational autoencoder framework.
It stabilizes the KL-divergence of VAE models to a specified value.
It can achieve a good trade-off between reconstruction quality and KL-divergence.
arXiv Detail & Related papers (2020-10-31T12:32:39Z)
- DynamicVAE: Decoupling Reconstruction Error and Disentangled Representation Learning [15.317044259237043]
This paper challenges the common assumption that the weight $\beta$ in $\beta$-VAE should be larger than $1$ in order to effectively disentangle latent factors.
We demonstrate that $\beta$-VAE, with $\beta \leq 1$, can not only attain good disentanglement but also significantly improve reconstruction accuracy via dynamic control.
arXiv Detail & Related papers (2020-09-15T00:01:11Z)
- q-VAE for Disentangled Representation Learning and Latent Dynamical Systems [8.071506311915396]
A variational autoencoder (VAE) derived from Tsallis statistics, called q-VAE, is proposed.
In the proposed method, a standard VAE is employed to statistically extract the latent space hidden in the sampled data.
arXiv Detail & Related papers (2020-03-04T01:38:39Z)
- Learnable Bernoulli Dropout for Bayesian Deep Learning [53.79615543862426]
Learnable Bernoulli dropout (LBD) is a new model-agnostic dropout scheme that considers the dropout rates as parameters jointly optimized with other model parameters.
LBD leads to improved accuracy and uncertainty estimates in image classification and semantic segmentation.
arXiv Detail & Related papers (2020-02-12T18:57:14Z)
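The Learnable Bernoulli Dropout entry above treats dropout rates as parameters trained jointly with the network. Its one-line summary does not say how gradients reach a discrete Bernoulli mask, so the sketch below uses a relaxed (concrete / Gumbel-sigmoid) sample as one plausible estimator; the parameterization and temperature are assumptions, not necessarily the authors' choice.

```python
# Hedged sketch: a dropout layer whose drop probability is a trainable
# parameter, made differentiable via a relaxed Bernoulli ("concrete") sample.
import torch
import torch.nn as nn

class LearnableDropout(nn.Module):
    def __init__(self, init_rate: float = 0.5, temperature: float = 0.1):
        super().__init__()
        # Store the rate as a logit so sigmoid keeps it inside (0, 1).
        init_logit = torch.log(torch.tensor(init_rate / (1.0 - init_rate)))
        self.drop_logit = nn.Parameter(init_logit)
        self.temperature = temperature

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p_drop = torch.sigmoid(self.drop_logit)
        if not self.training:
            return x * (1.0 - p_drop)  # scale by the expected keep probability at test time
        # Relaxed sample of the keep-mask (concrete distribution).
        u = torch.rand_like(x).clamp(1e-6, 1.0 - 1e-6)
        logits = (torch.log(1.0 - p_drop) - torch.log(p_drop)
                  + torch.log(u) - torch.log(1.0 - u))
        keep = torch.sigmoid(logits / self.temperature)
        return x * keep
```

In practice a regularization term on the learned rate (as in a Bayesian treatment) is typically needed as well, otherwise the rate can simply drift to whatever minimizes the training loss.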
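The ControlVAE and DynamicVAE entries in the list above take a different route to the trade-off that L-VAE targets: instead of learning the weight by gradient descent, they adjust $\beta$ during training so that the observed KL divergence tracks a setpoint. A generic PI-controller sketch of that idea follows; the gains, clamping range, and update schedule are illustrative assumptions, not the published hyperparameters.

```python
# Hedged sketch of feedback-controlled beta (the general idea behind
# ControlVAE/DynamicVAE-style training): a PI controller nudges beta so the
# observed KL divergence tracks a user-chosen target value.
class BetaPIController:
    def __init__(self, kl_target: float, kp: float = 0.01, ki: float = 0.001,
                 beta_min: float = 0.0, beta_max: float = 1.0):
        self.kl_target = kl_target
        self.kp, self.ki = kp, ki
        self.beta_min, self.beta_max = beta_min, beta_max
        self.integral = 0.0  # accumulated error

    def update(self, kl_observed: float) -> float:
        # KL above target -> positive error -> increase beta to penalize KL more.
        error = kl_observed - self.kl_target
        self.integral += error
        beta = self.kp * error + self.ki * self.integral
        return min(max(beta, self.beta_min), self.beta_max)

# Illustrative use inside a training loop:
#   beta = controller.update(kl.item())
#   loss = recon_loss + beta * kl
```

This contrasts with L-VAE above, where the weight is itself a trainable parameter of the loss rather than the output of a controller.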
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.