Closed-Loop Unsupervised Representation Disentanglement with $\beta$-VAE
Distillation and Diffusion Probabilistic Feedback
- URL: http://arxiv.org/abs/2402.02346v1
- Date: Sun, 4 Feb 2024 05:03:22 GMT
- Title: Closed-Loop Unsupervised Representation Disentanglement with $\beta$-VAE
Distillation and Diffusion Probabilistic Feedback
- Authors: Xin Jin, Bohan Li, Baao Xie, Wenyao Zhang, Jinming Liu, Ziqiang Li,
Tao Yang, Wenjun Zeng
- Abstract summary: Representation disentanglement may help AI fundamentally understand the real world and thus benefit both discrimination and generation tasks.
We propose a Closed-Loop unsupervised representation Disentanglement approach dubbed CL-Dis.
Experiments demonstrate the superiority of CL-Dis on applications like real image manipulation and visual analysis.
- Score: 45.68054456449699
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Representation disentanglement may help AI fundamentally understand the real
world and thus benefit both discrimination and generation tasks. It currently
has at least three unresolved core issues: (i) heavy reliance on label
annotation and synthetic data, causing poor generalization to natural
scenarios; (ii) heuristic, hand-crafted disentangling constraints that make it
hard to adaptively reach an optimal training trade-off; and (iii) the lack of a
reasonable evaluation metric, especially for real label-free data. To address these
challenges, we propose a \textbf{C}losed-\textbf{L}oop unsupervised
representation \textbf{Dis}entanglement approach dubbed \textbf{CL-Dis}.
Specifically, we use a diffusion-based autoencoder (Diff-AE) as the backbone
while resorting to a $\beta$-VAE as a co-pilot to extract semantically
disentangled representations. The strong generation ability of the diffusion
model and the good disentanglement ability of the VAE are complementary. To
strengthen disentanglement, VAE-latent distillation and diffusion-wise feedback
are interconnected in a closed-loop system for further mutual promotion. Then, a
self-supervised \textbf{Navigation} strategy is introduced to identify
interpretable semantic directions in the disentangled latent space. Finally, a
new metric based on content tracking is designed to evaluate the
disentanglement effect. Experiments demonstrate the superiority of CL-Dis on
applications like real image manipulation and visual analysis.
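To make the pipeline concrete, here is a minimal PyTorch sketch of the distillation half of the closed loop, assuming placeholder dimensions and a frozen Diff-AE encoder whose semantic codes `z_sem` are given; the diffusion-wise image-space feedback is only indicated in a comment, since it requires the full DPM decoder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BetaVAE(nn.Module):
    """Tiny beta-VAE co-pilot that distills a frozen Diff-AE semantic
    code z_sem into a lower-dimensional latent w (sizes are placeholders)."""
    def __init__(self, sem_dim=512, w_dim=32):
        super().__init__()
        self.enc = nn.Linear(sem_dim, 2 * w_dim)   # -> (mu, logvar)
        self.dec = nn.Linear(w_dim, sem_dim)       # w -> reconstructed z_sem

    def forward(self, z_sem):
        mu, logvar = self.enc(z_sem).chunk(2, dim=-1)
        w = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(w), mu, logvar

def cl_dis_loss(z_sem, vae, beta=4.0):
    z_rec, mu, logvar = vae(z_sem)
    distill = F.mse_loss(z_rec, z_sem)  # VAE-latent distillation
    # beta-weighted KL toward a factorized prior drives disentanglement
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # Diffusion-wise feedback (not shown): in the full loop, z_rec would
    # condition the frozen DPM decoder and an image-space loss would
    # close the loop.
    return distill + beta * kl

z_sem = torch.randn(8, 512)   # stand-in for frozen Diff-AE encoder outputs
cl_dis_loss(z_sem, BetaVAE()).backward()
```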
Related papers
- $\alpha$-TCVAE: On the relationship between Disentanglement and Diversity [21.811889512977924]
In this work, we introduce $\alpha$-TCVAE, a variational autoencoder optimized using a novel total correlation (TC) lower bound.
We present quantitative analyses that support the idea that disentangled representations lead to better generative capabilities and diversity.
Our results demonstrate that $\alpha$-TCVAE consistently learns more disentangled representations than baselines and generates more diverse observations.
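The $\alpha$-TCVAE bound itself is derived in the paper; for orientation, a naive minibatch Monte-Carlo estimate of the total correlation term $TC(z) = KL(q(z)\,\|\,\prod_j q(z_j))$ that such objectives penalize can be sketched as follows.

```python
import math
import torch

def log_gauss(z, mu, logvar):
    # elementwise log N(z; mu, diag(exp(logvar)))
    return -0.5 * (math.log(2 * math.pi) + logvar + (z - mu) ** 2 / logvar.exp())

def total_correlation(z, mu, logvar):
    """Naive minibatch estimate of TC(z) = KL(q(z) || prod_j q(z_j)).
    z, mu, logvar: [B, D]; the batch is treated as samples from the
    aggregate posterior, so the estimate is biased for small B."""
    B = z.shape[0]
    # log q(z_i | x_j) for every sample/posterior pair: [B, B, D]
    log_q = log_gauss(z.unsqueeze(1), mu.unsqueeze(0), logvar.unsqueeze(0))
    log_qz = torch.logsumexp(log_q.sum(-1), dim=1) - math.log(B)      # log q(z_i)
    log_qz_j = (torch.logsumexp(log_q, dim=1) - math.log(B)).sum(-1)  # sum_j log q(z_ij)
    return (log_qz - log_qz_j).mean()
```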
arXiv Detail & Related papers (2024-11-01T13:50:06Z)
- Disentangling Disentangled Representations: Towards Improved Latent Units via Diffusion Models [3.1923251959845214]
Disentangled representation learning (DRL) aims to break down observed data into core intrinsic factors for a profound understanding of the data.
Recently, there have been limited explorations of utilizing diffusion models (DMs) for unsupervised DRL.
We propose Dynamic Gaussian Anchoring to enforce attribute-separated latent units for more interpretable DRL.
We also propose the Skip Dropout technique, which easily modifies the denoising U-Net to be more DRL-friendly.
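A plausible reading of Skip Dropout, sketched here under the assumption that whole skip tensors are randomly zeroed during training (the paper's exact placement and granularity may differ):

```python
import torch
import torch.nn as nn

class SkipDropout(nn.Module):
    """Randomly sever a U-Net skip connection during training so the
    decoder must rely on the latent code rather than copied encoder
    detail. Whole-tensor dropping is an assumption made for this sketch."""
    def __init__(self, p=0.5):
        super().__init__()
        self.p = p

    def forward(self, skip):
        if self.training and torch.rand(()).item() < self.p:
            return torch.zeros_like(skip)
        return skip
```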
arXiv Detail & Related papers (2024-10-31T11:05:09Z)
- Rectified Diffusion Guidance for Conditional Generation [62.00207951161297]
We revisit the theory behind CFG and rigorously confirm that the improper configuration of the combination coefficients (i.e., the widely used summing-to-one version) brings about an expectation shift in the generative distribution.
We propose ReCFG with a relaxation on the guidance coefficients such that denoising with ReCFG strictly aligns with the diffusion theory.
That way the rectified coefficients can be readily pre-computed via traversing the observed data, leaving the sampling speed barely affected.
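For contrast, a toy sketch of standard classifier-free guidance next to a relaxed-coefficient variant in the spirit of ReCFG; the placeholder values `a` and `b` stand in for the data-derived rectified coefficients:

```python
import torch

def cfg_eps(eps_cond, eps_uncond, w=7.5):
    # standard classifier-free guidance: coefficients (1 + w) and (-w) sum to 1
    return (1 + w) * eps_cond - w * eps_uncond

def recfg_eps(eps_cond, eps_uncond, a=8.5, b=-7.3):
    # relaxed variant in the spirit of ReCFG: a + b is NOT forced to equal 1;
    # the paper pre-computes the rectified coefficients from observed data,
    # so the values here are placeholders
    return a * eps_cond + b * eps_uncond
```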
arXiv Detail & Related papers (2024-10-24T13:41:32Z)
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
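A minimal sketch of rejection driven by an estimated density ratio; the thresholding rule and the name `predict_with_rejection` are illustrative stand-ins for the paper's derived rejector:

```python
import numpy as np

def predict_with_rejection(probs, ratio, tau=1.0):
    """probs: [N, C] class probabilities; ratio: [N] estimated density
    ratio between the idealized and observed data distributions.
    Returns argmax predictions, with -1 marking abstention."""
    preds = probs.argmax(axis=-1).astype(int)
    preds[ratio < tau] = -1
    return preds

# toy usage
probs = np.array([[0.9, 0.1], [0.55, 0.45]])
ratio = np.array([1.4, 0.3])
print(predict_with_rejection(probs, ratio))   # -> [ 0 -1]
```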
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- Disentangled Representation Learning with Transmitted Information Bottleneck [57.22757813140418]
We present DisTIB (Transmitted Information Bottleneck for Disentangled representation learning), a novel objective that navigates the balance between information compression and preservation.
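For orientation, the classical information-bottleneck trade-off that such objectives navigate can be written as

$$\min_{q(z \mid x)} \; I(X;Z) - \beta\, I(Z;Y),$$

where $I(X;Z)$ measures compression of the input and $I(Z;Y)$ preservation of task-relevant information; DisTIB's transmitted-information formulation refines how this balance is struck (see the paper for the exact objective).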
arXiv Detail & Related papers (2023-11-03T03:18:40Z)
- PDE+: Enhancing Generalization via PDE with Adaptive Distributional Diffusion [66.95761172711073]
The generalization of neural networks is a central challenge in machine learning.
We propose to enhance it directly through the underlying function of neural networks, rather than focusing on adjusting input data.
We put this theoretical framework into practice as PDE+ (PDE with Adaptive Distributional Diffusion).
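One loose way to picture "adaptive distributional diffusion" is input-dependent Gaussian perturbation of hidden states during training; the per-sample scale head below is an assumption for illustration, not the paper's exact construction:

```python
import torch
import torch.nn as nn

class AdaptiveDistributionalNoise(nn.Module):
    """Perturb hidden states with input-dependent Gaussian noise during
    training, acting like a diffusion term on the learned function.
    The learned scale head is an assumption made for this sketch."""
    def __init__(self, dim):
        super().__init__()
        self.scale = nn.Sequential(nn.Linear(dim, dim), nn.Softplus())

    def forward(self, h):
        if not self.training:
            return h
        return h + self.scale(h) * torch.randn_like(h)
```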
arXiv Detail & Related papers (2023-05-25T08:23:26Z)
- Uncertain Facial Expression Recognition via Multi-task Assisted Correction [43.02119884581332]
We propose MTAC, a novel multi-task assisted correction method for uncertain facial expression recognition.
Specifically, a confidence estimation block and a weighted regularization module are applied to highlight solid samples and suppress uncertain samples in every batch.
Experiments on RAF-DB, AffectNet, and AffWild2 datasets demonstrate that the MTAC obtains substantial improvements over baselines when facing synthetic and real uncertainties.
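A minimal sketch of the highlight/suppress idea: per-sample cross-entropy re-weighted by an estimated confidence (the confidence head itself, and the name `confidence_weighted_ce`, are assumptions; MTAC's full method adds multi-task branches and re-labeling):

```python
import torch
import torch.nn.functional as F

def confidence_weighted_ce(logits, targets, confidence):
    """Per-sample cross-entropy re-weighted by a confidence estimate in
    [0, 1], assumed to come from a separate confidence-estimation head.
    High-confidence ('solid') samples dominate the gradient; uncertain
    ones are suppressed."""
    ce = F.cross_entropy(logits, targets, reduction="none")  # [B]
    return (confidence * ce).mean()
```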
arXiv Detail & Related papers (2022-12-14T10:28:08Z)
- Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
The Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
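A hedged sketch of one way to encourage a more diverse, less uncertain posterior, assuming (as an illustration, not a claim about DU-VAE's exact recipe) Dropout on the posterior means and BatchNorm on the log-variances:

```python
import torch
import torch.nn as nn

class DUPosterior(nn.Module):
    """Illustrative regularized posterior: Dropout on the means (pushes
    diversity) and BatchNorm on the log-variances (keeps uncertainty in
    check) before reparameterization. This pairing is an assumption
    made for this sketch."""
    def __init__(self, z_dim, p=0.2):
        super().__init__()
        self.drop = nn.Dropout(p)
        self.bn = nn.BatchNorm1d(z_dim)

    def forward(self, mu, logvar):
        mu, logvar = self.drop(mu), self.bn(logvar)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()
```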
arXiv Detail & Related papers (2021-10-24T07:58:13Z)