Closed-Loop Unsupervised Representation Disentanglement with $\beta$-VAE
Distillation and Diffusion Probabilistic Feedback
- URL: http://arxiv.org/abs/2402.02346v1
- Date: Sun, 4 Feb 2024 05:03:22 GMT
- Title: Closed-Loop Unsupervised Representation Disentanglement with $\beta$-VAE
Distillation and Diffusion Probabilistic Feedback
- Authors: Xin Jin, Bohan Li, Baao Xie, Wenyao Zhang, Jinming Liu, Ziqiang Li,
Tao Yang, Wenjun Zeng
- Abstract summary: Representation disentanglement may help AI fundamentally understand the real world and thus benefit both discrimination and generation tasks.
We propose a Closed-Loop unsupervised representation Disentanglement approach dubbed CL-Dis.
Experiments demonstrate the superiority of CL-Dis on applications like real image manipulation and visual analysis.
- Score: 45.68054456449699
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Representation disentanglement may help AI fundamentally understand the real
world and thus benefit both discrimination and generation tasks. It currently
has at least three unresolved core issues: (i) heavy reliance on label
annotation and synthetic data -- causing poor generalization on natural
scenarios; (ii) heuristic/hand-craft disentangling constraints make it hard to
adaptively achieve an optimal training trade-off; (iii) lacking reasonable
evaluation metric, especially for the real label-free data. To address these
challenges, we propose a \textbf{C}losed-\textbf{L}oop unsupervised
representation \textbf{Dis}entanglement approach dubbed \textbf{CL-Dis}.
Specifically, we use diffusion-based autoencoder (Diff-AE) as a backbone while
resorting to $\beta$-VAE as a co-pilot to extract semantically disentangled
representations. The strong generation ability of diffusion model and the good
disentanglement ability of VAE model are complementary. To strengthen
disentangling, VAE-latent distillation and diffusion-wise feedback are
interconnected in a closed-loop system for a further mutual promotion. Then, a
self-supervised \textbf{Navigation} strategy is introduced to identify
interpretable semantic directions in the disentangled latent space. Finally, a
new metric based on content tracking is designed to evaluate the
disentanglement effect. Experiments demonstrate the superiority of CL-Dis on
applications like real image manipulation and visual analysis.
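The abstract describes a β-VAE co-pilot that extracts disentangled representations alongside the diffusion backbone. As a minimal sketch of the standard β-VAE objective such a co-pilot would optimize (the function name and array shapes are illustrative assumptions, not the paper's code):

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """beta-VAE objective: reconstruction error plus a beta-weighted
    KL divergence from the approximate posterior N(mu, diag(exp(log_var)))
    to the isotropic prior N(0, I).

    A larger beta presses the posterior toward the factorized prior,
    which is what encourages disentangled latent factors.
    """
    recon = np.mean((x - x_recon) ** 2)  # mean squared reconstruction error
    # Closed-form KL to N(0, I), summed over latent dims, averaged over batch
    kl = -0.5 * np.mean(np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=1))
    return recon + beta * kl
```

With `mu = 0` and `log_var = 0` the KL term vanishes, so the loss reduces to the reconstruction error alone.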
Related papers
- Amortizing intractable inference in diffusion models for vision, language, and control [89.65631572949702]
This paper studies amortized sampling of the posterior over data, $\mathbf{x}\sim p^{\rm post}(\mathbf{x})\propto p(\mathbf{x})r(\mathbf{x})$, in a model that consists of a diffusion generative model prior $p(\mathbf{x})$ and a black-box constraint or function $r(\mathbf{x})$.
We prove the correctness of a data-free learning objective, relative trajectory balance, for training a diffusion model that samples from this posterior.
arXiv Detail & Related papers (2024-05-31T16:18:46Z) - Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z) - Disentangled Representation Learning with Transmitted Information
Bottleneck [73.0553263960709]
We present DisTIB (Transmitted Information Bottleneck for Disentangled representation learning), a novel objective that navigates the balance between information compression and preservation.
arXiv Detail & Related papers (2023-11-03T03:18:40Z) - How to train your VAE [0.0]
Variational Autoencoders (VAEs) have become a cornerstone in generative modeling and representation learning within machine learning.
This paper explores the interpretation of the Kullback-Leibler (KL) divergence, a critical component within the Evidence Lower Bound (ELBO).
The proposed method redefines the ELBO with a mixture of Gaussians for the posterior probability, introduces a regularization term, and employs a PatchGAN discriminator to enhance texture realism.
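The KL term inside the ELBO measures how far the approximate posterior strays from the prior. For diagonal Gaussians this divergence has a closed form, sketched below as a generic illustration (not the paper's mixture-of-Gaussians formulation, for which no closed form exists):

```python
import math

def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, diag(var_q)) || N(mu_p, diag(var_p)) ),
    computed as a sum of per-dimension closed-form terms."""
    kl = 0.0
    for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p):
        kl += 0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
    return kl
```

The divergence is zero exactly when the two Gaussians coincide and grows as their means or variances separate, which is what lets the ELBO trade reconstruction quality against posterior regularity.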
arXiv Detail & Related papers (2023-09-22T19:52:28Z) - PDE+: Enhancing Generalization via PDE with Adaptive Distributional
Diffusion [66.95761172711073]
The generalization of neural networks is a central challenge in machine learning.
We propose to enhance it directly through the underlying function of neural networks, rather than focusing on adjusting input data.
We put this theoretical framework into practice as PDE+ (PDE with Adaptive Distributional Diffusion).
arXiv Detail & Related papers (2023-05-25T08:23:26Z) - Uncertain Facial Expression Recognition via Multi-task Assisted
Correction [43.02119884581332]
We propose a novel method of multi-task assisted correction in addressing uncertain facial expression recognition called MTAC.
Specifically, a confidence estimation block and a weighted regularization module are applied to highlight solid samples and suppress uncertain samples in every batch.
Experiments on RAF-DB, AffectNet, and AffWild2 datasets demonstrate that the MTAC obtains substantial improvements over baselines when facing synthetic and real uncertainties.
arXiv Detail & Related papers (2022-12-14T10:28:08Z) - Encouraging Disentangled and Convex Representation with Controllable
Interpolation Regularization [15.725515910594725]
We focus on controllable disentangled representation learning (C-Dis-RL).
We propose a simple yet efficient method: Controllable Interpolation Regularization (CIR).
arXiv Detail & Related papers (2021-12-06T16:52:07Z) - Regularizing Variational Autoencoder with Diversity and Uncertainty
Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.