Unsupervised Decomposition and Recombination with Discriminator-Driven Diffusion Models
- URL: http://arxiv.org/abs/2601.22057v1
- Date: Thu, 29 Jan 2026 17:57:06 GMT
- Title: Unsupervised Decomposition and Recombination with Discriminator-Driven Diffusion Models
- Authors: Archer Wang, Emile Anand, Yilun Du, Marin Soljačić
- Abstract summary: Decomposing complex data into factorized representations can reveal reusable components and enable new samples via component recombination. We introduce an adversarial training signal via a discriminator trained to distinguish between single-source samples and those generated by recombining factors across sources. Our method outperforms implementations of prior baselines on CelebA-HQ, Virtual KITTI, CLEVR, and Falcor3D, achieving lower FID scores and better disentanglement as measured by MIG and MCC.
- Score: 41.14254731598591
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decomposing complex data into factorized representations can reveal reusable components and enable synthesizing new samples via component recombination. We investigate this in the context of diffusion-based models that learn factorized latent spaces without factor-level supervision. In images, factors can capture background, illumination, and object attributes; in robotic videos, they can capture reusable motion components. To improve both latent factor discovery and quality of compositional generation, we introduce an adversarial training signal via a discriminator trained to distinguish between single-source samples and those generated by recombining factors across sources. By optimizing the generator to fool this discriminator, we encourage physical and semantic consistency in the resulting recombinations. Our method outperforms implementations of prior baselines on CelebA-HQ, Virtual KITTI, CLEVR, and Falcor3D, achieving lower FID scores and better disentanglement as measured by MIG and MCC. Furthermore, we demonstrate a novel application to robotic video trajectories: by recombining learned action components, we generate diverse sequences that significantly increase state-space coverage for exploration on the LIBERO benchmark.
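The adversarial recombination signal described in the abstract can be sketched as a standard binary cross-entropy GAN objective over single-source versus recombined samples. This is an illustrative sketch only: the helper names (`recombine`, `discriminator_loss`, `generator_loss`) and the factor-swap mechanics are assumptions, not the paper's implementation.

```python
import math


def recombine(factors_a, factors_b, swap_idx):
    """Build a cross-source latent by swapping one factor from source B into A."""
    out = list(factors_a)
    out[swap_idx] = factors_b[swap_idx]
    return out


def bce(p, label):
    """Binary cross-entropy for a single predicted probability."""
    eps = 1e-7
    p = min(max(p, eps), 1 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


def discriminator_loss(d, single_source, recombined):
    """D is trained to output 1 on single-source samples, 0 on recombinations."""
    return bce(d(single_source), 1.0) + bce(d(recombined), 0.0)


def generator_loss(d, recombined):
    """The generator is optimized to make recombinations look single-source."""
    return bce(d(recombined), 1.0)
```

Minimizing `generator_loss` pushes recombined samples toward the single-source manifold, which is how the adversarial signal encourages physically and semantically consistent recombinations.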
Related papers
- Mixture of Ranks with Degradation-Aware Routing for One-Step Real-World Image Super-Resolution [76.66229730098759]
In real-world image super-resolution (Real-ISR), existing approaches mainly rely on fine-tuning pre-trained diffusion models. We propose a Mixture-of-Ranks (MoR) architecture for single-step image super-resolution. We introduce a fine-grained expert partitioning strategy that treats each rank in LoRA as an independent expert.
arXiv Detail & Related papers (2025-11-20T04:11:44Z)
- Enhancing Diffusion Face Generation with Contrastive Embeddings and SegFormer Guidance [0.0]
We present a benchmark of diffusion models for human face generation on the small-scale CelebAMask-HQ dataset. Our study compares UNet and DiT architectures for unconditional generation and explores LoRA-based fine-tuning of pretrained Stable Diffusion models.
arXiv Detail & Related papers (2025-08-13T14:27:47Z)
- Diffusion-based Layer-wise Semantic Reconstruction for Unsupervised Out-of-Distribution Detection [30.02748131967826]
Unsupervised out-of-distribution (OOD) detection aims to identify out-of-domain data by learning only from unlabeled In-Distribution (ID) training samples.
Current reconstruction-based methods provide a good alternative approach by measuring the reconstruction error between the input and its corresponding generative counterpart in the pixel/feature space.
We propose the diffusion-based layer-wise semantic reconstruction approach for unsupervised OOD detection.
arXiv Detail & Related papers (2024-11-16T04:54:07Z)
- Specularity Factorization for Low-Light Enhancement [2.7961648901433134]
We present a new additive image factorization technique that treats images as composed of multiple latent components.
Our model-driven RSFNet estimates these factors by unrolling the optimization into network layers.
The resultant factors are interpretable by design and can be fused for different image enhancement tasks via a network or combined directly by the user.
arXiv Detail & Related papers (2024-04-02T14:41:42Z)
- GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes.
It is promising to take advantage of generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes.
We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
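The affine coupling layers named in this snippet can be illustrated in a minimal, unconditional form (GSMFlow's class-conditioning is omitted here for brevity); `scale_net` and `shift_net` stand in for arbitrary learned conditioner networks, and this sketch is an assumption about the general technique, not the paper's code.

```python
import numpy as np


def coupling_forward(x, scale_net, shift_net):
    """Affine coupling: transform the second half of x conditioned on the first.
    The Jacobian is triangular, so the log-determinant is simply sum(log_s)."""
    x1, x2 = np.split(x, 2)
    log_s, t = scale_net(x1), shift_net(x1)
    y2 = x2 * np.exp(log_s) + t
    return np.concatenate([x1, y2]), log_s.sum()


def coupling_inverse(y, scale_net, shift_net):
    """Exact inverse: recover x2 from y2 using the same conditioner networks,
    which is what makes the flow invertible regardless of how complex the
    conditioners are."""
    y1, y2 = np.split(y, 2)
    log_s, t = scale_net(y1), shift_net(y1)
    x2 = (y2 - t) * np.exp(-log_s)
    return np.concatenate([y1, x2])
```

Because the untouched half parameterizes the transform of the other half, invertibility and a cheap log-determinant hold for any choice of conditioner networks.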
arXiv Detail & Related papers (2022-07-05T04:04:37Z)
- GLOWin: A Flow-based Invertible Generative Framework for Learning Disentangled Feature Representations in Medical Images [40.58581577183134]
Flow-based generative models have been proposed to generate realistic images by directly modeling the data distribution with invertible functions.
We propose a new flow-based generative model framework, named GLOWin, that is end-to-end invertible and able to learn disentangled representations.
arXiv Detail & Related papers (2021-03-19T15:47:01Z)
- Unsupervised Controllable Generation with Self-Training [90.04287577605723]
Controllable generation with GANs remains a challenging research problem.
We propose an unsupervised framework to learn a distribution of latent codes that control the generator through self-training.
Our framework exhibits better disentanglement compared to other variants such as the variational autoencoder.
arXiv Detail & Related papers (2020-07-17T21:50:35Z)
- When Relation Networks meet GANs: Relation GANs with Triplet Loss [110.7572918636599]
Training stability is still a lingering concern of generative adversarial networks (GANs).
In this paper, we explore a relation network architecture for the discriminator and design a triplet loss that improves generalization and stability.
Experiments on benchmark datasets show that the proposed relation discriminator and new loss provide significant improvements on various vision tasks.
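The triplet loss mentioned in this snippet follows a standard hinge form; this is a generic squared-Euclidean sketch, not the paper's exact formulation, and the function name and margin default are illustrative assumptions.

```python
def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: pull the positive within `margin` of the
    anchor relative to the negative; zero once the gap is satisfied."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)
```

Applied to a relation discriminator, the anchor/positive pair would come from the same distribution (e.g. two real samples) and the negative from the other (a generated sample), so the loss directly shapes the relative distances the discriminator learns.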
arXiv Detail & Related papers (2020-02-24T11:35:28Z)
- Learning Hybrid Representation by Robust Dictionary Learning in Factorized Compressed Space [84.37923242430999]
We investigate robust dictionary learning (DL) to discover the hybrid salient low-rank and sparse representation in a factorized compressed space.
A Joint Robust Factorization and Projective Dictionary Learning (J-RFDL) model is presented.
arXiv Detail & Related papers (2019-12-26T06:52:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.