DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery
- URL: http://arxiv.org/abs/2303.09813v1
- Date: Fri, 17 Mar 2023 07:47:55 GMT
- Title: DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery
- Authors: Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Jinxiang Liu, Yu Wang, Ya Zhang, Yanfeng Wang
- Abstract summary: DiffusionSeg is a two-stage synthesis-exploitation framework.
In the first (synthesis) stage, we synthesize abundant images and propose a novel training-free AttentionCut to obtain their masks.
In the second (exploitation) stage, to bridge the structural gap, we use an inversion technique to map a given image back to diffusion features.
- Score: 20.787180028571694
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained on large corpora of data, modern models have achieved
impressive progress. As a popular form of generative pre-training, diffusion
models capture both low-level visual knowledge and high-level semantic
relations. In this paper, we propose to exploit such knowledgeable diffusion
models for mainstream discriminative tasks, i.e., unsupervised object
discovery: saliency segmentation and object localization. Two challenges stand
in the way: a structural difference between generative and discriminative
models limits their direct use, and the lack of explicitly labeled data
significantly limits performance in the unsupervised setting. To tackle these
issues, we introduce DiffusionSeg, a novel two-stage synthesis-exploitation
framework. In the first (synthesis) stage, to alleviate data insufficiency, we
synthesize abundant images and propose a novel training-free AttentionCut to
obtain their masks. In the second (exploitation) stage, to bridge the
structural gap, we use an inversion technique to map a given image back to
diffusion features, which can then be used directly by downstream
architectures. Extensive experiments and ablation studies demonstrate the
superiority of adapting diffusion for unsupervised object discovery.
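To make the exploitation stage concrete, below is a minimal sketch of pulling multi-scale diffusion features for a real image with Stable Diffusion via the diffusers library. For brevity it approximates inversion by forward-noising the VAE latent to a chosen timestep; DiffusionSeg's inversion technique instead maps the image back along the diffusion trajectory deterministically. The model choice, tapped blocks, and timestep are assumptions, not the authors' code.

```python
# Hedged sketch: cache multi-scale U-Net features for a real image by
# mapping it into the diffusion process (forward noising stands in for
# the paper's inversion; model/block choices are assumptions).
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"
).to(device)

features = {}

def save_to(name):
    def hook(module, inputs, output):
        features[name] = output  # cache this block's activation
    return hook

# Tap the U-Net decoder blocks, whose activations are commonly
# reused for dense prediction.
for i, block in enumerate(pipe.unet.up_blocks):
    block.register_forward_hook(save_to(f"up_{i}"))

@torch.no_grad()
def extract_features(image, t=100):
    """image: (1, 3, 512, 512) tensor scaled to [-1, 1]."""
    latents = pipe.vae.encode(image.to(device)).latent_dist.mean
    latents = latents * pipe.vae.config.scaling_factor
    timestep = torch.tensor([t], device=device)
    noisy = pipe.scheduler.add_noise(
        latents, torch.randn_like(latents), timestep
    )
    # Empty-text embedding as the cross-attention context.
    ids = pipe.tokenizer(
        "", padding="max_length",
        max_length=pipe.tokenizer.model_max_length, return_tensors="pt"
    ).input_ids.to(device)
    text_emb = pipe.text_encoder(ids)[0]
    pipe.unet(noisy, timestep, encoder_hidden_states=text_emb)
    return features  # multi-scale features for a downstream head
```

The cached `features` dict can then feed a lightweight segmentation or localization head, which is the sense in which the features can be used directly by downstream architectures.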
Related papers
- MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion [14.907473847787541]
We propose Masked Conditional Diffusion (MacDiff) as a unified framework for human skeleton modeling.
For the first time, we leverage diffusion models as effective skeleton representation learners.
MacDiff achieves state-of-the-art performance on representation learning benchmarks while maintaining the competence for generative tasks.
arXiv Detail & Related papers (2024-09-16T17:06:10Z) - Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement [58.9768112704998]
Disentangled representation learning strives to extract the intrinsic factors within observed data.
We introduce a new perspective and framework, demonstrating that diffusion models with cross-attention can serve as a powerful inductive bias.
This is the first work to reveal the potent disentanglement capability of diffusion models with cross-attention, requiring no complex designs.
arXiv Detail & Related papers (2024-02-15T05:07:54Z) - Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors [56.82596340418697]
We propose a simple yet effective framework comprising a pre-trained Stable Diffusion (SD) model containing rich generative priors, a unified head (U-head) capable of integrating hierarchical representations, and an adapted expert providing discriminative priors.
Comprehensive investigations unveil potential characteristics of the proposed framework, dubbed Vermouth, such as the varying granularity of perception concealed in latent variables at distinct time steps and at various U-Net stages.
The promising results demonstrate the potential of diffusion models as formidable learners, establishing their significance in furnishing informative and robust visual representations.
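As a rough illustration of the U-head idea, here is a hypothetical top-down module that projects hierarchical diffusion features to a shared width and fuses them coarse-to-fine; the channel sizes (loosely matching Stable Diffusion v1 decoder blocks) and the fusion rule are assumptions, not the paper's design.

```python
# Hedged sketch of a unified head ("U-head") fusing hierarchical
# diffusion features into one dense prediction. Channel widths and
# the top-down additive fusion are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class UHead(nn.Module):
    def __init__(self, in_channels=(1280, 1280, 640, 320),
                 dim=256, num_classes=19):
        super().__init__()
        # Project each feature level to a common width.
        self.proj = nn.ModuleList(
            [nn.Conv2d(c, dim, kernel_size=1) for c in in_channels]
        )
        self.out = nn.Conv2d(dim, num_classes, kernel_size=1)

    def forward(self, feats):
        # feats: coarse-to-fine activations from a frozen diffusion U-Net.
        x = self.proj[0](feats[0])
        for p, f in zip(self.proj[1:], feats[1:]):
            # Upsample the running map and add the next, finer level.
            x = F.interpolate(x, size=f.shape[-2:], mode="bilinear",
                              align_corners=False) + p(f)
        return self.out(x)  # dense logits at the finest resolution
```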
arXiv Detail & Related papers (2024-01-29T10:36:57Z) - Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation on NYU Depth V2 and KITTI, and in semantic segmentation on Cityscapes.
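A minimal sketch of the meta-prompt idea, assuming the learnable prompts replace text embeddings as the cross-attention context of a frozen diffusion U-Net (the prompt count, width, and initialization are assumptions):

```python
# Hedged sketch: learnable "meta prompts" as the conditioning context
# of a frozen diffusion U-Net (dim=768 loosely matches Stable Diffusion
# v1's text-embedding width; all sizes are assumptions).
import torch
import torch.nn as nn

class MetaPromptUNet(nn.Module):
    def __init__(self, unet, num_prompts=64, dim=768):
        super().__init__()
        self.unet = unet.requires_grad_(False)  # keep the diffusion model frozen
        self.meta_prompts = nn.Parameter(
            torch.randn(1, num_prompts, dim) * 0.02
        )

    def forward(self, noisy_latents, timesteps):
        ctx = self.meta_prompts.expand(noisy_latents.shape[0], -1, -1)
        # Cross-attention now attends to the learned prompts instead of
        # text, steering the features toward the perception task.
        return self.unet(noisy_latents, timesteps,
                         encoder_hidden_states=ctx).sample
```

Only `meta_prompts` (plus any task head) would receive gradients, so the pre-trained generative knowledge stays intact.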
arXiv Detail & Related papers (2023-12-22T14:40:55Z) - Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z) - DiffAug: Enhance Unsupervised Contrastive Learning with Domain-Knowledge-Free Diffusion-based Data Augmentation [48.25619775814776]
This paper proposes DiffAug, a novel unsupervised contrastive learning technique with diffusion-model-based positive data generation.
DiffAug consists of a semantic encoder and a conditional diffusion model; the latter generates new positive samples conditioned on the semantic encoding.
Experimental evaluations show that DiffAug outperforms hand-designed and SOTA model-based augmentation methods on DNA sequence, visual, and bio-feature datasets.
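In sketch form, the positive-generation loop might look as follows, using a standard DDPM reverse update; `encoder` and `eps_model` are placeholders for the semantic encoder and conditional diffusion model, not the authors' API.

```python
# Hedged sketch of DiffAug-style positive generation: sample a new view
# conditioned on the anchor's semantic encoding (standard DDPM update;
# `encoder` and `eps_model` are placeholders, not the authors' API).
import torch

def ddpm_step(x, eps, t, betas, alpha_bars):
    """One reverse step x_t -> x_{t-1} given predicted noise eps."""
    alpha_t = 1.0 - betas[t]
    mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) \
        / torch.sqrt(alpha_t)
    if t > 0:
        mean = mean + torch.sqrt(betas[t]) * torch.randn_like(x)
    return mean

@torch.no_grad()
def generate_positive(anchor, encoder, eps_model, betas):
    alpha_bars = torch.cumprod(1.0 - betas, dim=0)
    z = encoder(anchor)                # semantic encoding of the anchor
    x = torch.randn_like(anchor)       # start the new positive from noise
    for t in reversed(range(len(betas))):
        eps = eps_model(x, t, cond=z)  # noise prediction conditioned on z
        x = ddpm_step(x, eps, t, betas, alpha_bars)
    return x  # positive sample sharing the anchor's semantics
```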
arXiv Detail & Related papers (2023-09-10T13:28:46Z) - DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability [75.9781362556431]
We propose DiffDis to unify the cross-modal generative and discriminative pretraining into one single framework under the diffusion process.
We show that DiffDis outperforms single-task models on both the image generation and the image-text discriminative tasks.
arXiv Detail & Related papers (2023-08-18T05:03:48Z) - Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks [64.67735676127208]
Text-to-image diffusion models have shown great potential for benefiting image recognition.
Despite this promise, unsupervised learning on diffusion-generated images remains underexplored.
We introduce customized solutions that fully exploit the attention masks obtained for free during diffusion generation.
arXiv Detail & Related papers (2023-08-13T10:07:46Z) - Object-Centric Slot Diffusion [30.722428924152382]
We introduce Latent Slot Diffusion (LSD), a novel model that serves dual purposes.
We demonstrate that LSD significantly outperforms state-of-the-art transformer-based decoders.
We also conduct a preliminary investigation into the integration of pre-trained diffusion models in LSD.
arXiv Detail & Related papers (2023-03-20T02:40:16Z) - Empowering Diffusion Models on the Embedding Space for Text Generation [38.664533078347304]
We study the optimization challenges encountered with both the embedding space and the denoising model.
Unlike a fixed data distribution, the embedding space is itself learnable, which may lead to embedding-space collapse and unstable training.
Based on this analysis, we propose Difformer, a Transformer-based embedding diffusion model.
arXiv Detail & Related papers (2022-12-19T12:44:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.