Do text-free diffusion models learn discriminative visual
representations?
- URL: http://arxiv.org/abs/2311.17921v2
- Date: Thu, 30 Nov 2023 03:02:58 GMT
- Title: Do text-free diffusion models learn discriminative visual
representations?
- Authors: Soumik Mukhopadhyay and Matthew Gwilliam and Yosuke Yamaguchi and
Vatsal Agarwal and Namitha Padmanabhan and Archana Swaminathan and Tianyi
Zhou and Abhinav Shrivastava
- Abstract summary: We explore the possibility of a unified representation learner: a model which addresses both families of tasks simultaneously.
We identify diffusion models, a state-of-the-art method for generative tasks, as a prime candidate.
We find that diffusion models are better than GANs, and, with our fusion and feedback mechanisms, can compete with state-of-the-art unsupervised image representation learning methods for discriminative tasks.
- Score: 43.05419164830729
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While many unsupervised learning models focus on one family of tasks, either
generative or discriminative, we explore the possibility of a unified
representation learner: a model which addresses both families of tasks
simultaneously. We identify diffusion models, a state-of-the-art method for
generative tasks, as a prime candidate. Such models involve training a U-Net to
iteratively predict and remove noise, and the resulting model can synthesize
high-fidelity, diverse, novel images. We find that the intermediate feature
maps of the U-Net are diverse, discriminative feature representations. We
propose a novel attention mechanism for pooling feature maps and further
leverage this mechanism as DifFormer, a transformer feature fusion of features
from different diffusion U-Net blocks and noise steps. We also develop DifFeed,
a novel feedback mechanism tailored to diffusion. We find that diffusion models
are better than GANs, and, with our fusion and feedback mechanisms, can compete
with state-of-the-art unsupervised image representation learning methods for
discriminative tasks - image classification with full and semi-supervision,
transfer for fine-grained classification, object detection and segmentation,
and semantic segmentation. Our project website
(https://mgwillia.github.io/diffssl/) and code
(https://github.com/soumik-kanad/diffssl) are available publicly.
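To make the core idea concrete, here is a minimal sketch, not the authors' released DifFormer/DifFeed code: noise an image to a chosen timestep, run a frozen pretrained noise-prediction U-Net once, capture an intermediate feature map with a forward hook, and train only a small probe on top of it. `unet`, `block`, and `alphas_cumprod` are assumed placeholders for the pretrained U-Net, one of its intermediate modules, and the DDPM cumulative noise schedule.

```python
import torch
import torch.nn as nn

def extract_diffusion_features(unet, x0, t, alphas_cumprod, block):
    """Noise x0 to timestep t, run the U-Net once, and return the hooked block's output."""
    feats = {}
    handle = block.register_forward_hook(lambda m, i, o: feats.update(out=o))
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)              # cumulative schedule, \bar{alpha}_t
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise   # standard DDPM forward process
    with torch.no_grad():
        unet(x_t, t)                                         # the noise prediction itself is discarded
    handle.remove()
    return feats["out"]                                      # (B, C, H, W) intermediate feature map

class AveragePoolProbe(nn.Module):
    """Global-average pooling plus a linear head; the paper's attention pooling is richer."""
    def __init__(self, channels, num_classes):
        super().__init__()
        self.head = nn.Linear(channels, num_classes)

    def forward(self, feat_map):
        return self.head(feat_map.mean(dim=(2, 3)))
```

Concatenating features captured from several U-Net blocks and noise steps before the probe approximates, in spirit, what the paper's DifFormer fusion does with attention rather than simple averaging.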
Related papers
- Diffusion Imitation from Observation [4.205946699819021]
Adversarial imitation learning approaches learn a generator agent policy that produces state transitions a discriminator cannot distinguish from the expert's.
Motivated by the recent success of diffusion models in generative modeling, we propose to integrate a diffusion model into the adversarial imitation learning from observation framework.
arXiv Detail & Related papers (2024-10-07T18:49:55Z)
- FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [56.71672127740099]
We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets.
We leverage several relatively small, open-source foundation models for zero-shot open-vocabulary segmentation.
Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both Pascal VOC and COCO datasets.
arXiv Detail & Related papers (2024-03-29T10:38:25Z)
- Training Class-Imbalanced Diffusion Model Via Overlap Optimization [55.96820607533968]
Diffusion models trained on real-world datasets often yield inferior fidelity for tail classes.
Deep generative models, including diffusion models, are biased towards classes with abundant training images.
We propose a method based on contrastive learning to minimize the overlap between distributions of synthetic images for different classes.
arXiv Detail & Related papers (2024-02-16T16:47:21Z)
- DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability [75.9781362556431]
We propose DiffDis to unify the cross-modal generative and discriminative pretraining into one single framework under the diffusion process.
We show that DiffDis outperforms single-task models on both the image generation and the image-text discriminative tasks.
arXiv Detail & Related papers (2023-08-18T05:03:48Z)
- Diffusion Models Beat GANs on Image Classification [37.70821298392606]
Diffusion models have risen to prominence as a state-of-the-art method for image generation, denoising, inpainting, super-resolution, manipulation, etc.
We present findings that the intermediate U-Net embeddings of these models are useful beyond the noise prediction task, as they contain discriminative information and can also be leveraged for classification.
We find that with careful feature selection and pooling, diffusion models outperform comparable generative-discriminative methods for classification tasks.
arXiv Detail & Related papers (2023-07-17T17:59:40Z)
- Your Diffusion Model is Secretly a Zero-Shot Classifier [90.40799216880342]
We show that density estimates from large-scale text-to-image diffusion models can be leveraged to perform zero-shot classification.
Our generative approach to classification attains strong results on a variety of benchmarks.
Our results are a step toward using generative models, rather than discriminative ones, for downstream tasks (a minimal sketch of this denoising-error classification idea appears after this list).
arXiv Detail & Related papers (2023-03-28T17:59:56Z)
- Diffusion Visual Counterfactual Explanations [51.077318228247925]
Visual Counterfactual Explanations (VCEs) are an important tool to understand the decisions of an image classifier.
Current approaches for the generation of VCEs are restricted to adversarially robust models and often contain non-realistic artefacts.
In this paper, we overcome this by generating Diffusion Visual Counterfactual Explanations (DVCEs) for arbitrary ImageNet classifiers.
arXiv Detail & Related papers (2022-10-21T09:35:47Z)
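As referenced in the zero-shot classifier entry above, the underlying idea can be sketched in a few lines: score each candidate class by how well the class-conditioned diffusion model denoises the image, and predict the class with the lowest error. The sketch below is an assumption-laden illustration, not code from that paper; `cond_unet` stands for any class- or text-conditional noise-prediction model and `class_conds` for per-class conditioning tensors.

```python
import torch

@torch.no_grad()
def zero_shot_classify(cond_unet, x0, class_conds, alphas_cumprod, num_samples=32):
    """Return predicted class indices by comparing per-class denoising errors."""
    errors = []
    for cond in class_conds:                        # one candidate conditioning per class
        errs = []
        for _ in range(num_samples):                # Monte Carlo estimate over (timestep, noise)
            t = torch.randint(0, len(alphas_cumprod), (x0.shape[0],), device=x0.device)
            noise = torch.randn_like(x0)
            a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
            x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
            eps_pred = cond_unet(x_t, t, cond)      # predicted noise under this class conditioning
            errs.append(((eps_pred - noise) ** 2).mean(dim=(1, 2, 3)))
        errors.append(torch.stack(errs).mean(dim=0))
    return torch.stack(errors, dim=1).argmin(dim=1)  # (B,) predicted class indices
```

Averaging the error over many (timestep, noise) draws reduces the variance of the estimate, at the cost of one U-Net evaluation per draw and per candidate class.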