Your Diffusion Model is Secretly a Zero-Shot Classifier
- URL: http://arxiv.org/abs/2303.16203v3
- Date: Wed, 13 Sep 2023 01:16:45 GMT
- Title: Your Diffusion Model is Secretly a Zero-Shot Classifier
- Authors: Alexander C. Li, Mihir Prabhudesai, Shivam Duggal, Ellis Brown, Deepak Pathak
- Abstract summary: We show that density estimates from large-scale text-to-image diffusion models can be leveraged to perform zero-shot classification.
Our generative approach to classification attains strong results on a variety of benchmarks.
Our results are a step toward using generative over discriminative models for downstream tasks.
- Score: 90.40799216880342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent wave of large-scale text-to-image diffusion models has dramatically increased our text-based image generation abilities. These models can generate realistic images for a staggering variety of prompts and exhibit impressive compositional generalization abilities. Almost all use cases thus far have solely focused on sampling; however, diffusion models can also provide conditional density estimates, which are useful for tasks beyond image generation. In this paper, we show that the density estimates from large-scale text-to-image diffusion models like Stable Diffusion can be leveraged to perform zero-shot classification without any additional training. Our generative approach to classification, which we call Diffusion Classifier, attains strong results on a variety of benchmarks and outperforms alternative methods of extracting knowledge from diffusion models. Although a gap remains between generative and discriminative approaches on zero-shot recognition tasks, our diffusion-based approach has significantly stronger multimodal compositional reasoning ability than competing discriminative approaches. Finally, we use Diffusion Classifier to extract standard classifiers from class-conditional diffusion models trained on ImageNet. Our models achieve strong classification performance using only weak augmentations and exhibit qualitatively better "effective robustness" to distribution shift. Overall, our results are a step toward using generative over discriminative models for downstream tasks. Results and visualizations at https://diffusion-classifier.github.io/
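The scoring rule behind Diffusion Classifier is simple enough to sketch. Below is a minimal, illustrative version (not the authors' released code): it adds noise to the input, asks the model to predict that noise under each candidate class prompt, and returns the class with the lowest average prediction error, which amounts to a Monte Carlo comparison of class-conditional ELBOs. Here `unet`, `encode_text`, and `alphas_cumprod` are hypothetical stand-ins for the components of a pretrained text-conditional diffusion model such as Stable Diffusion.

    # Minimal Diffusion Classifier sketch (assumes a pretrained
    # text-conditional diffusion model; `unet`, `encode_text`, and
    # `alphas_cumprod` are hypothetical stand-ins, not a real API).
    import torch

    @torch.no_grad()
    def diffusion_classify(x0, class_prompts, unet, encode_text,
                           alphas_cumprod, n_trials=32):
        """Pick the prompt whose conditional denoising error on x0 is
        smallest: a Monte Carlo comparison of class-conditional ELBOs."""
        errors = torch.zeros(len(class_prompts))
        for _ in range(n_trials):
            # Reuse the same timestep and noise across classes so the
            # comparison between prompts is low-variance.
            t = torch.randint(0, len(alphas_cumprod), (1,))
            noise = torch.randn_like(x0)
            a = alphas_cumprod[t]
            x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise  # forward diffusion
            for c, prompt in enumerate(class_prompts):
                eps = unet(x_t, t, encode_text(prompt))   # predicted noise
                errors[c] += torch.mean((eps - noise) ** 2)
        return int(errors.argmin())  # lowest error = predicted class

The same procedure applies to the class-conditional ImageNet models mentioned above by swapping text prompts for class embeddings; only the conditioning input changes.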
Related papers
- Training Class-Imbalanced Diffusion Model Via Overlap Optimization [55.96820607533968]
Diffusion models trained on real-world datasets often yield inferior fidelity for tail classes.
Deep generative models, including diffusion models, are biased towards classes with abundant training images.
We propose a method based on contrastive learning to minimize the overlap between distributions of synthetic images for different classes.
arXiv Detail & Related papers (2024-02-16T16:47:21Z)
- Large-scale Reinforcement Learning for Diffusion Models [30.164571425479824]
Text-to-image diffusion models are susceptible to implicit biases that arise from web-scale text-image training pairs.
We present an effective, scalable algorithm for improving diffusion models using reinforcement learning (RL).
We show how our approach substantially outperforms existing methods for aligning diffusion models with human preferences.
arXiv Detail & Related papers (2024-01-20T08:10:43Z)
- Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
- Stable Diffusion for Data Augmentation in COCO and Weed Datasets [5.81198182644659]
This study utilized seven common categories and three widespread weed species to evaluate the efficacy of a Stable Diffusion model for data augmentation.
Three techniques based on Stable Diffusion (i.e., image-to-image translation, DreamBooth, and ControlNet) were leveraged for image generation with different focuses.
Classification and detection models were then trained on these synthetic images, and their performance was compared to models trained on the original images.
arXiv Detail & Related papers (2023-12-07T02:23:32Z)
- DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability [75.9781362556431]
We propose DiffDis to unify the cross-modal generative and discriminative pretraining into one single framework under the diffusion process.
We show that DiffDis outperforms single-task models on both the image generation and the image-text discriminative tasks.
arXiv Detail & Related papers (2023-08-18T05:03:48Z)
- Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners [88.07317175639226]
We propose a novel approach, Discriminative Stable Diffusion (DSD), which turns pre-trained text-to-image diffusion models into few-shot discriminative learners.
Our approach mainly uses the cross-attention score of a Stable Diffusion model to capture the mutual influence between visual and textual information.
arXiv Detail & Related papers (2023-05-18T05:41:36Z)
- DIRE for Diffusion-Generated Image Detection [128.95822613047298]
We propose a novel representation called DIffusion Reconstruction Error (DIRE).
DIRE measures the error between an input image and its reconstruction by a pre-trained diffusion model; a minimal sketch appears after this list.
Because diffusion-generated images are reconstructed more faithfully than real ones, DIRE can serve as a bridge for distinguishing generated images from real ones.
arXiv Detail & Related papers (2023-03-16T13:15:03Z)
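To make the DIRE recipe above concrete, here is a hedged sketch, not the authors' implementation: `ddim_invert` and `ddim_generate` are hypothetical helpers wrapping a pretrained diffusion model's deterministic DDIM inversion and generation passes.

    # Hedged DIRE sketch: reconstruction error as a detection feature.
    # `ddim_invert` and `ddim_generate` are hypothetical helpers, not a
    # real library API.
    import torch

    @torch.no_grad()
    def dire_score(x, ddim_invert, ddim_generate):
        """Per-image DIRE = |x - reconstruction|. Diffusion-generated
        images tend to reconstruct faithfully (low score), while real
        images reconstruct poorly (high score)."""
        z = ddim_invert(x)        # map the image to its DDIM latent
        x_rec = ddim_generate(z)  # regenerate the image from that latent
        return torch.mean(torch.abs(x - x_rec))

A simple threshold on this scalar (or a small classifier on the per-pixel error map) then separates generated images from real ones.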
This list is automatically generated from the titles and abstracts of the papers on this site.