Diffusion Models Beat GANs on Image Classification
- URL: http://arxiv.org/abs/2307.08702v1
- Date: Mon, 17 Jul 2023 17:59:40 GMT
- Title: Diffusion Models Beat GANs on Image Classification
- Authors: Soumik Mukhopadhyay, Matthew Gwilliam, Vatsal Agarwal, Namitha
Padmanabhan, Archana Swaminathan, Srinidhi Hegde, Tianyi Zhou, Abhinav
Shrivastava
- Abstract summary: Diffusion models have risen to prominence as a state-of-the-art method for image generation, denoising, inpainting, super-resolution, manipulation, etc.
We present our findings that these embeddings are useful beyond the noise prediction task, as they contain discriminative information and can also be leveraged for classification.
We find that with careful feature selection and pooling, diffusion models outperform comparable generative-discriminative methods for classification tasks.
- Score: 37.70821298392606
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While many unsupervised learning models focus on one family of tasks, either
generative or discriminative, we explore the possibility of a unified
representation learner: a model which uses a single pre-training stage to
address both families of tasks simultaneously. We identify diffusion models as
a prime candidate. Diffusion models have risen to prominence as a
state-of-the-art method for image generation, denoising, inpainting,
super-resolution, manipulation, etc. Such models involve training a U-Net to
iteratively predict and remove noise, and the resulting model can synthesize
high fidelity, diverse, novel images. The U-Net architecture, as a
convolution-based architecture, generates a diverse set of feature
representations in the form of intermediate feature maps. We present our
findings that these embeddings are useful beyond the noise prediction task, as
they contain discriminative information and can also be leveraged for
classification. We explore optimal methods for extracting and using these
embeddings for classification tasks, demonstrating promising results on the
ImageNet classification task. We find that with careful feature selection and
pooling, diffusion models outperform comparable generative-discriminative
methods such as BigBiGAN for classification tasks. We investigate diffusion
models in the transfer learning regime, examining their performance on several
fine-grained visual classification datasets. We compare these embeddings to
those generated by competing architectures and pre-trainings for classification
tasks.
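In code, the recipe the abstract outlines looks roughly like this: noise an image, run it through the frozen U-Net, pull an intermediate feature map with a forward hook, pool it, and train a linear probe on top. Below is a minimal sketch, not the paper's implementation; the checkpoint, hooked block, noise timestep, and the `loader`, `feat_dim`, and `num_classes` placeholders are all illustrative.

```python
import torch
import torch.nn as nn
from diffusers import DDPMScheduler, UNet2DModel

# A small unconditional diffusion model stands in for the paper's ImageNet
# U-Net; the block choice, timestep, and pooling here are illustrative,
# not the paper's tuned settings.
unet = UNet2DModel.from_pretrained("google/ddpm-cifar10-32").eval()
scheduler = DDPMScheduler.from_pretrained("google/ddpm-cifar10-32")

feats = {}
unet.mid_block.register_forward_hook(
    lambda mod, inp, out: feats.update(fmap=out))  # intermediate map, (B, C, H, W)

@torch.no_grad()
def extract(x, t):
    noise = torch.randn_like(x)
    x_t = scheduler.add_noise(x, noise, t)   # noised input at timestep t
    unet(x_t, t)                             # forward pass only to fire the hook
    return feats["fmap"].mean(dim=(2, 3))    # global average pooling -> (B, C)

# Linear probe on the frozen diffusion features.
probe = nn.Linear(feat_dim, num_classes)     # feat_dim: hooked block's channel count
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
for x, y in loader:                          # `loader`: your labelled dataset
    t = torch.full((x.size(0),), 90)         # fixed noise level (a hyperparameter)
    z = extract(x, t)
    loss = nn.functional.cross_entropy(probe(z), y)
    opt.zero_grad(); loss.backward(); opt.step()
```

The paper's contribution is precisely the careful search over which block, which timestep, and which pooling yield the most discriminative features.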
Related papers
- Hierarchical Visual Categories Modeling: A Joint Representation Learning and Density Estimation Framework for Out-of-Distribution Detection [28.442470704073767]
This paper proposes a novel hierarchical visual category modeling scheme to separate out-of-distribution data from in-distribution data.
We conduct experiments on seven popular benchmarks, including CIFAR, iNaturalist, SUN, Places, Textures, ImageNet-O, and OpenImage-O.
Our visual representations achieve competitive performance compared with features learned by classical methods.
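The summary leaves the density-estimation mechanics implicit; a common baseline in this space fits a Gaussian to in-distribution features and scores inputs by Mahalanobis distance. A minimal sketch of that baseline (not the paper's actual hierarchical scheme):

```python
import numpy as np

def fit_density(feats):
    """Fit a Gaussian to in-distribution feature vectors, shape (N, D)."""
    mu = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
    return mu, np.linalg.inv(cov)

def ood_score(x, mu, prec):
    """Mahalanobis distance to the fit; larger = more out-of-distribution."""
    d = x - mu
    return float(d @ prec @ d)

# Inputs scoring above a threshold chosen on held-out in-distribution
# data are flagged as out-of-distribution.
```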
arXiv Detail & Related papers (2024-08-28T07:05:46Z)
- Investigating Self-Supervised Methods for Label-Efficient Learning [27.029542823306866]
We study different self-supervised pretext tasks, namely contrastive learning, clustering, and masked image modelling, for their low-shot capabilities.
We introduce a framework combining masked image modelling and clustering as pretext tasks, which performs better across all low-shot downstream tasks.
When testing the model on full-scale datasets, we show performance gains in multi-class classification, multi-label classification, and semantic segmentation.
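A rough illustration of combining the two pretext tasks on one backbone is to sum a masked-reconstruction term and a SwAV-style clustering term; all module names below are hypothetical stand-ins, not the paper's framework:

```python
import torch
import torch.nn.functional as F

def joint_pretext_loss(backbone, decoder, prototypes, x, mask):
    """Hypothetical joint objective: masked reconstruction + clustering.
    backbone: image -> (B, D) embedding; decoder: embedding -> image;
    prototypes: (K, D) learnable cluster centres. All stand-ins."""
    z = backbone(x * mask)                               # encode masked image
    recon = decoder(z)                                   # predict missing pixels
    mim = F.mse_loss(recon * (1 - mask), x * (1 - mask))
    logits = F.normalize(z, dim=-1) @ F.normalize(prototypes, dim=-1).T
    targets = F.softmax(logits.detach() / 0.05, dim=-1)  # sharpened pseudo-labels
    cluster = F.cross_entropy(logits / 0.1, targets)     # clustering assignment
    return mim + cluster
```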
arXiv Detail & Related papers (2024-06-25T10:56:03Z)
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
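The reinforcement step reduces, schematically, to supervised fine-tuning on the union of real and counterfactual data. A minimal sketch, with `original_ds`, `counterfactual_ds`, and `model` as stand-ins:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import ConcatDataset, DataLoader

# `original_ds` is the real training set; `counterfactual_ds` holds the
# language-guided counterfactual images, labelled with their intended
# classes; `model` is the pre-trained classifier. All three are stand-ins.
loader = DataLoader(ConcatDataset([original_ds, counterfactual_ds]),
                    batch_size=64, shuffle=True)
opt = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

model.train()
for x, y in loader:
    loss = F.cross_entropy(model(x), y)   # standard supervised fine-tuning
    opt.zero_grad(); loss.backward(); opt.step()
```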
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy for bolstering image classification performance is to augment the training set with synthetic images generated by text-to-image (T2I) models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
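A hedged sketch of inter-class translation in this spirit, using an off-the-shelf image-to-image diffusion pipeline; the pipeline choice, edit strength, and label-softening rule are illustrative, not Diff-Mix's exact procedure:

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

def translate(image, target_class, strength=0.6):
    """Edit a source-class image toward `target_class`."""
    return pipe(prompt=f"a photo of a {target_class}",
                image=image, strength=strength).images[0]

# Illustrative label softening: weight the target class by the edit
# strength, keeping the remainder on the source class:
#   y_mix = (1 - strength) * onehot(source) + strength * onehot(target)
```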
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
- Do text-free diffusion models learn discriminative visual representations? [39.78043004824034]
We explore the possibility of a unified representation learner: a model which addresses both families of tasks simultaneously.
We develop diffusion models, a state-of-the-art method for generative tasks, as a prime candidate.
We find that diffusion models are better than GANs, and, with our fusion and feedback mechanisms, can compete with state-of-the-art unsupervised image representation learning methods for discriminative tasks.
arXiv Detail & Related papers (2023-11-29T18:59:59Z)
- Diffusion-TTA: Test-time Adaptation of Discriminative Models via Generative Feedback [97.0874638345205]
Generative models can be great test-time adapters for discriminative models.
Our method, Diffusion-TTA, adapts pre-trained discriminative models to each unlabelled example in the test set.
We show Diffusion-TTA significantly enhances the accuracy of various large-scale pre-trained discriminative models.
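Schematically, the generative feedback scores how well the classifier's predicted label conditions the diffusion model's denoising, and updates the classifier to improve that score. A simplified sketch of one adaptation step (component names are stand-ins, and the update rule is reduced to its essentials):

```python
import torch
import torch.nn.functional as F

def tta_step(classifier, unet, add_noise, x, opt, t):
    """One adaptation step on a single unlabelled test input x.
    `unet` is a class-conditional noise predictor and `add_noise`
    produces x_t; both are hypothetical stand-ins."""
    probs = F.softmax(classifier(x), dim=-1)        # (1, K) predicted labels
    noise = torch.randn_like(x)
    x_t = add_noise(x, noise, t)
    loss = 0.0
    # Expected denoising error under the predicted label distribution;
    # restricted to the top classes here to keep the step cheap.
    for c in probs[0].topk(3).indices:
        eps_hat = unet(x_t, t, class_labels=c[None])
        loss = loss + probs[0, c] * F.mse_loss(eps_hat, noise)
    opt.zero_grad(); loss.backward(); opt.step()    # updates the classifier
    return loss.item()
```

The gradient reaches the classifier only through the probability weights, so the diffusion model acts purely as a frozen scorer.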
arXiv Detail & Related papers (2023-11-27T18:59:53Z)
- DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability [75.9781362556431]
We propose DiffDis to unify the cross-modal generative and discriminative pretraining into one single framework under the diffusion process.
We show that DiffDis outperforms single-task models on both the image generation and the image-text discriminative tasks.
arXiv Detail & Related papers (2023-08-18T05:03:48Z)
- UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning [86.91893533388628]
This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC).
UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
arXiv Detail & Related papers (2023-06-01T15:39:38Z)
- Your Diffusion Model is Secretly a Zero-Shot Classifier [90.40799216880342]
We show that density estimates from large-scale text-to-image diffusion models can be leveraged to perform zero-shot classification.
Our generative approach to classification attains strong results on a variety of benchmarks.
Our results are a step toward using generative over discriminative models for downstream tasks.
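The underlying trick: a text-to-image diffusion model's noise-prediction error under a class-name prompt acts as a negative log-density proxy, so zero-shot classification reduces to picking the prompt with the lowest error. A minimal sketch with hypothetical stand-ins for the model components:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def zero_shot_classify(x, class_names, unet, encode_prompt, add_noise,
                       n_trials=8, num_steps=1000):
    """Return the index of the class whose prompt best explains x, i.e.
    yields the lowest average noise-prediction error. `unet`,
    `encode_prompt`, and `add_noise` are hypothetical stand-ins for a
    text-conditional diffusion model's components."""
    errors = []
    for name in class_names:
        cond = encode_prompt(f"a photo of a {name}")
        err = 0.0
        for _ in range(n_trials):                  # Monte Carlo over (t, noise)
            t = torch.randint(0, num_steps, (1,))
            noise = torch.randn_like(x)
            err += F.mse_loss(unet(add_noise(x, noise, t), t, cond),
                              noise).item()
        errors.append(err / n_trials)
    return min(range(len(errors)), key=errors.__getitem__)
```

The cost scales with the number of classes times the number of noise trials, which is why such methods are far slower than a discriminative forward pass.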
arXiv Detail & Related papers (2023-03-28T17:59:56Z)
- Semi-Supervised Few-Shot Classification with Deep Invertible Hybrid Models [4.189643331553922]
We propose a deep invertible hybrid model which integrates discriminative and generative learning at a latent space level for semi-supervised few-shot classification.
Our main originality lies in our integration of these components at a latent space level, which is effective in preventing overfitting.
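A sketch of what a latent-level hybrid objective can look like: an invertible (normalizing-flow) encoder supplies an exact likelihood term while a classifier head supplies the discriminative term. This illustrates the general idea, not the paper's exact model:

```python
import torch
import torch.nn.functional as F

def hybrid_loss(flow, classifier, x, y=None, alpha=1.0):
    """Hypothetical latent-level hybrid objective. `flow` is an invertible
    encoder returning (z, log|det J|); `classifier` acts on the latents.
    Labelled examples get both terms; unlabelled ones only the NLL,
    which is how the semi-supervised setting enters."""
    z, logdet = flow(x)
    nll = (0.5 * (z ** 2).sum(dim=1) - logdet).mean()  # standard-normal prior
    if y is None:
        return alpha * nll
    return F.cross_entropy(classifier(z), y) + alpha * nll
```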
arXiv Detail & Related papers (2021-05-22T05:55:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.