Advancing Image Classification with Discrete Diffusion Classification Modeling
- URL: http://arxiv.org/abs/2511.20263v1
- Date: Tue, 25 Nov 2025 12:42:26 GMT
- Title: Advancing Image Classification with Discrete Diffusion Classification Modeling
- Authors: Omer Belhasin, Shelly Golan, Ran El-Yaniv, Michael Elad
- Abstract summary: DiDiCM is a novel framework that leverages a diffusion-based procedure to model the posterior distribution of class labels conditioned on the input image. We show that a few diffusion iterations achieve higher classification accuracy on the ImageNet dataset compared to baselines, with accuracy gains increasing as the task becomes more challenging.
- Score: 25.92509040578864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image classification is a well-studied task in computer vision, and yet it remains challenging under high-uncertainty conditions, such as when input images are corrupted or training data are limited. Conventional classification approaches typically train models to directly predict class labels from input images, but this might lead to suboptimal performance in such scenarios. To address this issue, we propose Discrete Diffusion Classification Modeling (DiDiCM), a novel framework that leverages a diffusion-based procedure to model the posterior distribution of class labels conditioned on the input image. DiDiCM supports diffusion-based predictions either on class probabilities or on discrete class labels, providing flexibility in computation and memory trade-offs. We conduct a comprehensive empirical study demonstrating the superior performance of DiDiCM over standard classifiers, showing that a few diffusion iterations achieve higher classification accuracy on the ImageNet dataset compared to baselines, with accuracy gains increasing as the task becomes more challenging. We release our code at https://github.com/omerb01/didicm .
Related papers
- DIVE: Inverting Conditional Diffusion Models for Discriminative Tasks [79.50756148780928]
This paper studies the problem of leveraging pretrained diffusion models for performing discriminative tasks. We extend the discriminative capability of pretrained frozen generative diffusion models from the classification task to the more complex object detection task, by "inverting" a pretrained layout-to-image diffusion model.
arXiv Detail & Related papers (2025-04-24T05:13:27Z)
- CLIP Adaptation by Intra-modal Overlap Reduction [1.2277343096128712]
We analyse the intra-modal overlap in image space in terms of embedding representation.
We train a lightweight adapter on a generic set of samples from the Google Open Images dataset.
arXiv Detail & Related papers (2024-09-17T16:40:58Z)
- Classes Are Not Equal: An Empirical Study on Image Recognition Fairness [100.36114135663836]
We experimentally demonstrate that classes are not equal and the fairness issue is prevalent for image classification models across various datasets.
Our findings reveal that models tend to exhibit greater prediction biases for classes that are more challenging to recognize.
Data augmentation and representation learning algorithms improve overall performance by promoting fairness to some degree in image classification.
arXiv Detail & Related papers (2024-02-28T07:54:50Z)
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z)
- Diffusion Models Beat GANs on Image Classification [37.70821298392606]
Diffusion models have risen to prominence as a state-of-the-art method for image generation, denoising, inpainting, super-resolution, manipulation, etc.
We present our findings that these embeddings are useful beyond the noise prediction task, as they contain discriminative information and can also be leveraged for classification.
We find that with careful feature selection and pooling, diffusion models outperform comparable generative-discriminative methods for classification tasks.
arXiv Detail & Related papers (2023-07-17T17:59:40Z)
- Discriminative Class Tokens for Text-to-Image Diffusion Models [102.88033622546251]
We propose a non-invasive fine-tuning technique that capitalizes on the expressive potential of free-form text. Our method is fast compared to prior fine-tuning methods and does not require a collection of in-class images. We evaluate our method extensively, showing that the generated images (i) are more accurate and of higher quality than those of standard diffusion models, (ii) can be used to augment training data in a low-resource setting, and (iii) reveal information about the data used to train the guiding classifier.
arXiv Detail & Related papers (2023-03-30T05:25:20Z)
- Your Diffusion Model is Secretly a Zero-Shot Classifier [90.40799216880342]
We show that density estimates from large-scale text-to-image diffusion models can be leveraged to perform zero-shot classification.
Our generative approach to classification attains strong results on a variety of benchmarks.
Our results are a step toward using generative over discriminative models for downstream tasks.
arXiv Detail & Related papers (2023-03-28T17:59:56Z)
- Cap2Aug: Caption guided Image to Image data Augmentation [41.53127698828463]
Cap2Aug is an image-to-image diffusion model-based data augmentation strategy using image captions as text prompts.
We generate captions from the limited training images and use these captions to edit the training images with an image-to-image stable diffusion model.
This strategy generates augmented versions of images similar to the training images yet provides semantic diversity across the samples.
arXiv Detail & Related papers (2022-12-11T04:37:43Z)
- Improving Explainability of Image Classification in Scenarios with Class Overlap: Application to COVID-19 and Pneumonia [7.372797734096181]
Trust in predictions made by machine learning models is increased if the model generalizes well on previously unseen samples.
We propose a method that enhances the explainability of image classifications through better localization by mitigating the model uncertainty induced by class overlap.
Our method is particularly promising in real-world class overlap scenarios, such as COVID-19 and pneumonia, where expertly labeled data for localization is not readily available.
arXiv Detail & Related papers (2020-08-06T20:47:36Z)
- I Am Going MAD: Maximum Discrepancy Competition for Comparing Classifiers Adaptively [135.7695909882746]
The approach is named the MAximum Discrepancy (MAD) competition.
We adaptively sample a small test set from an arbitrarily large corpus of unlabeled images.
Human labeling on the resulting model-dependent image sets reveals the relative performance of the competing classifiers.
arXiv Detail & Related papers (2020-02-25T03:32:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.