Image Classification Using a Diffusion Model as a Pre-Training Model
- URL: http://arxiv.org/abs/2505.06890v1
- Date: Sun, 11 May 2025 08:03:18 GMT
- Title: Image Classification Using a Diffusion Model as a Pre-Training Model
- Authors: Kosuke Ukita, Ye Xiaolong, Tsuyoshi Okita
- Abstract summary: We propose a diffusion model that integrates a representation-conditioning mechanism, where the representations derived from a Vision Transformer (ViT) are used to condition the internal process of a Transformer-based diffusion model. We evaluate our method through a zero-shot classification task for hematoma detection in brain imaging. Compared to the strong contrastive learning baseline, DINOv2, our method achieves a notable improvement of +6.15% in accuracy and +13.60% in F1-score.
- Score: 3.1976901430982063
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a diffusion model that integrates a representation-conditioning mechanism, where the representations derived from a Vision Transformer (ViT) are used to condition the internal process of a Transformer-based diffusion model. This approach enables representation-conditioned data generation, addressing the challenge of requiring large-scale labeled datasets by leveraging self-supervised learning on unlabeled data. We evaluate our method through a zero-shot classification task for hematoma detection in brain imaging. Compared to the strong contrastive learning baseline, DINOv2, our method achieves a notable improvement of +6.15% in accuracy and +13.60% in F1-score, demonstrating its effectiveness in image classification.
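The abstract does not include code, so as a rough illustration of the general recipe behind zero-shot classification with a conditional diffusion model (condition the denoiser on each candidate class representation and predict the class whose conditioning yields the lowest reconstruction error), here is a minimal pure-Python sketch; the denoiser, the noising step, and the class representations are all toy stand-ins, not the authors' method:

```python
# Hedged sketch: zero-shot classification via per-class denoising error.
# A real system would use a conditioned Transformer denoiser and ViT-derived
# class representations; everything below is an illustrative stand-in.

def denoise(noisy, condition):
    """Toy denoiser: conditioning pulls the estimate toward the condition vector."""
    return [0.5 * n + 0.5 * c for n, c in zip(noisy, condition)]

def reconstruction_error(image, condition):
    noisy = [x + 0.1 for x in image]          # toy forward-noising step
    denoised = denoise(noisy, condition)
    return sum((x - d) ** 2 for x, d in zip(image, denoised))

def zero_shot_classify(image, class_representations):
    errors = {name: reconstruction_error(image, rep)
              for name, rep in class_representations.items()}
    return min(errors, key=errors.get)        # best-explaining class wins

# Hypothetical class representations (e.g., mean ViT embeddings per class).
reps = {"hematoma": [1.0, 0.0], "normal": [0.0, 1.0]}
print(zero_shot_classify([0.9, 0.1], reps))   # closer to the "hematoma" prototype
```

The key design point is that no labeled training is needed at classification time: the conditional generative model itself scores how well each class explains the input.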
Related papers
- LEAF: Latent Diffusion with Efficient Encoder Distillation for Aligned Features in Medical Image Segmentation [2.529281336118734]
We propose LEAF, a medical image segmentation model grounded in latent diffusion models. During the fine-tuning process, we replace the original noise prediction pattern with a direct prediction of the segmentation map. We also employ a feature distillation method to align the hidden states of the convolutional layers with the features from a transformer-based vision encoder.
arXiv Detail & Related papers (2025-07-24T09:08:04Z)
- DIVE: Inverting Conditional Diffusion Models for Discriminative Tasks [79.50756148780928]
This paper studies the problem of leveraging pretrained diffusion models for performing discriminative tasks. We extend the discriminative capability of pretrained frozen generative diffusion models from the classification task to the more complex object detection task, by "inverting" a pretrained layout-to-image diffusion model.
arXiv Detail & Related papers (2025-04-24T05:13:27Z)
- Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think [72.48325960659822]
One main bottleneck in training large-scale diffusion models for generation lies in effectively learning these representations. We study this by introducing a straightforward regularization called REPresentation Alignment (REPA), which aligns the projections of noisy input hidden states in denoising networks with clean image representations obtained from external, pretrained visual encoders. The results are striking: our simple strategy yields significant improvements in both training efficiency and generation quality when applied to popular diffusion and flow-based transformers, such as DiTs and SiTs.
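The alignment described above can be read as a similarity loss between a projection of the denoiser's hidden state and the frozen encoder's clean-image feature. A hedged pure-Python sketch of that loss, with toy dimensions and an illustrative projection matrix (not the paper's architecture):

```python
# Hedged sketch of a REPA-style regularizer: 1 - cosine similarity between a
# projected denoiser hidden state and a clean encoder feature. The projection
# and vector sizes are placeholder assumptions for illustration.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def repa_loss(hidden_state, clean_feature, projection):
    # projection: a (hypothetical) learned linear map, here nested lists
    projected = [sum(w * h for w, h in zip(row, hidden_state)) for row in projection]
    return 1.0 - cosine(projected, clean_feature)  # 0 when perfectly aligned

proj = [[1.0, 0.0], [0.0, 1.0]]          # identity projection for illustration
print(repa_loss([0.6, 0.8], [0.6, 0.8], proj))
```

In training, this term would be added to the usual denoising objective, so the network is rewarded for internally representing the clean image the way the external encoder does.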
arXiv Detail & Related papers (2024-10-09T14:34:53Z)
- Anisotropic Diffusion Probabilistic Model for Imbalanced Image Classification [8.364943466191933]
We propose the Anisotropic Diffusion Probabilistic Model (ADPM) for imbalanced image classification problems.
We use the data distribution to control the diffusion speed of different class samples during the forward process, effectively improving the classification accuracy of the denoiser in the reverse process.
Our results confirm that the anisotropic diffusion model significantly improves the classification accuracy of rare classes while maintaining the accuracy of head classes.
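The idea of using the data distribution to set per-class diffusion speed can be sketched with a class-dependent noise schedule; the scaling rule below is an illustrative assumption, not the ADPM formula:

```python
# Hedged sketch of class-dependent diffusion speed: scale the per-step noise
# rates by relative class frequency, so rare classes are noised more gently
# (slower forward diffusion) than head classes. Illustrative rule only.

def class_noise_schedule(base_betas, class_freq, max_freq):
    scale = class_freq / max_freq      # rare class -> small scale -> slow diffusion
    return [b * scale for b in base_betas]

base = [0.01, 0.02, 0.04]
head = class_noise_schedule(base, class_freq=1000, max_freq=1000)  # full speed
rare = class_noise_schedule(base, class_freq=100, max_freq=1000)   # 10x slower
print(head, rare)
```

The intuition matches the summary above: if rare-class samples retain more signal at each forward step, the denoiser sees more informative training targets for those classes in the reverse process.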
arXiv Detail & Related papers (2024-09-22T04:42:52Z)
- Mask-guided cross-image attention for zero-shot in-silico histopathologic image generation with a diffusion model [0.10910416614141322]
Diffusion models are the state-of-the-art solution for generating in-silico images. Appearance transfer diffusion models are designed for natural images. In computational pathology, specifically in oncology, it is not straightforward to define which objects in an image should be classified as foreground and background. We contribute to the applicability of appearance transfer models to diffusion-stained images by modifying the appearance transfer guidance to alternate between class-specific AdaIN feature statistics matching.
arXiv Detail & Related papers (2024-07-16T12:36:26Z)
- Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection [106.39544368711427]
We study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods.
We present a novel forgery-aware adaptive transformer approach, namely FatFormer.
Our approach tuned on 4-class ProGAN data attains an average of 98% accuracy to unseen GANs, and surprisingly generalizes to unseen diffusion models with 95% accuracy.
arXiv Detail & Related papers (2023-12-27T17:36:32Z)
- The Journey, Not the Destination: How Data Guides Diffusion Models [75.19694584942623]
Diffusion models trained on large datasets can synthesize photo-realistic images of remarkable quality and diversity.
We propose a framework that: (i) provides a formal notion of data attribution in the context of diffusion models, and (ii) allows us to counterfactually validate such attributions.
arXiv Detail & Related papers (2023-12-11T08:39:43Z)
- Class-Prototype Conditional Diffusion Model with Gradient Projection for Continual Learning [20.175586324567025]
Mitigating catastrophic forgetting is a key hurdle in continual learning.
A major issue is the deterioration in the quality of generated data compared to the original.
We propose a GR-based approach for continual learning that enhances image quality in generators.
arXiv Detail & Related papers (2023-12-10T17:39:42Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- DiffMix: Diffusion Model-based Data Synthesis for Nuclei Segmentation and Classification in Imbalanced Pathology Image Datasets [8.590026259176806]
We propose a realistic data synthesis method using a diffusion model.
We generate two types of virtual patches to enlarge the training data distribution.
We use a semantic-label-conditioned diffusion model to generate realistic and high-quality image samples.
arXiv Detail & Related papers (2023-06-25T05:31:08Z)
- Denoising Diffusion Autoencoders are Unified Self-supervised Learners [58.194184241363175]
This paper shows that the networks in diffusion models, namely denoising diffusion autoencoders (DDAE), are unified self-supervised learners.
DDAE has already learned strongly linear-separable representations within its intermediate layers without auxiliary encoders.
Our diffusion-based approach achieves 95.9% and 50.0% linear evaluation accuracies on CIFAR-10 and Tiny-ImageNet, respectively.
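Linear evaluation here means freezing the network and training only a linear head on its intermediate activations. A toy sketch of that protocol, with a stand-in feature extractor and a perceptron-style head (not the DDAE itself):

```python
# Hedged sketch of linear probing: the feature extractor is frozen (a toy
# stand-in below) and only a linear classifier is trained on its outputs.

def frozen_features(x):
    return [x[0] + x[1], x[0] - x[1]]      # stand-in for intermediate activations

def train_linear_probe(data, labels, lr=0.1, epochs=50):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(data, labels):     # y in {-1, +1}; perceptron update
            f = frozen_features(x)
            score = w[0] * f[0] + w[1] * f[1] + b
            if y * score <= 0:             # misclassified (or on the boundary)
                w = [w[0] + lr * y * f[0], w[1] + lr * y * f[1]]
                b += lr * y
    return w, b

def predict(w, b, x):
    f = frozen_features(x)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else -1

data = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
labels = [1, 1, -1, -1]                    # linearly separable in feature space
w, b = train_linear_probe(data, labels)
```

The reported accuracies above come from exactly this kind of protocol: if a linear head suffices, the frozen features are already linearly separable.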
arXiv Detail & Related papers (2023-03-17T04:20:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.