Expanding the Role of Diffusion Models for Robust Classifier Training
- URL: http://arxiv.org/abs/2602.19931v1
- Date: Mon, 23 Feb 2026 15:06:52 GMT
- Title: Expanding the Role of Diffusion Models for Robust Classifier Training
- Authors: Pin-Han Huang, Shang-Tse Chen, Hsuan-Tien Lin
- Abstract summary: We show that diffusion models offer representations that are both diverse and partially robust. Our representation analysis indicates that incorporating diffusion models into adversarial training encourages more disentangled features. Experiments on CIFAR-10, CIFAR-100, and ImageNet validate these findings.
- Score: 14.409018677904571
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Incorporating diffusion-generated synthetic data into adversarial training (AT) has been shown to substantially improve the training of robust image classifiers. In this work, we extend the role of diffusion models beyond merely generating synthetic data, examining whether their internal representations, which encode meaningful features of the data, can provide additional benefits for robust classifier training. Through systematic experiments, we show that diffusion models offer representations that are both diverse and partially robust, and that explicitly incorporating diffusion representations as an auxiliary learning signal during AT consistently improves robustness across settings. Furthermore, our representation analysis indicates that incorporating diffusion models into AT encourages more disentangled features, while diffusion representations and diffusion-generated synthetic data play complementary roles in shaping representations. Experiments on CIFAR-10, CIFAR-100, and ImageNet validate these findings, demonstrating the effectiveness of jointly leveraging diffusion representations and synthetic data within AT.
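The abstract's central idea, using diffusion representations as an auxiliary learning signal during adversarial training, can be illustrated with a toy sketch. Everything below (the logistic-regression classifier, the one-step FGSM attack, and the random `frozen_feats` map standing in for a diffusion model's features) is an illustrative assumption, not the paper's actual architecture or loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (all assumptions, not the paper's components):
w = rng.normal(size=8) * 0.1             # logistic-regression classifier weights
frozen_feats = rng.normal(size=(4, 8))   # frozen "diffusion representation" map g(x) = Gx
proj = rng.normal(size=(4, 8)) * 0.1     # learnable feature head to align with g

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(x, y, w):
    """Binary cross-entropy of a linear-logistic classifier."""
    p = sigmoid(x @ w)
    return -(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

def fgsm(x, y, w, eps=0.1):
    """One-step FGSM attack: perturb x along the sign of the input gradient."""
    grad_x = (sigmoid(x @ w) - y) * w
    return x + eps * np.sign(grad_x)

def at_plus_repr_loss(x, y, w, proj, lam=0.5):
    """Adversarial loss plus an auxiliary representation-matching term."""
    x_adv = fgsm(x, y, w)
    aux = np.sum((proj @ x - frozen_feats @ x) ** 2)  # pull head toward frozen features
    return logistic_loss(x_adv, y, w) + lam * aux

x, y = rng.normal(size=8), 1.0
clean = logistic_loss(x, y, w)
adv = logistic_loss(fgsm(x, y, w), y, w)
total = at_plus_repr_loss(x, y, w, proj)
```

Because the logistic loss is convex in the input, the FGSM step never decreases it, and the auxiliary term is non-negative, so `total` upper-bounds the adversarial loss; in a real setting both terms would be minimized jointly over the classifier and the feature head.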
Related papers
- Disentangled representations via score-based variational autoencoders [21.955536401578616]
We present the Score-based Autoencoder for Multiscale Inference (SAMI). SAMI formulates a principled objective that learns representations through score-based guidance of the underlying diffusion process. It can extract useful representations from pre-trained diffusion models with minimal additional training.
arXiv Detail & Related papers (2025-12-18T23:42:10Z) - Unleashing the Potential of the Semantic Latent Space in Diffusion Models for Image Dehazing [25.138589492384654]
We propose a Diffusion Latent Inspired network for Image Dehazing, dubbed DiffLI$2$D. We first reveal that the semantic latent space of pre-trained diffusion models can represent the content and haze characteristics of images. We integrate the diffusion latent representations at different time-steps into a delicately designed dehazing network to provide instructions for image dehazing.
arXiv Detail & Related papers (2025-09-24T13:11:37Z) - FedDifRC: Unlocking the Potential of Text-to-Image Diffusion Models in Heterogeneous Federated Learning [12.366529890744822]
Federated learning aims at training models collaboratively across participants while protecting privacy. One major challenge for this paradigm is data heterogeneity, where biased data preferences across multiple clients harm the model's consistency and performance. In this paper, we introduce powerful diffusion models into a novel federated paradigm with Diffusion Representation Collaboration (FedDifRC). FedDifRC combines text-driven diffusion contrasting and noise-driven diffusion regularization, aiming to provide abundant class-related semantic information and consistent convergence signals.
arXiv Detail & Related papers (2025-07-09T01:57:57Z) - DDAE++: Enhancing Diffusion Models Towards Unified Generative and Discriminative Learning [53.27049077100897]
Generative pre-training has been shown to yield discriminative representations, paving the way towards unified visual generation and understanding. This work introduces self-conditioning, a mechanism that internally leverages the rich semantics inherent in the denoising network to guide its own decoding layers. The results are compelling: our method boosts both generation FID and recognition accuracy with 1% computational overhead and generalizes across diverse diffusion architectures.
arXiv Detail & Related papers (2025-05-16T08:47:16Z) - Automated Learning of Semantic Embedding Representations for Diffusion Models [1.688134675717698]
We employ a multi-level denoising autoencoder framework to expand the representation capacity of denoising diffusion models. Our work justifies that DDMs are not only suitable for generative tasks, but also potentially advantageous for general-purpose deep learning applications.
arXiv Detail & Related papers (2025-05-09T02:10:46Z) - FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution [48.88184541515326]
We propose a simple and effective method, named FaithDiff, to fully harness the power of latent diffusion models (LDMs) for faithful image SR. In contrast to existing diffusion-based SR methods that freeze the diffusion model pre-trained on high-quality images, we propose to unleash the diffusion prior to identify useful information and recover faithful structures.
arXiv Detail & Related papers (2024-11-27T23:58:03Z) - How Diffusion Models Learn to Factorize and Compose [14.161975556325796]
Diffusion models are capable of generating photo-realistic images that combine elements which likely do not appear together in the training set.
We investigate whether and when diffusion models learn semantically meaningful and factorized representations of composable features.
arXiv Detail & Related papers (2024-08-23T17:59:03Z) - Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models [59.331993845831946]
Diffusion models benefit from instillation of task-specific information into the score function to steer the sample generation towards desired properties.
This paper provides the first theoretical study towards understanding the influence of guidance on diffusion models in the context of Gaussian mixture models.
arXiv Detail & Related papers (2024-03-03T23:15:48Z) - Training Class-Imbalanced Diffusion Model Via Overlap Optimization [55.96820607533968]
Diffusion models trained on real-world datasets often yield inferior fidelity for tail classes.
Deep generative models, including diffusion models, are biased towards classes with abundant training images.
We propose a method based on contrastive learning to minimize the overlap between distributions of synthetic images for different classes.
arXiv Detail & Related papers (2024-02-16T16:47:21Z) - Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning diffusion models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z) - Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z) - Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model [22.31860516617302]
We introduce DiffHOI, a novel HOI detection scheme grounded on a pre-trained text-image diffusion model.
To fill in the gaps of HOI datasets, we propose SynHOI, a class-balanced, large-scale, and high-diversity synthetic dataset.
Experiments demonstrate that DiffHOI significantly outperforms the state-of-the-art in regular detection (i.e., 41.50 mAP) and zero-shot detection.
arXiv Detail & Related papers (2023-05-20T17:59:23Z) - Your Diffusion Model is Secretly a Zero-Shot Classifier [90.40799216880342]
We show that density estimates from large-scale text-to-image diffusion models can be leveraged to perform zero-shot classification.
Our generative approach to classification attains strong results on a variety of benchmarks.
Our results are a step toward using generative over discriminative models for downstream tasks.
arXiv Detail & Related papers (2023-03-28T17:59:56Z)
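The last entry's core idea, classifying an image by which class-conditional denoiser best predicts the injected noise, can be sketched at toy scale. The per-class "denoiser" below, which simply assumes every class-c image equals a fixed mode `mus[c]`, and the single noise level `alpha_bar` are illustrative assumptions, not the paper's text-to-image model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two toy class "modes" in 2-D; a real model conditions on text/class labels.
mus = {0: np.array([2.0, 0.0]), 1: np.array([-2.0, 0.0])}
alpha_bar = 0.5  # single noise level; the paper averages over many timesteps

def eps_error(x0, c, n_samples=32):
    """Monte Carlo estimate of the noise-prediction error of a toy
    class-conditional 'denoiser' that believes every class-c image is mus[c]."""
    a, s = np.sqrt(alpha_bar), np.sqrt(1.0 - alpha_bar)
    errs = []
    for _ in range(n_samples):
        eps = rng.normal(size=x0.shape)
        x_t = a * x0 + s * eps              # forward-diffuse the input
        eps_hat = (x_t - a * mus[c]) / s    # noise implied by the class model
        errs.append(np.mean((eps_hat - eps) ** 2))
    return float(np.mean(errs))

def classify(x0):
    """Zero-shot prediction: the class whose denoiser explains x0 best."""
    return min(mus, key=lambda c: eps_error(x0, c))
```

An input near the class-0 mode, e.g. `classify(np.array([1.5, 0.2]))`, is assigned class 0 because the class-0 denoiser incurs the smaller noise-prediction error; this mirrors how diffusion density estimates can be repurposed for classification without any discriminative training.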
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.