Harnessing Diffusion-Generated Synthetic Images for Fair Image Classification
- URL: http://arxiv.org/abs/2511.08711v1
- Date: Thu, 13 Nov 2025 01:03:26 GMT
- Title: Harnessing Diffusion-Generated Synthetic Images for Fair Image Classification
- Authors: Abhipsa Basu, Aviral Gupta, Abhijnya Bhat, R. Venkatesh Babu
- Abstract summary: Image classification systems often inherit biases from uneven group representation in training data. In this work, we explore multiple diffusion-finetuning techniques, e.g., LoRA and DreamBooth, to generate images that more accurately represent each training group.
- Score: 25.474389970409067
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image classification systems often inherit biases from uneven group representation in training data. For example, in face datasets for hair color classification, blond hair may be disproportionately associated with females, reinforcing stereotypes. A recent approach leverages the Stable Diffusion model to generate balanced training data, but these models often struggle to preserve the original data distribution. In this work, we explore multiple diffusion-finetuning techniques, e.g., LoRA and DreamBooth, to generate images that more accurately represent each training group by learning directly from their samples. Additionally, in order to prevent a single DreamBooth model from being overwhelmed by excessive intra-group variations, we explore a technique of clustering images within each group and train a DreamBooth model per cluster. These models are then used to generate group-balanced data for pretraining, followed by fine-tuning on real data. Experiments on multiple benchmarks demonstrate that the studied finetuning approaches outperform vanilla Stable Diffusion on average and achieve results comparable to SOTA debiasing techniques like Group-DRO, while surpassing them as the dataset bias severity increases.
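The abstract's pipeline first decides how many synthetic images each (class, attribute) group needs so that all groups are equally represented before pretraining. A minimal sketch of that balancing step, with a hypothetical helper name (`balanced_generation_budget` is not from the paper), could look like:

```python
from collections import Counter

def balanced_generation_budget(group_labels):
    """For each (class, protected-attribute) group, return how many
    synthetic images to generate so every group reaches the size of
    the largest group. Hypothetical helper, not the authors' code."""
    counts = Counter(group_labels)
    target = max(counts.values())
    return {group: target - count for group, count in counts.items()}

# Toy example mirroring the abstract: blond hair over-represented among females.
labels = ([("blond", "F")] * 8 + [("blond", "M")] * 2 +
          [("dark", "F")] * 5 + [("dark", "M")] * 5)
print(balanced_generation_budget(labels))
# {('blond', 'F'): 0, ('blond', 'M'): 6, ('dark', 'F'): 3, ('dark', 'M'): 3}
```

In the paper's setting, each group's budget would then be filled by sampling from that group's fine-tuned (LoRA/DreamBooth, possibly per-cluster) diffusion model, after which the classifier is pretrained on the balanced pool and fine-tuned on real data.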
Related papers
- GUDA: Counterfactual Group-wise Training Data Attribution for Diffusion Models via Unlearning [83.56510119503267]
Group-wise attribution is counterfactual: how would a model's behavior on a generated sample change if a group were absent from training? We propose GUDA (Group Unlearning-based Data Attribution) for diffusion models, which approximates each counterfactual model by applying machine unlearning to a shared full-data model instead of training from scratch.
arXiv Detail & Related papers (2026-01-30T07:10:59Z) - Learning Robust Diffusion Models from Imprecise Supervision [75.53546939251146]
DMIS is a unified framework for training robust Conditional Diffusion Models from Imprecise Supervision. Our framework is derived from likelihood and decomposes the objective into generative and classification components. Experiments on diverse forms of imprecise supervision, covering image generation, weakly supervised learning, and dataset condensation, demonstrate that DMIS consistently produces high-quality and class-discriminative samples.
arXiv Detail & Related papers (2025-10-03T14:00:32Z) - Debiasing Classifiers by Amplifying Bias with Latent Diffusion and Large Language Models [9.801159950963306]
We introduce DiffuBias, a novel pipeline for text-to-image generation that enhances classifier robustness by generating bias-conflict samples.
DiffuBias is the first approach leveraging a stable diffusion model to generate bias-conflict samples in debiasing tasks.
Our comprehensive experimental evaluations demonstrate that DiffuBias achieves state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2024-11-25T04:11:16Z) - DataDream: Few-shot Guided Dataset Generation [90.09164461462365]
We propose a framework for synthesizing classification datasets that more faithfully represents the real data distribution.
DataDream fine-tunes LoRA weights for the image generation model on the few real images before generating the training data using the adapted model.
We then fine-tune LoRA weights for CLIP using the synthetic data to improve downstream image classification over previous approaches on a large variety of datasets.
arXiv Detail & Related papers (2024-07-15T17:10:31Z) - Training Class-Imbalanced Diffusion Model Via Overlap Optimization [55.96820607533968]
Diffusion models trained on real-world datasets often yield inferior fidelity for tail classes.
Deep generative models, including diffusion models, are biased towards classes with abundant training images.
We propose a method based on contrastive learning to minimize the overlap between distributions of synthetic images for different classes.
arXiv Detail & Related papers (2024-02-16T16:47:21Z) - Adaptive Training Meets Progressive Scaling: Elevating Efficiency in Diffusion Models [52.1809084559048]
We propose a novel two-stage divide-and-conquer training strategy termed TDC Training. It groups timesteps based on task similarity and difficulty, assigning highly customized denoising models to each group, thereby enhancing the performance of diffusion models. While two-stage training avoids the need to train each model separately, the total training cost is even lower than training a single unified denoising model.
arXiv Detail & Related papers (2023-12-20T03:32:58Z) - Fair GANs through model rebalancing for extremely imbalanced class distributions [5.463417677777276]
We present an approach to construct an unbiased generative adversarial network (GAN) from an existing biased GAN.
We show results for the StyleGAN2 models while training on the Flickr Faces High Quality (FFHQ) dataset for racial fairness.
We further validate our approach by applying it to an imbalanced CIFAR10 dataset that is also twice as large.
arXiv Detail & Related papers (2023-08-16T19:20:06Z) - Unbiased Image Synthesis via Manifold Guidance in Diffusion Models [9.531220208352252]
Diffusion Models often inadvertently favor certain data attributes, undermining the diversity of generated images.
We propose a plug-and-play method named Manifold Sampling Guidance, the first unsupervised method to mitigate the bias issue in DDPMs.
arXiv Detail & Related papers (2023-07-17T02:03:17Z) - GSURE-Based Diffusion Model Training with Corrupted Data [35.56267114494076]
We propose a novel training technique for generative diffusion models based only on corrupted data.
We demonstrate our technique on face images as well as Magnetic Resonance Imaging (MRI) data.
arXiv Detail & Related papers (2023-05-22T15:27:20Z) - Class-Balancing Diffusion Models [57.38599989220613]
Class-Balancing Diffusion Models (CBDM) are trained with a distribution adjustment regularizer as a solution.
Our method is benchmarked on the CIFAR100/CIFAR100LT datasets and shows strong performance on the downstream recognition task.
arXiv Detail & Related papers (2023-04-30T20:00:14Z) - Your Diffusion Model is Secretly a Zero-Shot Classifier [90.40799216880342]
We show that density estimates from large-scale text-to-image diffusion models can be leveraged to perform zero-shot classification.
Our generative approach to classification attains strong results on a variety of benchmarks.
Our results are a step toward using generative over discriminative models for downstream tasks.
arXiv Detail & Related papers (2023-03-28T17:59:56Z)
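The last entry's core idea, that a class-conditional diffusion model's denoising loss approximates a per-class likelihood, so the class whose predictor best recovers the added noise wins, can be illustrated with a toy 1-D sketch. All names here are illustrative, not from the paper, and the ideal per-class noise predictors are derived analytically instead of learned:

```python
import math
import random

def diffusion_zero_shot_classify(x, class_means, n_trials=100, seed=0):
    """Toy 1-D sketch of 'diffusion model as zero-shot classifier':
    each class c has an ideal noise predictor derived from its mean,
    and we pick the class whose predictor best recovers the injected
    noise (i.e., lowest denoising loss, a proxy for log p(x|c))."""
    rng = random.Random(seed)
    losses = []
    for mu in class_means:
        total = 0.0
        for _ in range(n_trials):
            t = rng.uniform(0.1, 0.9)              # random noise level
            eps = rng.gauss(0.0, 1.0)              # injected noise
            x_t = math.sqrt(1 - t) * x + math.sqrt(t) * eps  # noised input
            # Ideal predictor for a class concentrated at mean mu:
            eps_hat = (x_t - math.sqrt(1 - t) * mu) / math.sqrt(t)
            total += (eps_hat - eps) ** 2
        losses.append(total / n_trials)
    return losses.index(min(losses))               # argmin denoising loss

# A sample at 2.0 is matched to the class centered at 2.0, not -2.0.
print(diffusion_zero_shot_classify(2.0, [2.0, -2.0]))  # 0
```

A real implementation would replace the analytic predictor with a conditional noise-prediction network evaluated once per candidate class label, averaging the loss over sampled timesteps.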
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.