Exposing Diversity Bias in Deep Generative Models: Statistical Origins and Correction of Diversity Error
- URL: http://arxiv.org/abs/2602.14682v1
- Date: Mon, 16 Feb 2026 12:15:34 GMT
- Title: Exposing Diversity Bias in Deep Generative Models: Statistical Origins and Correction of Diversity Error
- Authors: Farzan Farnia, Mohammad Jalali, Azim Ospanov
- Abstract summary: We compare the diversity of samples generated by state-of-the-art models with that of test samples drawn from the target data distribution. We find that test data consistently attains substantially higher Vendi and RKE diversity scores than the generated samples.
- Score: 19.594704501292778
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep generative models have achieved great success in producing high-quality samples, making them a central tool across machine learning applications. Beyond sample quality, an important yet less systematically studied question is whether trained generative models faithfully capture the diversity of the underlying data distribution. In this work, we address this question by directly comparing the diversity of samples generated by state-of-the-art models with that of test samples drawn from the target data distribution, using recently proposed reference-free entropy-based diversity scores, Vendi and RKE. Across multiple benchmark datasets, we find that test data consistently attains substantially higher Vendi and RKE diversity scores than the generated samples, suggesting a systematic downward diversity bias in modern generative models. To understand the origin of this bias, we analyze the finite-sample behavior of entropy-based diversity scores and show that their expected values increase with sample size, implying that diversity estimated from finite training sets could inherently underestimate the diversity of the true distribution. As a result, optimizing the generators to minimize divergence to empirical data distributions would induce a loss of diversity. Finally, we discuss potential diversity-aware regularization and guidance strategies based on Vendi and RKE as principled directions for mitigating this bias, and provide empirical evidence suggesting their potential to improve the results.
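As a rough illustration of the scores the abstract relies on (not the authors' implementation), both Vendi and RKE can be sketched from the eigenvalue spectrum of a normalized kernel matrix: Vendi exponentiates the Shannon entropy of the eigenvalues of K/n, while order-2 RKE reduces to n^2 / ||K||_F^2. The Gaussian kernel, its bandwidth, and the toy data below are illustrative assumptions; the final loop mimics the finite-sample effect the paper describes, with scores growing as the sample size increases.

```python
import numpy as np

def rbf_kernel(X, sigma=2.0):
    # Pairwise squared distances -> Gaussian kernel with K_ii ~= 1.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma**2))

def vendi_score(X, sigma=2.0):
    # Vendi = exp(Shannon entropy of the eigenvalues of K/n);
    # ranges from 1 (all samples identical) to n (orthogonal samples).
    n = len(X)
    lam = np.linalg.eigvalsh(rbf_kernel(X, sigma) / n)
    lam = lam[lam > 1e-12]  # drop numerical zeros
    return float(np.exp(-np.sum(lam * np.log(lam))))

def rke_score(X, sigma=2.0):
    # Order-2 Renyi kernel entropy: 1 / sum(lam_i^2) = n^2 / ||K||_F^2,
    # computable without an eigendecomposition.
    n = len(X)
    K = rbf_kernel(X, sigma)
    return float(n**2 / np.sum(K**2))

# Finite-sample behavior: estimates from small subsets of the same
# distribution come out lower than estimates from larger ones.
rng = np.random.default_rng(0)
data = rng.normal(size=(2000, 8))
for n in (50, 200, 2000):
    subset = data[:n]
    print(n, vendi_score(subset), rke_score(subset))
```

Both estimators increase with the number of samples drawn from the same distribution, which is the statistical effect the paper identifies as the source of the downward diversity bias: matching a finite empirical distribution means matching an underestimate of the true diversity.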
Related papers
- Importance-Weighted Non-IID Sampling for Flow Matching Models [5.995277983968318]
We propose an importance-weighted non-IID sampling framework that jointly draws multiple samples to cover diverse, salient regions of a flow's distribution. To balance diversity and quality, we introduce a score-based regularization for the diversity mechanism. Our method produces diverse, high-quality samples and accurate estimates of both importance weights and expectations.
arXiv Detail & Related papers (2025-11-21T22:05:56Z) - Learning Robust Diffusion Models from Imprecise Supervision [75.53546939251146]
DMIS is a unified framework for training robust Conditional Diffusion Models from Imprecise Supervision. Our framework is derived from likelihood and decomposes the objective into generative and classification components. Experiments on diverse forms of imprecise supervision, covering image generation, weakly supervised learning, and dataset condensation, demonstrate that DMIS consistently produces high-quality and class-discriminative samples.
arXiv Detail & Related papers (2025-10-03T14:00:32Z) - Diverse Rare Sample Generation with Pretrained GANs [24.227852798611025]
This study proposes a novel approach for generating diverse rare samples from high-resolution image datasets with pretrained GANs. Our method employs gradient-based optimization of latent vectors within a multi-objective framework and utilizes normalizing flows for density estimation on the feature space. This enables the generation of diverse rare images, with controllable parameters for rarity, diversity, and similarity to a reference image.
arXiv Detail & Related papers (2024-12-27T09:10:30Z) - Controlling the Fidelity and Diversity of Deep Generative Models via Pseudo Density [70.14884528360199]
We introduce an approach to bias deep generative models, such as GANs and diffusion models, towards generating data with enhanced fidelity or increased diversity.
Our approach involves manipulating the distribution of training and generated data through a novel metric for individual samples, named pseudo density.
arXiv Detail & Related papers (2024-07-11T16:46:04Z) - Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z) - Class-Balancing Diffusion Models [57.38599989220613]
Class-Balancing Diffusion Models (CBDM) are trained with a distribution adjustment regularizer as a solution.
Our method is benchmarked on the CIFAR100/CIFAR100LT datasets and shows outstanding performance on the downstream recognition task.
arXiv Detail & Related papers (2023-04-30T20:00:14Z) - Self-Diagnosing GAN: Diagnosing Underrepresented Samples in Generative Adversarial Networks [5.754152248672317]
We propose a method to diagnose and emphasize underrepresented samples during training of a Generative Adversarial Network (GAN).
Based on the observation that the underrepresented samples have a high average discrepancy or high variability in discrepancy, we propose a method to emphasize those samples.
Our experimental results demonstrate that the proposed method improves GAN performance on various datasets.
arXiv Detail & Related papers (2021-02-24T02:31:50Z) - GANs with Variational Entropy Regularizers: Applications in Mitigating the Mode-Collapse Issue [95.23775347605923]
Building on the success of deep learning, Generative Adversarial Networks (GANs) provide a modern approach to learn a probability distribution from observed samples.
GANs often suffer from the mode collapse issue where the generator fails to capture all existing modes of the input distribution.
We take an information-theoretic approach and maximize a variational lower bound on the entropy of the generated samples to increase their diversity.
arXiv Detail & Related papers (2020-09-24T19:34:37Z) - Reinforced Data Sampling for Model Diversification [15.547681142342846]
This paper proposes a new Reinforced Data Sampling (RDS) method to learn how to sample data adequately.
We formulate the optimisation problem of model diversification $\delta$-div in data sampling to maximise learning potentials and optimum allocation by injecting model diversity.
Our results suggest that the trainable sampling for model diversification is useful for competition organisers, researchers, or even newcomers to pursue the full potential of various machine learning tasks.
arXiv Detail & Related papers (2020-06-12T11:46:13Z) - When Relation Networks meet GANs: Relation GANs with Triplet Loss [110.7572918636599]
Training stability is still a lingering concern of generative adversarial networks (GANs).
In this paper, we explore a relation network architecture for the discriminator and design a triplet loss that achieves better generalization and stability.
Experiments on benchmark datasets show that the proposed relation discriminator and new loss provide significant improvements on various vision tasks.
arXiv Detail & Related papers (2020-02-24T11:35:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.