Beyond Overcorrection: Evaluating Diversity in T2I Models with DivBench
- URL: http://arxiv.org/abs/2507.03015v2
- Date: Thu, 10 Jul 2025 09:41:29 GMT
- Title: Beyond Overcorrection: Evaluating Diversity in T2I Models with DivBench
- Authors: Felix Friedrich, Thiemo Ganesha Welsch, Manuel Brack, Patrick Schramowski, Kristian Kersting
- Abstract summary: Current diversification strategies for text-to-image (T2I) models often ignore contextual appropriateness, leading to over-diversification. This paper introduces DIVBENCH, a benchmark and evaluation framework for measuring both under- and over-diversification in T2I generation.
- Score: 26.148022772521493
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current diversification strategies for text-to-image (T2I) models often ignore contextual appropriateness, leading to over-diversification where demographic attributes are modified even when explicitly specified in prompts. This paper introduces DIVBENCH, a benchmark and evaluation framework for measuring both under- and over-diversification in T2I generation. Through systematic evaluation of state-of-the-art T2I models, we find that while most models exhibit limited diversity, many diversification approaches overcorrect by inappropriately altering contextually-specified attributes. We demonstrate that context-aware methods, particularly LLM-guided FairDiffusion and prompt rewriting, can already effectively address under-diversity while avoiding over-diversification, achieving a better balance between representation and semantic fidelity.
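The under-/over-diversification distinction from the abstract can be illustrated with a toy scoring function. This is a hypothetical sketch: `diversification_scores`, its input format, and both proxy metrics are illustrative assumptions, not the DIVBENCH implementation.

```python
from collections import Counter

def diversification_scores(cases):
    """Toy under-/over-diversification proxies.

    `cases` is a list of (specified_attr, generated_attrs) pairs:
    specified_attr is the demographic attribute explicitly named in the
    prompt (None if the prompt leaves it open), and generated_attrs holds
    the attribute observed in each generated image for that prompt.
    """
    over, over_total = 0, 0
    unspecified_outputs = []
    for specified, generated in cases:
        if specified is not None:
            # Over-diversification: the prompt pinned the attribute,
            # but some generations altered it anyway.
            over_total += len(generated)
            over += sum(1 for g in generated if g != specified)
        else:
            unspecified_outputs.extend(generated)
    # Under-diversification proxy: share of the single most common
    # attribute when the prompt left it unspecified
    # (1.0 = every image shows the same attribute).
    counts = Counter(unspecified_outputs)
    under = (max(counts.values()) / len(unspecified_outputs)
             if unspecified_outputs else 0.0)
    over_rate = over / over_total if over_total else 0.0
    return over_rate, under

cases = [
    ("female", ["female", "male", "female", "male"]),  # attribute specified
    (None, ["male", "male", "male", "female"]),        # attribute open
]
print(diversification_scores(cases))  # → (0.5, 0.75)
```

A context-aware method would drive the first number toward 0 (respect specified attributes) while keeping the second well below 1 (diversify when unspecified).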
Related papers
- Whose View of Safety? A Deep DIVE Dataset for Pluralistic Alignment of Text-to-Image Models [29.501859416167385]
Current text-to-image (T2I) models often fail to account for diverse human experiences, leading to misaligned systems. We advocate for pluralistic alignment, where an AI understands and is steerable towards diverse, and often conflicting, human values.
arXiv Detail & Related papers (2025-07-15T21:02:35Z)
- DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models [11.080727606381524]
We introduce the Does-it/Can-it framework, DIM-CIM, a reference-free measurement of default-mode diversity. We find that widely-used models improve in generalization at the cost of default-mode diversity when scaling from 1.5B to 8.1B parameters. We also use DIMCIM to evaluate the training data of a T2I model and observe a correlation of 0.85 between diversity in training images and default-mode diversity.
arXiv Detail & Related papers (2025-06-05T14:53:34Z)
- Evaluating the Diversity and Quality of LLM Generated Content [72.84945252821908]
We introduce a framework for measuring effective semantic diversity: diversity among outputs that meet quality thresholds. Although preference-tuned models exhibit reduced lexical and syntactic diversity, they produce greater effective semantic diversity than SFT or base models. These findings have important implications for applications that require diverse yet high-quality outputs.
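The "effective semantic diversity" idea above — credit diversity only among outputs that clear a quality bar — can be sketched as follows. The function names, the quality/similarity callables, and the Jaccard stand-in for semantic similarity are all illustrative assumptions, not the paper's method.

```python
def effective_semantic_diversity(outputs, quality, similarity, q_min=0.5):
    """Average pairwise dissimilarity among outputs whose quality score
    reaches q_min; low-quality outputs earn no diversity credit."""
    kept = [o for o in outputs if quality(o) >= q_min]
    if len(kept) < 2:
        return 0.0
    pairs = [(a, b) for i, a in enumerate(kept) for b in kept[i + 1:]]
    return sum(1.0 - similarity(a, b) for a, b in pairs) / len(pairs)

def jaccard(a, b):
    """Word-overlap similarity used here as a crude semantic stand-in."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

outs = ["the cat sat", "the cat sat", "a dog ran fast"]
score = effective_semantic_diversity(outs, quality=lambda s: 1.0,
                                     similarity=jaccard)
print(round(score, 3))  # exact duplicates contribute zero → 0.667
```

The key design point is the filter step: a model that emits many diverse but low-quality outputs scores no better than one that emits a single good output.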
arXiv Detail & Related papers (2025-04-16T23:02:23Z)
- Prompt as Free Lunch: Enhancing Diversity in Source-Free Cross-domain Few-shot Learning through Semantic-Guided Prompting [9.116108409344177]
The source-free cross-domain few-shot learning task aims to transfer pretrained models to target domains utilizing minimal samples. We propose the SeGD-VPT framework, which is divided into two phases. The first step aims to increase feature diversity by adding diversity prompts to each support sample, thereby generating varying input and enhancing sample diversity.
arXiv Detail & Related papers (2024-12-01T11:00:38Z)
- GRADE: Quantifying Sample Diversity in Text-to-Image Models [66.12068246962762]
GRADE is an automatic method for quantifying sample diversity in text-to-image models. We use GRADE to measure the diversity of 12 models over a total of 720K images.
arXiv Detail & Related papers (2024-10-29T23:10:28Z)
- Beyond Coarse-Grained Matching in Video-Text Retrieval [50.799697216533914]
We introduce a new approach for fine-grained evaluation.
Our approach can be applied to existing datasets by automatically generating hard negative test captions.
Experiments on our fine-grained evaluations demonstrate that this approach enhances a model's ability to understand fine-grained differences.
arXiv Detail & Related papers (2024-10-16T09:42:29Z)
- Minority-Focused Text-to-Image Generation via Prompt Optimization [57.319845580050924]
We investigate the generation of minority samples using pretrained text-to-image (T2I) latent diffusion models. We develop an online prompt optimization framework that encourages the emergence of desired properties during inference. We then tailor this generic prompt distribution into a specialized solver that promotes the generation of minority features.
arXiv Detail & Related papers (2024-10-10T11:56:09Z)
- DATTA: Towards Diversity Adaptive Test-Time Adaptation in Dynamic Wild World [6.816521410643928]
This paper proposes a new general method, named Diversity Adaptive Test-Time Adaptation (DATTA), aimed at improving Quality of Experience (QoE).
It features three key components: Diversity Discrimination (DD) to assess batch diversity, Diversity Adaptive Batch Normalization (DABN) to tailor normalization methods based on DD insights, and Diversity Adaptive Fine-Tuning (DAFT) to selectively fine-tune the model.
Experimental results show that our method achieves up to a 21% increase in accuracy compared to state-of-the-art methodologies.
arXiv Detail & Related papers (2024-08-15T09:50:11Z)
- Discriminative Probing and Tuning for Text-to-Image Generation [129.39674951747412]
Text-to-image generation (T2I) often faces text-image misalignment problems such as relation confusion in generated images.
We propose bolstering the discriminative abilities of T2I models to achieve more precise text-to-image alignment for generation.
We present a discriminative adapter built on T2I models to probe their discriminative abilities on two representative tasks and leverage discriminative fine-tuning to improve their text-image alignment.
arXiv Detail & Related papers (2024-03-07T08:37:33Z)
- Exploiting modality-invariant feature for robust multimodal emotion recognition with missing modalities [76.08541852988536]
We propose to use invariant features for a missing modality imagination network (IF-MMIN).
We show that the proposed model outperforms all baselines and invariantly improves the overall emotion recognition performance under uncertain missing-modality conditions.
arXiv Detail & Related papers (2022-10-27T12:16:25Z)
- IFDID: Information Filter upon Diversity-Improved Decoding for Diversity-Faithfulness Tradeoff in NLG [5.771099867942164]
This paper presents Information Filter upon Diversity-Improved Decoding (IFDID) to obtain the tradeoff between diversity and faithfulness.
Our approach achieves a ROUGE score 1.24 points higher (indicating faithfulness) and a 62.5% higher Dist-2 score (indicating diversity) than traditional approaches.
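Dist-2, the diversity metric cited above, is the standard distinct-n measure: the ratio of unique to total n-grams across a set of generations. A minimal sketch (the whitespace tokenization and the function name are assumptions, not the paper's exact procedure):

```python
def dist_n(texts, n=2):
    """Distinct-n: unique n-grams / total n-grams across all texts.
    Higher values indicate more lexically diverse generations."""
    grams = []
    for t in texts:
        toks = t.split()
        grams.extend(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return len(set(grams)) / len(grams) if grams else 0.0

# 4 bigrams total, 3 unique ("the cat" repeats) → 0.75
print(dist_n(["the cat sat", "the cat ran"]))  # → 0.75
```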
arXiv Detail & Related papers (2022-10-25T08:14:20Z)
- Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.