Beyond Overcorrection: Evaluating Diversity in T2I Models with DivBench
- URL: http://arxiv.org/abs/2507.03015v2
- Date: Thu, 10 Jul 2025 09:41:29 GMT
- Title: Beyond Overcorrection: Evaluating Diversity in T2I Models with DivBench
- Authors: Felix Friedrich, Thiemo Ganesha Welsch, Manuel Brack, Patrick Schramowski, Kristian Kersting
- Abstract summary: Current diversification strategies for text-to-image (T2I) models often ignore contextual appropriateness, leading to over-diversification. This paper introduces DIVBENCH, a benchmark and evaluation framework for measuring both under- and over-diversification in T2I generation.
- Score: 26.148022772521493
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current diversification strategies for text-to-image (T2I) models often ignore contextual appropriateness, leading to over-diversification where demographic attributes are modified even when explicitly specified in prompts. This paper introduces DIVBENCH, a benchmark and evaluation framework for measuring both under- and over-diversification in T2I generation. Through systematic evaluation of state-of-the-art T2I models, we find that while most models exhibit limited diversity, many diversification approaches overcorrect by inappropriately altering contextually-specified attributes. We demonstrate that context-aware methods, particularly LLM-guided FairDiffusion and prompt rewriting, can already effectively address under-diversity while avoiding over-diversification, achieving a better balance between representation and semantic fidelity.
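The under-/over-diversification distinction from the abstract can be illustrated with a toy scoring function. This is a hypothetical sketch: `diversification_scores`, its input format, and both proxy metrics are illustrative assumptions, not the DIVBENCH implementation.

```python
from collections import Counter

def diversification_scores(cases):
    """Toy under-/over-diversification proxies.

    `cases` is a list of (specified_attr, generated_attrs) pairs:
    specified_attr is the demographic attribute explicitly named in the
    prompt (None if the prompt leaves it open), and generated_attrs holds
    the attribute observed in each generated image for that prompt.
    """
    over, over_total = 0, 0
    unspecified_outputs = []
    for specified, generated in cases:
        if specified is not None:
            # Over-diversification: the prompt pinned the attribute,
            # but some generations altered it anyway.
            over_total += len(generated)
            over += sum(1 for g in generated if g != specified)
        else:
            unspecified_outputs.extend(generated)
    # Under-diversification proxy: share of the single most common
    # attribute when the prompt left it unspecified
    # (1.0 = every image shows the same attribute).
    counts = Counter(unspecified_outputs)
    under = (max(counts.values()) / len(unspecified_outputs)
             if unspecified_outputs else 0.0)
    over_rate = over / over_total if over_total else 0.0
    return over_rate, under

cases = [
    ("female", ["female", "male", "female", "male"]),  # attribute specified
    (None, ["male", "male", "male", "female"]),        # attribute open
]
print(diversification_scores(cases))  # → (0.5, 0.75)
```

A context-aware method would drive the first number toward 0 (respect specified attributes) while keeping the second well below 1 (diversify when unspecified).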
Related papers
- Whose View of Safety? A Deep DIVE Dataset for Pluralistic Alignment of Text-to-Image Models [29.501859416167385]
Current text-to-image (T2I) models often fail to account for diverse human experiences, leading to misaligned systems. We advocate for pluralistic alignment, where an AI understands and is steerable towards diverse, and often conflicting, human values.
arXiv Detail & Related papers (2025-07-15T21:02:35Z)
- DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models [11.080727606381524]
We introduce the Does-it/Can-it framework, DIM-CIM, a reference-free measurement of default-mode diversity. We find that widely-used models improve in generalization at the cost of default-mode diversity when scaling from 1.5B to 8.1B parameters. We also use DIMCIM to evaluate the training data of a T2I model and observe a correlation of 0.85 between diversity in training images and default-mode diversity.
arXiv Detail & Related papers (2025-06-05T14:53:34Z)
- Evaluating the Diversity and Quality of LLM Generated Content [72.84945252821908]
We introduce a framework for measuring effective semantic diversity: diversity among outputs that meet quality thresholds. Although preference-tuned models exhibit reduced lexical and syntactic diversity, they produce greater effective semantic diversity than SFT or base models. These findings have important implications for applications that require diverse yet high-quality outputs.
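The "effective semantic diversity" idea above — credit diversity only among outputs that clear a quality bar — can be sketched as follows. The function names, the quality/similarity callables, and the Jaccard stand-in for semantic similarity are all illustrative assumptions, not the paper's method.

```python
def effective_semantic_diversity(outputs, quality, similarity, q_min=0.5):
    """Average pairwise dissimilarity among outputs whose quality score
    reaches q_min; low-quality outputs earn no diversity credit."""
    kept = [o for o in outputs if quality(o) >= q_min]
    if len(kept) < 2:
        return 0.0
    pairs = [(a, b) for i, a in enumerate(kept) for b in kept[i + 1:]]
    return sum(1.0 - similarity(a, b) for a, b in pairs) / len(pairs)

def jaccard(a, b):
    """Word-overlap similarity used here as a crude semantic stand-in."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

outs = ["the cat sat", "the cat sat", "a dog ran fast"]
score = effective_semantic_diversity(outs, quality=lambda s: 1.0,
                                     similarity=jaccard)
print(round(score, 3))  # exact duplicates contribute zero → 0.667
```

The key design point is the filter step: a model that emits many diverse but low-quality outputs scores no better than one that emits a single good output.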
arXiv Detail & Related papers (2025-04-16T23:02:23Z)
- Prompt as Free Lunch: Enhancing Diversity in Source-Free Cross-domain Few-shot Learning through Semantic-Guided Prompting [9.116108409344177]
The source-free cross-domain few-shot learning task aims to transfer pretrained models to target domains utilizing minimal samples. We propose the SeGD-VPT framework, which is divided into two phases. The first step aims to increase feature diversity by adding diversity prompts to each support sample, thereby generating varying input and enhancing sample diversity.
arXiv Detail & Related papers (2024-12-01T11:00:38Z)
- GRADE: Quantifying Sample Diversity in Text-to-Image Models [66.12068246962762]
GRADE is an automatic method for quantifying sample diversity in text-to-image models. We use GRADE to measure the diversity of 12 models over a total of 720K images.
arXiv Detail & Related papers (2024-10-29T23:10:28Z)
- Beyond Coarse-Grained Matching in Video-Text Retrieval [50.799697216533914]
We introduce a new approach for fine-grained evaluation.
Our approach can be applied to existing datasets by automatically generating hard negative test captions.
Experiments on our fine-grained evaluations demonstrate that this approach enhances a model's ability to understand fine-grained differences.
arXiv Detail & Related papers (2024-10-16T09:42:29Z)
- Minority-Focused Text-to-Image Generation via Prompt Optimization [57.319845580050924]
We investigate the generation of minority samples using pretrained text-to-image (T2I) latent diffusion models. We develop an online prompt optimization framework that encourages the emergence of desired properties during inference. We then tailor this generic prompt distribution into a specialized solver that promotes the generation of minority features.
arXiv Detail & Related papers (2024-10-10T11:56:09Z)
- DATTA: Towards Diversity Adaptive Test-Time Adaptation in Dynamic Wild World [6.816521410643928]
This paper proposes a new general method, named Diversity Adaptive Test-Time Adaptation (DATTA), aimed at improving Quality of Experience (QoE).
It features three key components: Diversity Discrimination (DD) to assess batch diversity, Diversity Adaptive Batch Normalization (DABN) to tailor normalization methods based on DD insights, and Diversity Adaptive Fine-Tuning (DAFT) to selectively fine-tune the model.
Experimental results show that our method achieves up to a 21% increase in accuracy compared to state-of-the-art methodologies.
arXiv Detail & Related papers (2024-08-15T09:50:11Z)
- Discriminative Probing and Tuning for Text-to-Image Generation [129.39674951747412]
Text-to-image generation (T2I) often faces text-image misalignment problems such as relation confusion in generated images.
We propose bolstering the discriminative abilities of T2I models to achieve more precise text-to-image alignment for generation.
We present a discriminative adapter built on T2I models to probe their discriminative abilities on two representative tasks and leverage discriminative fine-tuning to improve their text-image alignment.
arXiv Detail & Related papers (2024-03-07T08:37:33Z)
- Exploiting modality-invariant feature for robust multimodal emotion recognition with missing modalities [76.08541852988536]
We propose to use invariant features for a missing modality imagination network (IF-MMIN).
We show that the proposed model outperforms all baselines and invariantly improves the overall emotion recognition performance under uncertain missing-modality conditions.
arXiv Detail & Related papers (2022-10-27T12:16:25Z)
- IFDID: Information Filter upon Diversity-Improved Decoding for Diversity-Faithfulness Tradeoff in NLG [5.771099867942164]
This paper presents Information Filter upon Diversity-Improved Decoding (IFDID) to obtain the tradeoff between diversity and faithfulness.
Our approach achieves a ROUGE score 1.24 points higher (indicating faithfulness) and a 62.5% higher Dist-2 score (indicating diversity) than traditional approaches.
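Dist-2, the diversity metric cited above, is the standard distinct-n measure: the ratio of unique to total n-grams across a set of generations. A minimal sketch (the whitespace tokenization and the function name are assumptions, not the paper's exact procedure):

```python
def dist_n(texts, n=2):
    """Distinct-n: unique n-grams / total n-grams across all texts.
    Higher values indicate more lexically diverse generations."""
    grams = []
    for t in texts:
        toks = t.split()
        grams.extend(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return len(set(grams)) / len(grams) if grams else 0.0

# 4 bigrams total, 3 unique ("the cat" repeats) → 0.75
print(dist_n(["the cat sat", "the cat ran"]))  # → 0.75
```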
arXiv Detail & Related papers (2022-10-25T08:14:20Z)
- Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.