A Framework for Benchmarking Fairness-Utility Trade-offs in Text-to-Image Models via Pareto Frontiers
- URL: http://arxiv.org/abs/2508.16752v1
- Date: Fri, 22 Aug 2025 19:09:22 GMT
- Title: A Framework for Benchmarking Fairness-Utility Trade-offs in Text-to-Image Models via Pareto Frontiers
- Authors: Marco N. Bochernitsan, Rodrigo C. Barros, Lucas S. Kupssinskü
- Abstract summary: We propose a method for evaluating fairness and utility in text-to-image models. Our method outlines all configurations that optimize fairness for a given utility and vice-versa.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Achieving fairness in text-to-image generation demands mitigating social biases without compromising visual fidelity, a challenge critical to responsible AI. Current fairness evaluation procedures for text-to-image models rely on qualitative judgment or narrow comparisons, which limit the capacity to assess both fairness and utility in these models and prevent reproducible assessment of debiasing methods. Existing approaches typically employ ad-hoc, human-centered visual inspections that are both error-prone and difficult to replicate. We propose a method for evaluating fairness and utility in text-to-image models using Pareto-optimal frontiers across hyperparameterizations of debiasing methods. Our method allows for comparison between distinct text-to-image models, outlining all configurations that optimize fairness for a given utility and vice-versa. To illustrate our evaluation method, we use Normalized Shannon Entropy and ClipScore for fairness and utility evaluation, respectively. We assess fairness and utility in Stable Diffusion, Fair Diffusion, SDXL, DeCoDi, and FLUX text-to-image models. Our method shows that most default hyperparameterizations of the text-to-image models are dominated solutions in the fairness-utility space, and it is straightforward to find better hyperparameters.
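The abstract's core idea can be illustrated with a minimal sketch: score each debiasing configuration on fairness (Normalized Shannon Entropy over generated demographic groups) and utility (e.g. a CLIPScore), then keep only the non-dominated points. This is an illustrative reimplementation, not the authors' code; the example configurations and scores are hypothetical.

```python
import math

def normalized_shannon_entropy(counts):
    """Fairness score in [0, 1]: 1 means a uniform distribution over
    demographic groups in the generated images, 0 means collapse to one group."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if len(probs) <= 1:
        return 0.0
    entropy = -sum(p * math.log2(p) for p in probs)
    return entropy / math.log2(len(counts))

def pareto_frontier(points):
    """Return the non-dominated (fairness, utility) configurations.
    A point is dominated if some other point is at least as good in both
    objectives and strictly better in at least one."""
    frontier = []
    for i, (f1, u1) in enumerate(points):
        dominated = any(
            f2 >= f1 and u2 >= u1 and (f2 > f1 or u2 > u1)
            for j, (f2, u2) in enumerate(points)
            if j != i
        )
        if not dominated:
            frontier.append((f1, u1))
    return frontier

# Hypothetical hyperparameter configurations of a debiasing method,
# each scored as (fairness = normalized entropy, utility = CLIPScore-like).
configs = [(0.95, 0.60), (0.80, 0.75), (0.60, 0.80), (0.55, 0.70)]
print(pareto_frontier(configs))  # (0.55, 0.70) is dominated by (0.60, 0.80)
```

A "default hyperparameterization is a dominated solution" in the abstract's sense means its (fairness, utility) pair would be filtered out by `pareto_frontier`, since some other configuration improves one objective without sacrificing the other.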
Related papers
- Scalable Evaluation of the Realism of Synthetic Environmental Augmentations in Images [0.0]
We present a framework for assessing the realism of synthetic image-editing methods. Using 40 clear-day images, we compare rule-based augmentation libraries with generative AI image-editing models. Generative AI methods substantially outperform rule-based approaches, with the best generative method achieving approximately 3.6 times the acceptance rate of the best rule-based method.
arXiv Detail & Related papers (2026-03-04T17:46:08Z) - On the use of graph models to achieve individual and group fairness [0.6299766708197883]
We provide a theoretical framework based on Sheaf Diffusion that leverages tools from dynamical systems and homology to model fairness. We present a collection of network topologies handling different fairness metrics, leading to a unified method capable of dealing with both individual and group bias. The paper showcases the performance of the proposed models in terms of accuracy and fairness.
arXiv Detail & Related papers (2026-01-13T18:17:43Z) - FairImagen: Post-Processing for Bias Mitigation in Text-to-Image Models [10.857020427374506]
We introduce FairImagen, a post-hoc debiasing framework that operates on prompt embeddings to mitigate societal biases. Our framework outperforms existing post-hoc methods and offers a simple, scalable, and model-agnostic solution for equitable text-to-image generation.
arXiv Detail & Related papers (2025-10-24T11:47:15Z) - A Meaningful Perturbation Metric for Evaluating Explainability Methods [55.09730499143998]
We introduce a novel approach, which harnesses image generation models to perform targeted perturbation. Specifically, we focus on inpainting only the high-relevance pixels of an input image to modify the model's predictions while preserving image fidelity. This is in contrast to existing approaches, which often produce out-of-distribution modifications, leading to unreliable results.
arXiv Detail & Related papers (2025-04-09T11:46:41Z) - CTSR: Controllable Fidelity-Realness Trade-off Distillation for Real-World Image Super Resolution [52.93785843453579]
Real-world image super-resolution is a critical image processing task, where two key evaluation criteria are the fidelity to the original image and the visual realness of the generated results. We propose a distillation-based approach that leverages the geometric decomposition of both fidelity and realness, alongside the performance advantages of multiple teacher models. Experiments conducted on several real-world image super-resolution benchmarks demonstrate that our method surpasses existing state-of-the-art approaches.
arXiv Detail & Related papers (2025-03-18T14:06:39Z) - On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [68.62012304574012]
Multimodal generative models have sparked critical discussions on their reliability, fairness and potential for misuse. We propose an evaluation framework to assess model reliability by analyzing responses to global and local perturbations in the embedding space. Our method lays the groundwork for detecting unreliable, bias-injected models and tracing the provenance of embedded biases.
arXiv Detail & Related papers (2024-11-21T09:46:55Z) - Bias Begets Bias: The Impact of Biased Embeddings on Diffusion Models [0.0]
Text-to-Image (TTI) systems have come under increased scrutiny for social biases.
We investigate embedding spaces as a source of bias for TTI models.
We find that biased multimodal embeddings like CLIP can result in lower alignment scores for representationally balanced TTI models.
arXiv Detail & Related papers (2024-09-15T01:09:55Z) - Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models [85.96013373385057]
Fine-tuning text-to-image models with reward functions trained on human feedback data has proven effective for aligning model behavior with human intent.
However, excessive optimization with such reward models, which serve as mere proxy objectives, can compromise the performance of fine-tuned models.
We propose TextNorm, a method that enhances alignment based on a measure of reward model confidence estimated across a set of semantically contrastive text prompts.
arXiv Detail & Related papers (2024-04-02T11:40:38Z) - Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers [120.49126407479717]
This paper explores text-to-image diffusion models for Zero-Shot Sketch-based Image Retrieval (ZS-SBIR)
We highlight a pivotal discovery: the capacity of text-to-image diffusion models to seamlessly bridge the gap between sketches and photos.
arXiv Detail & Related papers (2024-03-12T00:02:03Z) - Benchmarking the Fairness of Image Upsampling Methods [29.01986714656294]
We develop a set of metrics for performance and fairness of conditional generative models.
We benchmark their imbalances and diversity.
As part of the study, a subset of the datasets replicates the racial distribution of common large-scale face datasets.
arXiv Detail & Related papers (2024-01-24T16:13:26Z) - FairGridSearch: A Framework to Compare Fairness-Enhancing Models [0.0]
This paper focuses on binary classification and proposes FairGridSearch, a novel framework for comparing fairness-enhancing models.
The study applies FairGridSearch to three popular datasets (Adult, COMPAS, and German Credit) and analyzes the impacts of metric selection, base estimator choice, and classification threshold on model fairness.
arXiv Detail & Related papers (2024-01-04T10:29:02Z) - Fair Text-to-Image Diffusion via Fair Mapping [32.02815667307623]
We propose a flexible, model-agnostic, and lightweight approach that modifies a pre-trained text-to-image diffusion model.
By effectively addressing the issue of implicit language bias, our method produces more fair and diverse image outputs.
arXiv Detail & Related papers (2023-11-29T15:02:01Z) - Masked Images Are Counterfactual Samples for Robust Fine-tuning [77.82348472169335]
Fine-tuning deep learning models can lead to a trade-off between in-distribution (ID) performance and out-of-distribution (OOD) robustness.
We propose a novel fine-tuning method, which uses masked images as counterfactual samples that help improve the robustness of the fine-tuning model.
arXiv Detail & Related papers (2023-03-06T11:51:28Z) - Paired Image-to-Image Translation Quality Assessment Using Multi-Method Fusion [0.0]
This paper proposes a novel approach that combines signals of image quality between paired source and transformation to predict the latter's similarity with a hypothetical ground truth.
We trained a Multi-Method Fusion (MMF) model via an ensemble of gradient-boosted regressors to predict Deep Image Structure and Texture Similarity (DISTS).
Analysis revealed the task to be feature-constrained, introducing a trade-off at inference between metric time and prediction accuracy.
arXiv Detail & Related papers (2022-05-09T11:05:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.