Investigating Disability Representations in Text-to-Image Models
- URL: http://arxiv.org/abs/2602.04687v2
- Date: Thu, 05 Feb 2026 19:37:18 GMT
- Title: Investigating Disability Representations in Text-to-Image Models
- Authors: Yang Yian, Yu Fan, Liudmila Zavolokina, Sarah Ebling
- Abstract summary: This study investigates how people with disabilities are represented in AI-generated images. We analyze disability representations by comparing image similarities between generic disability prompts and prompts referring to specific disability categories.
- Score: 7.244686394468418
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-image generative models have made remarkable progress in producing high-quality visual content from textual descriptions, yet concerns remain about how they represent social groups. While characteristics like gender and race have received increasing attention, disability representations remain underexplored. This study investigates how people with disabilities are represented in AI-generated images by analyzing outputs from Stable Diffusion XL and DALL-E 3 using a structured prompt design. We analyze disability representations by comparing image similarities between generic disability prompts and prompts referring to specific disability categories. Moreover, we evaluate how mitigation strategies influence disability portrayals, with a focus on assessing affective framing through sentiment polarity analysis, combining both automatic and human evaluation. Our findings reveal persistent representational imbalances and highlight the need for continuous evaluation and refinement of generative models to foster more diverse and inclusive portrayals of disability.
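The core comparison described in the abstract can be sketched as a cosine similarity between the mean embeddings of two generated image sets. The snippet below is an illustrative sketch, not the authors' implementation: it assumes images have already been encoded into CLIP-style embedding vectors (here replaced by random stand-ins), and the function names are hypothetical.

```python
import numpy as np

def mean_embedding(vectors):
    """Average a set of image embeddings (one row per image) into a unit vector."""
    m = np.asarray(vectors, dtype=float).mean(axis=0)
    return m / np.linalg.norm(m)

def prompt_similarity(generic_embs, specific_embs):
    """Cosine similarity between the mean embeddings of two image sets:
    one generated from a generic disability prompt, one from a
    category-specific prompt. Returns a value in [-1, 1]."""
    return float(mean_embedding(generic_embs) @ mean_embedding(specific_embs))

# Toy example with random stand-ins for real image embeddings.
rng = np.random.default_rng(0)
generic = rng.normal(size=(8, 512))   # e.g. 8 images from "a person with a disability"
specific = rng.normal(size=(8, 512))  # e.g. 8 images from "a person who is blind"
score = prompt_similarity(generic, specific)
```

A high score would suggest the model collapses a specific disability category onto the same visual stereotype it produces for the generic prompt; in practice the embeddings would come from an image encoder such as CLIP rather than a random generator.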
Related papers
- Can Unified Generation and Understanding Models Maintain Semantic Equivalence Across Different Output Modalities? [61.533560295383786]
Unified Multimodal Large Language Models (U-MLLMs) integrate understanding and generation within a single architecture. We observe that U-MLLMs fail to maintain semantic equivalence when required to render the same results in the image modality. We introduce VGUBench, a framework to decouple reasoning logic from generation fidelity.
arXiv Detail & Related papers (2026-02-27T06:23:56Z)
- Auditing Disability Representation in Vision-Language Models [0.6987503477818553]
We study disability-aware descriptions for person-centric images. We introduce a benchmark based on paired Neutral Prompts (NP) and Disability-Contextualised Prompts (DP). We evaluate 15 state-of-the-art open- and closed-source vision-language models in a zero-shot setting across 9 disability categories.
arXiv Detail & Related papers (2026-01-24T07:25:43Z)
- Measuring Social Bias in Vision-Language Models with Face-Only Counterfactuals from Real Photos [79.03150233804458]
Real-world images entangle race and gender with correlated factors such as background and clothing, obscuring attribution. We propose a face-only counterfactual evaluation paradigm: we generate counterfactual variants by editing only facial attributes related to race and gender, keeping all other visual factors fixed.
arXiv Detail & Related papers (2026-01-11T14:35:06Z)
- Who's Asking? Investigating Bias Through the Lens of Disability Framed Queries in LLMs [2.722784054643991]
Large Language Models (LLMs) routinely infer users' demographic traits from phrasing alone. The role of disability cues in shaping these inferences remains largely uncharted. We present the first systematic audit of disability-conditioned demographic bias across eight state-of-the-art instruction-tuned LLMs.
arXiv Detail & Related papers (2025-08-18T21:03:09Z)
- When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
arXiv Detail & Related papers (2024-10-14T17:59:58Z)
- Disability Representations: Finding Biases in Automatic Image Generation [0.0]
This study investigates the representation biases in popular image generation models towards people with disabilities (PWD).
The results indicate a significant bias, with most generated images portraying disabled individuals as old, sad, and predominantly using manual wheelchairs.
These findings highlight the urgent need for more inclusive AI development, ensuring diverse and accurate representation of PWD in generated images.
arXiv Detail & Related papers (2024-06-21T09:12:31Z)
- Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting [64.80538055623842]
Sociodemographic prompting is a technique that steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give.
We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
arXiv Detail & Related papers (2023-09-13T15:42:06Z)
- Stable Bias: Analyzing Societal Representations in Diffusion Models [72.27121528451528]
We propose a new method for exploring the social biases in Text-to-Image (TTI) systems.
Our approach relies on characterizing the variation in generated images triggered by enumerating gender and ethnicity markers in the prompts.
We leverage this method to analyze images generated by 3 popular TTI systems and find that while all of their outputs show correlations with US labor demographics, they also consistently under-represent marginalized identities to different extents.
arXiv Detail & Related papers (2023-03-20T19:32:49Z)
- Visual Perturbation-aware Collaborative Learning for Overcoming the Language Prior Problem [60.0878532426877]
We propose a novel collaborative learning scheme from the viewpoint of visual perturbation calibration.
Specifically, we devise a visual controller to construct two sorts of curated images with different perturbation extents.
The experimental results on two diagnostic VQA-CP benchmark datasets evidently demonstrate its effectiveness.
arXiv Detail & Related papers (2022-07-24T23:50:52Z)
- Describing image focused in cognitive and visual details for visually impaired people: An approach to generating inclusive paragraphs [2.362412515574206]
There is a lack of services that support specific tasks, such as understanding the image context presented in online content, e.g., webinars.
We propose an approach for generating the context of webinar images that combines a dense captioning technique with a set of filters, to fit the captions to our domain, and a language model for abstractive summarization.
arXiv Detail & Related papers (2022-02-10T21:20:53Z)
- Proactive Pseudo-Intervention: Causally Informed Contrastive Learning For Interpretable Vision Models [103.64435911083432]
We present a novel contrastive learning strategy called Proactive Pseudo-Intervention (PPI).
PPI leverages proactive interventions to guard against image features with no causal relevance.
We also devise a novel causally informed salience mapping module to identify key image pixels to intervene, and show it greatly facilitates model interpretability.
arXiv Detail & Related papers (2020-12-06T20:30:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.