From Perceived Effectiveness to Measured Impact: Identity-Aware Evaluation of Automated Counter-Stereotypes
- URL: http://arxiv.org/abs/2510.23523v1
- Date: Mon, 27 Oct 2025 17:02:04 GMT
- Title: From Perceived Effectiveness to Measured Impact: Identity-Aware Evaluation of Automated Counter-Stereotypes
- Authors: Svetlana Kiritchenko, Anna Kerkhof, Isar Nejadgholi, Kathleen C. Fraser,
- Abstract summary: We investigate the effect of automatically generated counter-stereotypes on gender bias held by users of various demographics on social media. We evaluate two counter-stereotype strategies -- counter-facts and broadening universals -- which have been identified as the most potentially effective in previous studies. Our findings reveal that actual effectiveness does not align with perceived effectiveness, and the former is a nuanced and sometimes divergent phenomenon across demographic groups.
- Score: 16.83414091095414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the effect of automatically generated counter-stereotypes on gender bias held by users of various demographics on social media. Building on recent NLP advancements and social psychology literature, we evaluate two counter-stereotype strategies -- counter-facts and broadening universals (i.e., stating that anyone can have a trait regardless of group membership) -- which have been identified as the most potentially effective in previous studies. We assess the real-world impact of these strategies on mitigating gender bias across user demographics (gender and age), through the Implicit Association Test and the self-reported measures of explicit bias and perceived utility. Our findings reveal that actual effectiveness does not align with perceived effectiveness, and the former is a nuanced and sometimes divergent phenomenon across demographic groups. While overall bias reduction was limited, certain groups (e.g., older, male participants) exhibited measurable improvements in implicit bias in response to some interventions. Conversely, younger participants, especially women, showed increasing bias in response to the same interventions. These results highlight the complex and identity-sensitive nature of stereotype mitigation and call for dynamic and context-aware evaluation and mitigation strategies.
Related papers
- How Quantization Shapes Bias in Large Language Models [61.40435736418359]
We focus on weight and activation quantization strategies and examine their effects across a broad range of bias types.
We employ both probabilistic and generated text-based metrics across nine benchmarks and evaluate models varying in architecture family and reasoning ability.
(2025-08-25T14:48:26Z)
- EMO-Debias: Benchmarking Gender Debiasing Techniques in Multi-Label Speech Emotion Recognition [49.27067541740956]
EMO-Debias is a large-scale comparison of 13 debiasing methods applied to multi-label SER.
Our study encompasses techniques from pre-processing, regularization, adversarial learning, biased learners, and distributionally robust optimization.
Our analysis quantifies the trade-offs between fairness and accuracy, identifying which approaches consistently reduce gender performance gaps.
(2025-06-05T05:48:31Z)
- Fairness Mediator: Neutralize Stereotype Associations to Mitigate Bias in Large Language Models [66.5536396328527]
LLMs inadvertently absorb spurious correlations from training data, leading to stereotype associations between biased concepts and specific social groups.
We propose Fairness Mediator (FairMed), a bias mitigation framework that neutralizes stereotype associations.
Our framework comprises two main components: a stereotype association prober and an adversarial debiasing neutralizer.
(2025-04-10T14:23:06Z)
- The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models [91.86718720024825]
We center transgender, nonbinary, and other gender-diverse identities to investigate how alignment procedures interact with pre-existing gender-diverse bias.
Our findings reveal that DPO-aligned models are particularly sensitive to supervised finetuning.
We conclude with recommendations tailored to DPO and broader alignment practices.
(2024-11-06T06:50:50Z)
- Challenging Negative Gender Stereotypes: A Study on the Effectiveness of Automated Counter-Stereotypes [12.704072523930444]
This study investigates eleven strategies to automatically counter-act and challenge gender stereotypes in online communications.
We present AI-generated gender-based counter-stereotypes to study participants and ask them to assess their offensiveness, plausibility, and potential effectiveness.
(2024-04-18T01:48:28Z)
- Explaining Knock-on Effects of Bias Mitigation [13.46387356280467]
In machine learning systems, bias mitigation approaches aim to make outcomes fairer across privileged and unprivileged groups.
In this paper, we aim to characterise impacted cohorts when mitigation interventions are applied.
We examine a range of bias mitigation strategies that work at various stages of the model life cycle.
We show that all tested mitigation strategies negatively impact a non-trivial fraction of cases, i.e., people who receive unfavourable outcomes solely on account of mitigation efforts.
(2023-12-01T18:40:37Z)
- Gender Biases in Automatic Evaluation Metrics for Image Captioning [87.15170977240643]
We conduct a systematic study of gender biases in model-based evaluation metrics for image captioning tasks.
We demonstrate the negative consequences of using these biased metrics, including the inability to differentiate between biased and unbiased generations.
We present a simple and effective way to mitigate the metric bias without hurting the correlations with human judgments.
(2023-05-24T04:27:40Z)
- Deep Generative Views to Mitigate Gender Classification Bias Across Gender-Race Groups [0.8594140167290097]
We propose a bias mitigation strategy to improve classification accuracy and reduce bias across gender-racial groups.
We leverage the power of generative views, structured learning, and evidential learning towards mitigating gender classification bias.
(2022-08-17T16:23:35Z)
- Toward Understanding Bias Correlations for Mitigation in NLP [34.956581421295]
This work aims to provide a first systematic study toward understanding bias correlations in mitigation.
We examine bias mitigation in two common NLP tasks -- toxicity detection and word embeddings.
Our findings suggest that biases are correlated and present scenarios in which independent debiasing approaches may be insufficient.
(2022-05-24T22:48:47Z)
- Mitigating Face Recognition Bias via Group Adaptive Classifier [53.15616844833305]
This work aims to learn a fair face representation, where faces of every group could be more equally represented.
Our work is able to mitigate face recognition bias across demographic groups while maintaining competitive accuracy.
(2020-06-13T06:43:37Z)
- Towards Controllable Biases in Language Generation [87.89632038677912]
We develop a method to induce societal biases in generated text when input prompts contain mentions of specific demographic groups.
We analyze two scenarios: 1) inducing negative biases for one demographic and positive biases for another demographic, and 2) equalizing biases between demographics.
(2020-05-01T08:25:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.