Towards WinoQueer: Developing a Benchmark for Anti-Queer Bias in Large
Language Models
- URL: http://arxiv.org/abs/2206.11484v2
- Date: Fri, 8 Jul 2022 02:09:28 GMT
- Title: Towards WinoQueer: Developing a Benchmark for Anti-Queer Bias in Large
Language Models
- Authors: Virginia K. Felkner, Ho-Chun Herbert Chang, Eugene Jang, Jonathan May
- Abstract summary: This paper presents exploratory work on whether and to what extent biases against queer and trans people are encoded in large language models (LLMs) such as BERT.
To measure anti-queer bias, we introduce a new benchmark dataset, WinoQueer, modeled after other bias-detection benchmarks but addressing homophobic and transphobic biases.
We found that BERT shows significant homophobic bias, but this bias can be mostly mitigated by finetuning BERT on a natural language corpus written by members of the LGBTQ+ community.
- Score: 18.922402889762488
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents exploratory work on whether and to what extent biases
against queer and trans people are encoded in large language models (LLMs) such
as BERT. We also propose a method for reducing these biases in downstream
tasks: finetuning the models on data written by and/or about queer people. To
measure anti-queer bias, we introduce a new benchmark dataset, WinoQueer,
modeled after other bias-detection benchmarks but addressing homophobic and
transphobic biases. We found that BERT shows significant homophobic bias, but
this bias can be mostly mitigated by finetuning BERT on a natural language
corpus written by members of the LGBTQ+ community.
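The released WinoQueer benchmark and its exact scoring script are not reproduced here; the sketch below only illustrates, under stated assumptions, the kind of paired-sentence scoring the abstract alludes to ("modeled after other bias-detection benchmarks" such as CrowS-Pairs): each item is a minimal sentence pair, and a masked language model counts as biased on that item if it assigns a higher pseudo-log-likelihood to the variant reflecting the majority/stereotyped direction. The example pair, variable names, and the choice of bert-base-uncased are illustrative assumptions, not WinoQueer data.

```python
# Minimal sketch of paired-sentence bias scoring with a masked LM
# (CrowS-Pairs-style pseudo-log-likelihood). The sentence pair below is an
# illustrative stand-in, NOT an item from the WinoQueer dataset.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "bert-base-uncased"  # assumption: any HuggingFace masked LM could be scored this way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()


def pseudo_log_likelihood(sentence: str) -> float:
    """Sum log P(token | rest of sentence), masking one token at a time."""
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    with torch.no_grad():
        # Skip the [CLS] and [SEP] special tokens at the ends.
        for i in range(1, input_ids.size(0) - 1):
            masked = input_ids.clone()
            masked[i] = tokenizer.mask_token_id
            logits = model(masked.unsqueeze(0)).logits[0, i]
            total += torch.log_softmax(logits, dim=-1)[input_ids[i]].item()
    return total


# Hypothetical minimal pair: identical except for the identity mentioned.
sent_queer = "James went grocery shopping with his husband."
sent_straight = "James went grocery shopping with his wife."

# Over a full benchmark, the bias score is the fraction of pairs on which the
# model assigns the higher score to the non-queer / stereotyped variant.
prefers_straight = pseudo_log_likelihood(sent_straight) > pseudo_log_likelihood(sent_queer)
print("model prefers the non-queer variant:", prefers_straight)
```

For the mitigation step the abstract describes, one plausible recipe (not verified against the authors' exact setup) is ordinary masked-LM finetuning over a corpus written by LGBTQ+ community members, e.g. with transformers' Trainer and DataCollatorForLanguageModeling, followed by re-running the same paired scoring to check whether the preference rate moves back toward 50%.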
Related papers
- MoESD: Mixture of Experts Stable Diffusion to Mitigate Gender Bias [23.10522891268232]
We show that gender bias is already present in the text encoder of the model.
We propose MoESD (Mixture of Experts Stable Diffusion) with BiAs (Bias Adapters) to mitigate gender bias.
arXiv Detail & Related papers (2024-06-25T14:59:31Z)
- What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations [62.91799637259657]
Do large language models (LLMs) exhibit sociodemographic biases, even when they decline to respond?
We study this research question by probing contextualized embeddings and exploring whether this bias is encoded in their latent representations.
We propose a logistic Bradley-Terry probe which predicts word pair preferences of LLMs from the words' hidden vectors.
arXiv Detail & Related papers (2023-11-30T18:53:13Z)
- CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models [52.25049362267279]
We present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models.
The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control.
Extensive experiments demonstrate the effectiveness of the dataset in detecting model bias, with all 10 publicly available Chinese large language models exhibiting strong bias in certain categories.
arXiv Detail & Related papers (2023-06-28T14:14:44Z)
- WinoQueer: A Community-in-the-Loop Benchmark for Anti-LGBTQ+ Bias in Large Language Models [18.922402889762488]
WinoQueer is a benchmark designed to measure whether large language models (LLMs) encode biases that are harmful to the LGBTQ+ community.
We apply our benchmark to several popular LLMs and find that off-the-shelf models generally do exhibit considerable anti-queer bias.
arXiv Detail & Related papers (2023-06-26T22:07:33Z)
- Counter-GAP: Counterfactual Bias Evaluation through Gendered Ambiguous Pronouns [53.62845317039185]
Bias-measuring datasets play a critical role in detecting biased behavior of language models.
We propose a novel method to collect diverse, natural, and minimally distant text pairs via counterfactual generation.
We show that four pre-trained language models are significantly more inconsistent across different gender groups than within each group.
arXiv Detail & Related papers (2023-02-11T12:11:03Z)
- Testing Occupational Gender Bias in Language Models: Towards Robust Measurement and Zero-Shot Debiasing [98.07536837448293]
Large language models (LLMs) have been shown to exhibit a variety of harmful, human-like biases against various demographics.
We introduce a list of desiderata for robustly measuring biases in generative language models.
We then use this benchmark to test several state-of-the-art open-source LLMs, including Llama, Mistral, and their instruction-tuned versions.
arXiv Detail & Related papers (2022-12-20T22:41:24Z)
- The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks [75.58692290694452]
We compare social biases with non-social biases stemming from choices made during dataset construction that might not even be discernible to the human eye.
We observe that these shallow modifications have a surprising effect on the resulting degree of bias across various models.
arXiv Detail & Related papers (2022-10-18T17:58:39Z)
- Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases [55.45617404586874]
We propose a few-shot instruction-based method for prompting pre-trained language models (LMs) to detect social biases.
We show that large LMs can detect different types of fine-grained biases with similar and sometimes superior accuracy to fine-tuned models.
arXiv Detail & Related papers (2021-12-15T04:19:52Z)
- Stereotype and Skew: Quantifying Gender Bias in Pre-trained and Fine-tuned Language Models [5.378664454650768]
This paper proposes two intuitive metrics, skew and stereotype, that quantify and analyse the gender bias present in contextual language models.
We find evidence that gender stereotype and gender skew are approximately negatively correlated in out-of-the-box models, suggesting a trade-off between these two forms of bias.
arXiv Detail & Related papers (2021-01-24T10:57:59Z)
- Unmasking Contextual Stereotypes: Measuring and Mitigating BERT's Gender Bias [12.4543414590979]
Contextualized word embeddings have been replacing standard embeddings in NLP systems.
We measure gender bias by studying associations between gender-denoting target words and names of professions in English and German.
We show that our method of measuring bias is appropriate for languages such as English, but not for languages with a rich, gender-marking morphology, such as German.
arXiv Detail & Related papers (2020-10-27T18:06:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.