Towards WinoQueer: Developing a Benchmark for Anti-Queer Bias in Large
Language Models
- URL: http://arxiv.org/abs/2206.11484v2
- Date: Fri, 8 Jul 2022 02:09:28 GMT
- Title: Towards WinoQueer: Developing a Benchmark for Anti-Queer Bias in Large
Language Models
- Authors: Virginia K. Felkner, Ho-Chun Herbert Chang, Eugene Jang, Jonathan May
- Abstract summary: This paper presents exploratory work on whether and to what extent biases against queer and trans people are encoded in large language models (LLMs) such as BERT.
To measure anti-queer bias, we introduce a new benchmark dataset, WinoQueer, modeled after other bias-detection benchmarks but addressing homophobic and transphobic biases.
We found that BERT shows significant homophobic bias, but this bias can be mostly mitigated by finetuning BERT on a natural language corpus written by members of the LGBTQ+ community.
- Score: 18.922402889762488
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents exploratory work on whether and to what extent biases
against queer and trans people are encoded in large language models (LLMs) such
as BERT. We also propose a method for reducing these biases in downstream
tasks: finetuning the models on data written by and/or about queer people. To
measure anti-queer bias, we introduce a new benchmark dataset, WinoQueer,
modeled after other bias-detection benchmarks but addressing homophobic and
transphobic biases. We found that BERT shows significant homophobic bias, but
this bias can be mostly mitigated by finetuning BERT on a natural language
corpus written by members of the LGBTQ+ community.
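The released WinoQueer benchmark and its exact scoring script are not reproduced here; the sketch below only illustrates, under stated assumptions, the kind of paired-sentence scoring the abstract alludes to ("modeled after other bias-detection benchmarks" such as CrowS-Pairs): each item is a minimal sentence pair, and a masked language model counts as biased on that item if it assigns a higher pseudo-log-likelihood to the variant reflecting the majority/stereotyped direction. The example pair, variable names, and the choice of bert-base-uncased are illustrative assumptions, not WinoQueer data.

```python
# Minimal sketch of paired-sentence bias scoring with a masked LM
# (CrowS-Pairs-style pseudo-log-likelihood). The sentence pair below is an
# illustrative stand-in, NOT an item from the WinoQueer dataset.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "bert-base-uncased"  # assumption: any HuggingFace masked LM could be scored this way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()


def pseudo_log_likelihood(sentence: str) -> float:
    """Sum log P(token | rest of sentence), masking one token at a time."""
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    with torch.no_grad():
        # Skip the [CLS] and [SEP] special tokens at the ends.
        for i in range(1, input_ids.size(0) - 1):
            masked = input_ids.clone()
            masked[i] = tokenizer.mask_token_id
            logits = model(masked.unsqueeze(0)).logits[0, i]
            total += torch.log_softmax(logits, dim=-1)[input_ids[i]].item()
    return total


# Hypothetical minimal pair: identical except for the identity mentioned.
sent_queer = "James went grocery shopping with his husband."
sent_straight = "James went grocery shopping with his wife."

# Over a full benchmark, the bias score is the fraction of pairs on which the
# model assigns the higher score to the non-queer / stereotyped variant.
prefers_straight = pseudo_log_likelihood(sent_straight) > pseudo_log_likelihood(sent_queer)
print("model prefers the non-queer variant:", prefers_straight)
```

For the mitigation step the abstract describes, one plausible recipe (not verified against the authors' exact setup) is ordinary masked-LM finetuning over a corpus written by LGBTQ+ community members, e.g. with transformers' Trainer and DataCollatorForLanguageModeling, followed by re-running the same paired scoring to check whether the preference rate moves back toward 50%.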
Related papers
- MoESD: Mixture of Experts Stable Diffusion to Mitigate Gender Bias [23.10522891268232]
We show that gender bias is already present in the text encoder of the model.
We propose MoESD (Mixture of Experts Stable Diffusion) with BiAs (Bias Adapters) to mitigate gender bias.
arXiv Detail & Related papers (2024-06-25T14:59:31Z)
- What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations [62.91799637259657]
Do large language models (LLMs) exhibit sociodemographic biases, even when they decline to respond?
We study this research question by probing contextualized embeddings and exploring whether this bias is encoded in their latent representations.
We propose a logistic Bradley-Terry probe which predicts word pair preferences of LLMs from the words' hidden vectors.
arXiv Detail & Related papers (2023-11-30T18:53:13Z)
- CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models [52.25049362267279]
We present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models.
The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control.
Extensive experiments demonstrate the effectiveness of the dataset in detecting model bias, with all 10 publicly available Chinese large language models exhibiting strong bias in certain categories.
arXiv Detail & Related papers (2023-06-28T14:14:44Z)
- WinoQueer: A Community-in-the-Loop Benchmark for Anti-LGBTQ+ Bias in Large Language Models [18.922402889762488]
WinoQueer is a benchmark designed to measure whether large language models (LLMs) encode biases that are harmful to the LGBTQ+ community.
We apply our benchmark to several popular LLMs and find that off-the-shelf models generally do exhibit considerable anti-queer bias.
arXiv Detail & Related papers (2023-06-26T22:07:33Z)
- Counter-GAP: Counterfactual Bias Evaluation through Gendered Ambiguous Pronouns [53.62845317039185]
Bias-measuring datasets play a critical role in detecting biased behavior of language models.
We propose a novel method to collect diverse, natural, and minimally distant text pairs via counterfactual generation.
We show that four pre-trained language models are significantly more inconsistent across different gender groups than within each group.
arXiv Detail & Related papers (2023-02-11T12:11:03Z)
- Testing Occupational Gender Bias in Language Models: Towards Robust Measurement and Zero-Shot Debiasing [98.07536837448293]
Large language models (LLMs) have been shown to exhibit a variety of harmful, human-like biases against various demographics.
We introduce a list of desiderata for robustly measuring biases in generative language models.
We then use this benchmark to test several state-of-the-art open-source LLMs, including Llama, Mistral, and their instruction-tuned versions.
arXiv Detail & Related papers (2022-12-20T22:41:24Z)
- The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks [75.58692290694452]
We compare social biases with non-social biases stemming from choices made during dataset construction that might not even be discernible to the human eye.
We observe that these shallow modifications have a surprising effect on the resulting degree of bias across various models.
arXiv Detail & Related papers (2022-10-18T17:58:39Z)
- Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases [55.45617404586874]
We propose a few-shot instruction-based method for prompting pre-trained language models (LMs) to detect social biases.
We show that large LMs can detect different types of fine-grained biases with similar and sometimes superior accuracy to fine-tuned models.
arXiv Detail & Related papers (2021-12-15T04:19:52Z)
- Stereotype and Skew: Quantifying Gender Bias in Pre-trained and Fine-tuned Language Models [5.378664454650768]
This paper proposes two intuitive metrics, skew and stereotype, that quantify and analyse the gender bias present in contextual language models.
We find evidence that gender stereotype and gender skew are approximately negatively correlated in out-of-the-box models, suggesting a trade-off between these two forms of bias.
arXiv Detail & Related papers (2021-01-24T10:57:59Z)
- Unmasking Contextual Stereotypes: Measuring and Mitigating BERT's Gender Bias [12.4543414590979]
Contextualized word embeddings have been replacing standard embeddings in NLP systems.
We measure gender bias by studying associations between gender-denoting target words and names of professions in English and German.
We show that our method of measuring bias is appropriate for languages such as English, but not for languages with a rich, gender-marking morphology, such as German.
arXiv Detail & Related papers (2020-10-27T18:06:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.