WinoQueer: A Community-in-the-Loop Benchmark for Anti-LGBTQ+ Bias in
Large Language Models
- URL: http://arxiv.org/abs/2306.15087v1
- Date: Mon, 26 Jun 2023 22:07:33 GMT
- Authors: Virginia K. Felkner, Ho-Chun Herbert Chang, Eugene Jang, Jonathan May
- Abstract summary: WinoQueer is a benchmark designed to measure whether large language models (LLMs) encode biases that are harmful to the LGBTQ+ community.
We apply our benchmark to several popular LLMs and find that off-the-shelf models generally do exhibit considerable anti-queer bias.
- Score: 18.922402889762488
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present WinoQueer: a benchmark specifically designed to measure whether
large language models (LLMs) encode biases that are harmful to the LGBTQ+
community. The benchmark is community-sourced, via application of a novel
method that generates a bias benchmark from a community survey. We apply our
benchmark to several popular LLMs and find that off-the-shelf models generally
do exhibit considerable anti-queer bias. Finally, we show that LLM bias against
a marginalized community can be somewhat mitigated by finetuning on data
written about or by members of that community, and that social media text
written by community members is more effective than news text written about the
community by non-members. Our method for community-in-the-loop benchmark
development provides a blueprint for future researchers to develop
community-driven, harms-grounded LLM benchmarks for other marginalized
communities.
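The abstract does not say how bias is scored. As a minimal sketch, assume the benchmark consists of sentence pairs (one stereotypical, one counterfactual) and that the bias score is the percentage of pairs in which the model rates the stereotypical sentence as more likely, so that 50 would indicate no preference. The names `bias_score` and `toy_score` and the placeholder strings are hypothetical and not taken from the paper; a real evaluation would replace `toy_score` with a model's sentence log-likelihood.

```python
from typing import Callable, Sequence, Tuple


def bias_score(
    pairs: Sequence[Tuple[str, str]],
    score_fn: Callable[[str], float],
) -> float:
    """Return the percentage of pairs where the first (stereotypical)
    sentence receives a strictly higher score than the second one."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    preferred = sum(
        1 for stereo, counter in pairs if score_fn(stereo) > score_fn(counter)
    )
    return 100.0 * preferred / len(pairs)


def toy_score(sentence: str) -> float:
    """Hypothetical stand-in for a model log-likelihood:
    shorter strings simply score higher."""
    return -float(len(sentence))


# Placeholder strings only; they carry no benchmark content.
example_pairs = [("aa", "aaaa"), ("aaaa", "aa"), ("a", "aaa"), ("aaa", "aaa")]
example_result = bias_score(example_pairs, toy_score)
```

Under this assumed metric, ties count as no preference for the stereotypical sentence, which is a design choice rather than something the abstract specifies.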
Related papers
- VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model [72.13121434085116]
VLBiasBench is a benchmark aimed at evaluating biases in Large Vision-Language Models (LVLMs).
We construct a dataset encompassing nine distinct categories of social bias (age, disability status, gender, nationality, physical appearance, race, religion, profession, and socioeconomic status) and two intersectional bias categories (race x gender, and race x socioeconomic status).
We conduct extensive evaluations on 15 open-source models as well as one advanced closed-source model, providing new insights into the biases these models reveal.
arXiv Detail & Related papers (2024-06-20T10:56:59Z)
- COMMUNITY-CROSS-INSTRUCT: Unsupervised Instruction Generation for Aligning Large Language Models to Online Communities [5.0261645603931475]
Community-Cross-Instruct is an unsupervised framework for aligning large language models to online communities.
It generates instructions in a fully unsupervised manner, enhancing scalability and generalization across domains.
This work enables cost-effective and automated surveying of diverse online communities.
arXiv Detail & Related papers (2024-06-17T20:20:47Z)
- MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures [57.886592207948844]
We propose MixEval, a new paradigm for establishing efficient, gold-standard evaluation by strategically mixing off-the-shelf benchmarks.
It bridges (1) comprehensive and well-distributed real-world user queries and (2) efficient and fairly-graded ground-truth-based benchmarks, by matching queries mined from the web with similar queries from existing benchmarks.
arXiv Detail & Related papers (2024-06-03T05:47:05Z)
- GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community.
The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability.
We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z)
- Queer People are People First: Deconstructing Sexual Identity Stereotypes in Large Language Models [3.974379576408554]
Large Language Models (LLMs) are trained primarily on minimally processed web text.
LLMs can inadvertently perpetuate stereotypes towards marginalized groups, like the LGBTQIA+ community.
arXiv Detail & Related papers (2023-06-30T19:39:01Z)
- G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment [64.01972723692587]
We present G-Eval, a framework that uses large language models with chain-of-thought (CoT) prompting and a form-filling paradigm to assess the quality of NLG outputs.
We show that G-Eval with GPT-4 as the backbone model achieves a Spearman correlation of 0.514 with human judgments on the summarization task, outperforming all previous methods by a large margin.
arXiv Detail & Related papers (2023-03-29T12:46:54Z)
- BERTScore is Unfair: On Social Bias in Language Model-Based Metrics for Text Generation [89.41378346080603]
This work presents the first systematic study on the social bias in PLM-based metrics.
We demonstrate that popular PLM-based metrics exhibit significantly higher social bias than traditional metrics on 6 sensitive attributes.
In addition, we develop debiasing adapters that are injected into PLM layers, mitigating bias in PLM-based metrics while retaining high performance for evaluating text generation.
arXiv Detail & Related papers (2022-10-14T08:24:11Z)
- A Keyword Based Approach to Understanding the Overpenalization of Marginalized Groups by English Marginal Abuse Models on Twitter [2.9604738405097333]
Harmful content detection models tend to have higher false positive rates for content from marginalized groups.
We propose a principled approach to detecting and measuring the severity of potential harms associated with a text-based model.
We apply our methodology to audit Twitter's English marginal abuse model, which is used for removing amplification eligibility of marginally abusive content.
arXiv Detail & Related papers (2022-10-07T20:28:00Z)
- Towards WinoQueer: Developing a Benchmark for Anti-Queer Bias in Large Language Models [18.922402889762488]
This paper presents exploratory work on whether and to what extent biases against queer and trans people are encoded in large language models (LLMs) such as BERT.
To measure anti-queer bias, we introduce a new benchmark dataset, WinoQueer, modeled after other bias-detection benchmarks but addressing homophobic and transphobic biases.
We found that BERT shows significant homophobic bias, but this bias can be mostly mitigated by finetuning BERT on a natural language corpus written by members of the LGBTQ+ community.
arXiv Detail & Related papers (2022-06-23T05:30:47Z)
- Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.