Language-Agnostic Bias Detection in Language Models with Bias Probing
- URL: http://arxiv.org/abs/2305.13302v2
- Date: Mon, 20 Nov 2023 14:31:26 GMT
- Title: Language-Agnostic Bias Detection in Language Models with Bias Probing
- Authors: Abdullatif Köksal, Omer Faruk Yalcin, Ahmet Akbiyik, M. Tahir Kilavuz, Anna Korhonen, Hinrich Schütze
- Abstract summary: Pretrained language models (PLMs) are key components in NLP, but they contain strong social biases.
We propose LABDet, a robust and language-agnostic bias probing technique for evaluating social bias in PLMs.
We find consistent patterns of nationality bias across monolingual PLMs in six languages that align with historical and political context.
- Score: 22.695872707061078
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretrained language models (PLMs) are key components in NLP, but they contain
strong social biases. Quantifying these biases is challenging because current
methods focusing on fill-the-mask objectives are sensitive to slight changes in
input. To address this, we propose LABDet, a robust and language-agnostic bias
probing technique for evaluating social bias in PLMs. Using nationality as a
case study, we show that LABDet "surfaces" nationality bias by
training a classifier on top of a frozen PLM on non-nationality sentiment
detection. We find consistent patterns of nationality bias across monolingual
PLMs in six languages that align with historical and political context. We also
show for English BERT that bias surfaced by LABDet correlates well with bias in
the pretraining data; thus, our work is one of the few studies that directly
links pretraining data to PLM behavior. Finally, we verify LABDet's reliability
and applicability to different templates and languages through an extensive set
of robustness checks. We publicly share our code and dataset at
https://github.com/akoksal/LABDet.
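To make the probing setup concrete, here is a minimal sketch of a LABDet-style probe. It assumes a Hugging Face Transformers encoder (bert-base-cased) and uses made-up sentiment examples, templates, and nationality terms purely for illustration; the paper's actual templates, classifier, and evaluation protocol are defined in the linked repository.

```python
# Illustrative sketch of a LABDet-style bias probe (not the official implementation).
# Step 1: train a small sentiment head on top of a *frozen* PLM using
#         sentiment examples that mention no nationality.
# Step 2: run nationality-bearing, otherwise neutral templates through the
#         frozen PLM + head and compare predicted sentiment across nationalities.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-cased"  # any monolingual PLM; an assumption for illustration
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
plm = AutoModel.from_pretrained(MODEL_NAME)
plm.eval()
for p in plm.parameters():  # freeze the PLM; only the probe is trained
    p.requires_grad = False

probe = torch.nn.Linear(plm.config.hidden_size, 2)  # tiny sentiment classifier

def encode(sentences):
    """Return the frozen PLM's [CLS] representation for each sentence."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = plm(**batch)
    return out.last_hidden_state[:, 0]

# Step 1: train the probe on non-nationality sentiment data (toy examples here).
sentiment_data = [("The food was wonderful.", 1), ("The service was terrible.", 0)]
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
for _ in range(10):
    texts, labels = zip(*sentiment_data)
    logits = probe(encode(list(texts)))
    loss = torch.nn.functional.cross_entropy(logits, torch.tensor(labels))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Step 2: surface nationality bias with a neutral template (illustrative terms).
template = "This {} person is a neighbor of mine."
nationalities = ["German", "Turkish", "French"]
with torch.no_grad():
    probs = torch.softmax(probe(encode([template.format(n) for n in nationalities])), dim=-1)
for nat, p_pos in zip(nationalities, probs[:, 1]):
    print(f"{nat}: positive-sentiment probability = {p_pos:.3f}")
```

Systematic differences in the positive-sentiment probability across nationalities, aggregated over many templates, are what LABDet treats as surfaced bias.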
Related papers
- A Novel Interpretability Metric for Explaining Bias in Language Models: Applications on Multilingual Models from Southeast Asia [0.3376269351435396]
We propose a novel metric to measure token-level contributions to biased behavior in pretrained language models (PLMs)
Our results confirm the presence of sexist and homophobic bias in Southeast Asian PLMs.
Interpretability and semantic analyses also reveal that PLM bias is strongly induced by words relating to crime, intimate relationships, and helping.
arXiv Detail & Related papers (2024-10-20T18:31:05Z) - BiasDPO: Mitigating Bias in Language Models through Direct Preference Optimization [0.0]
Large Language Models (LLMs) have become pivotal in advancing natural language processing, yet their potential to perpetuate biases poses significant concerns.
This paper introduces a new framework employing Direct Preference Optimization (DPO) to mitigate gender, racial, and religious biases in English text.
By developing a loss function that favors less biased over biased completions, our approach cultivates a preference for respectful and non-discriminatory language (a sketch of such a preference loss appears after this list).
arXiv Detail & Related papers (2024-07-18T22:32:20Z) - Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z) - MBBQ: A Dataset for Cross-Lingual Comparison of Stereotypes in Generative LLMs [6.781972039785424]
Generative large language models (LLMs) have been shown to exhibit harmful biases and stereotypes.
We present MBBQ, a dataset that measures stereotypes commonly held across the Dutch, Spanish, and Turkish languages.
Our results confirm that some non-English languages suffer from bias more than English, even when controlling for cultural shifts.
arXiv Detail & Related papers (2024-06-11T13:23:14Z) - What Do Llamas Really Think? Revealing Preference Biases in Language
Model Representations [62.91799637259657]
Do large language models (LLMs) exhibit sociodemographic biases, even when they decline to respond?
We study this question by probing contextualized embeddings and exploring whether this bias is encoded in their latent representations.
We propose a logistic Bradley-Terry probe which predicts word-pair preferences of LLMs from the words' hidden vectors (a minimal sketch of such a probe appears after this list).
arXiv Detail & Related papers (2023-11-30T18:53:13Z) - LERT: A Linguistically-motivated Pre-trained Language Model [67.65651497173998]
We propose LERT, a pre-trained language model that is trained on three types of linguistic features along with the original pre-training task.
We carried out extensive experiments on ten Chinese NLU tasks, and the experimental results show that LERT could bring significant improvements.
arXiv Detail & Related papers (2022-11-10T05:09:16Z) - Gender Bias in Masked Language Models for Multiple Languages [31.528949172210233]
We propose the Multilingual Bias Evaluation (MBE) score to evaluate bias in various languages using only English attribute word lists and parallel corpora.
We evaluate bias in eight languages using the MBE score and confirm that gender-related biases are encoded in attribute words for all those languages.
arXiv Detail & Related papers (2022-05-01T20:19:14Z) - On the Language Coverage Bias for Neural Machine Translation [81.81456880770762]
Language coverage bias is important for neural machine translation (NMT) because the target-original training data is not well exploited in current practice.
By carefully designing experiments, we provide comprehensive analyses of the language coverage bias in the training data.
We propose two simple and effective approaches to alleviate the language coverage bias problem.
arXiv Detail & Related papers (2021-06-07T01:55:34Z) - Unmasking Contextual Stereotypes: Measuring and Mitigating BERT's Gender
Bias [12.4543414590979]
Contextualized word embeddings have been replacing standard embeddings in NLP systems.
We measure gender bias by studying associations between gender-denoting target words and names of professions in English and German.
We show that our method of measuring bias is appropriate for languages with rich morphology and gender-marking, such as German.
arXiv Detail & Related papers (2020-10-27T18:06:09Z) - Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z) - On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z)
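For the BiasDPO entry above, the loss "that favors less biased over biased completions" can be read as the standard Direct Preference Optimization objective, with the preferred completion taken to be the less biased one. This is an assumption about how the paper instantiates DPO, not a quote of its exact loss:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
```

Here y_w is the less biased completion, y_l the biased one, pi_ref the frozen reference model, and beta a temperature controlling how far the fine-tuned policy pi_theta may drift from the reference.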
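For the "What Do Llamas Really Think?" entry above, here is a minimal sketch of a logistic Bradley-Terry probe, under the assumption that a shared linear score over each word's hidden vector feeds a Bradley-Terry (sigmoid-of-score-difference) preference model. The hidden size, toy data, and training loop are illustrative only; the paper's actual probe and features may differ.

```python
# Illustrative logistic Bradley-Terry probe over hidden vectors.
# Idea: learn one linear score s(h) = w . h and model
#       P(word_a preferred over word_b) = sigmoid(s(h_a) - s(h_b)).
import torch

hidden_size = 768  # assumed hidden size of the probed LLM
w = torch.zeros(hidden_size, requires_grad=True)
optimizer = torch.optim.Adam([w], lr=1e-2)

def bt_probability(h_a, h_b):
    """Bradley-Terry preference probability from two hidden vectors."""
    return torch.sigmoid(h_a @ w - h_b @ w)

# Toy training pairs: (hidden vector of word a, hidden vector of word b,
# label = 1.0 if a was preferred). Real vectors would come from the LLM.
pairs = [(torch.randn(hidden_size), torch.randn(hidden_size), 1.0),
         (torch.randn(hidden_size), torch.randn(hidden_size), 0.0)]

for _ in range(100):
    loss = 0.0
    for h_a, h_b, label in pairs:
        p = bt_probability(h_a, h_b)
        loss = loss + torch.nn.functional.binary_cross_entropy(p, torch.tensor(label))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The learned score w . h can then rank words (e.g. demographic terms) by the
# preference encoded in the model's latent representations.
```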