Related papers: Global Voices, Local Biases: Socio-Cultural Prejudices across Languages

Global Voices, Local Biases: Socio-Cultural Prejudices across Languages

URL: http://arxiv.org/abs/2310.17586v1
Date: Thu, 26 Oct 2023 17:07:50 GMT
Title: Global Voices, Local Biases: Socio-Cultural Prejudices across Languages
Authors: Anjishnu Mukherjee, Chahat Raj, Ziwei Zhu, Antonios Anastasopoulos
Abstract summary: Human biases are ubiquitous but not uniform; disparities exist across linguistic, cultural, and societal borders. In this work, we scale the Word Embedding Association Test (WEAT) to 24 languages, enabling broader studies. To encompass more widely prevalent societal biases, we examine new bias dimensions across toxicity, ableism, and more.
Score: 22.92083941222383
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Human biases are ubiquitous but not uniform: disparities exist across linguistic, cultural, and societal borders. As large amounts of recent literature suggest, language models (LMs) trained on human data can reflect and often amplify the effects of these social biases. However, the vast majority of existing studies on bias are heavily skewed towards Western and European languages. In this work, we scale the Word Embedding Association Test (WEAT) to 24 languages, enabling broader studies and yielding interesting findings about LM bias. We additionally enhance this data with culturally relevant information for each language, capturing local contexts on a global scale. Further, to encompass more widely prevalent societal biases, we examine new bias dimensions across toxicity, ableism, and more. Moreover, we delve deeper into the Indian linguistic landscape, conducting a comprehensive regional bias analysis across six prevalent Indian languages. Finally, we highlight the significance of these social biases and the new dimensions through an extensive comparison of embedding methods, reinforcing the need to address them in pursuit of more equitable language models. All code, data and results are available here: https://github.com/iamshnoo/weathub.

Related papers

Measuring South Asian Biases in Large Language Models [1.5903891569492878]
This work addresses gaps by conducting a multilingual and intersectional analysis of Large Language Models (LLMs)<n>We construct a culturally grounded bias lexicon capturing previously unexplored intersectional dimensions including gender, religion, marital status, and number of children.<n>We evaluate two self-debiasing strategies to measure their effectiveness in reducing culturally specific bias in Indo-Aryan and Dravidian languages.
arXiv Detail & Related papers (2025-05-24T02:18:17Z)
Scaling for Fairness? Analyzing Model Size, Data Composition, and Multilinguality in Vision-Language Bias [14.632649933582648]
We investigate how dataset composition, model size, and multilingual training affect gender and racial bias in a popular VLM, CLIP, and its open source variants. To assess social perception bias, we measure the zero-shot performance on face images featuring socially charged terms.
arXiv Detail & Related papers (2025-01-22T21:08:30Z)
Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs) By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases. The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs [58.27353205269664]
Social biases can manifest in language agency. We introduce the novel Language Agency Bias Evaluation benchmark. We unveil language agency social biases in 3 recent Large Language Model (LLM)-generated content.
arXiv Detail & Related papers (2024-04-16T12:27:54Z)
Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
Evaluating Biased Attitude Associations of Language Models in an Intersectional Context [2.891314299138311]
Language models are trained on large-scale corpora that embed implicit biases documented in psychology. We study biases related to age, education, gender, height, intelligence, literacy, race, religion, sex, sexual orientation, social class, and weight. We find that language models exhibit the most biased attitudes against gender identity, social class, and sexual orientation signals in language.
arXiv Detail & Related papers (2023-07-07T03:01:56Z)
Multi-lingual and Multi-cultural Figurative Language Understanding [69.47641938200817]
Figurative language permeates human communication, but is relatively understudied in NLP. We create a dataset for seven diverse languages associated with a variety of cultures: Hindi, Indonesian, Javanese, Kannada, Sundanese, Swahili and Yoruba. Our dataset reveals that each language relies on cultural and regional concepts for figurative expressions, with the highest overlap between languages originating from the same region. All languages exhibit a significant deficiency compared to English, with variations in performance reflecting the availability of pre-training and fine-tuning data.
arXiv Detail & Related papers (2023-05-25T15:30:31Z)
Comparing Biases and the Impact of Multilingual Training across Multiple Languages [70.84047257764405]
We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task. We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender. Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture.
arXiv Detail & Related papers (2023-05-18T18:15:07Z)
An Analysis of Social Biases Present in BERT Variants Across Multiple Languages [0.0]
We investigate the bias present in monolingual BERT models across a diverse set of languages. We propose a template-based method to measure any kind of bias, based on sentence pseudo-likelihood. We conclude that current methods of probing for bias are highly language-dependent.
arXiv Detail & Related papers (2022-11-25T23:38:08Z)
Socially Aware Bias Measurements for Hindi Language Representations [38.40818373580979]
We show how biases are unique to specific language representations based on the history and culture of the region they are widely spoken in. We emphasize on the necessity of social-awareness along with linguistic and grammatical artefacts when modeling language representations.
arXiv Detail & Related papers (2021-10-15T05:49:15Z)
Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases. We propose steps towards mitigating social biases during text generation. Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
Discovering and Categorising Language Biases in Reddit [5.670038395203354]
This paper proposes a data-driven approach to automatically discover language biases encoded in the vocabulary of online discourse communities on Reddit. We use word embeddings to transform text into high-dimensional dense vectors and capture semantic relations between words. We successfully discover gender bias, religion bias, and ethnic bias in different Reddit communities.
arXiv Detail & Related papers (2020-08-06T16:42:10Z)
Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications. We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.