Comparing Biases and the Impact of Multilingual Training across Multiple
Languages
- URL: http://arxiv.org/abs/2305.11242v1
- Date: Thu, 18 May 2023 18:15:07 GMT
- Title: Comparing Biases and the Impact of Multilingual Training across Multiple
Languages
- Authors: Sharon Levy, Neha Anna John, Ling Liu, Yogarshi Vyas, Jie Ma,
Yoshinari Fujinuma, Miguel Ballesteros, Vittorio Castelli, Dan Roth
- Abstract summary: We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task.
We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender.
Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture.
- Score: 70.84047257764405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Studies in bias and fairness in natural language processing have primarily
examined social biases within a single language and/or across a few attributes
(e.g. gender, race). However, biases can manifest differently across various
languages for individual attributes. As a result, it is critical to examine
biases within each language and attribute. Of equal importance is to study how
these biases compare across languages and how the biases are affected when
training a model on multilingual data versus monolingual data. We present a
bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the
downstream sentiment analysis task to observe whether specific demographics are
viewed more positively. We study bias similarities and differences across these
languages and investigate the impact of multilingual vs. monolingual training
data. We adapt existing sentiment bias templates in English to Italian,
Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality,
and gender. Our results reveal similarities in bias expression such as
favoritism of groups that are dominant in each language's culture (e.g.
majority religions and nationalities). Additionally, we find an increased
variation in predictions across protected groups, indicating bias
amplification, after multilingual finetuning in comparison to multilingual
pretraining.
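The template-based probe the abstract describes can be sketched in a few lines. The sketch below is a minimal illustration, not the authors' released code: the templates, demographic terms, and the default Hugging Face sentiment pipeline are placeholder assumptions.
```python
# Minimal sketch of template-based sentiment-bias measurement, assuming
# hypothetical templates/groups and the default English sentiment pipeline.
from statistics import mean, pstdev

from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # default English model

# Placeholder templates and demographic terms, standing in for the paper's
# adapted per-language sentiment-bias templates.
TEMPLATES = [
    "I went to a party with my {group} friend.",
    "A {group} neighbor helped me move last week.",
]
GROUPS = {"religion": ["Christian", "Muslim", "Jewish", "Buddhist"]}

def positive_score(text: str) -> float:
    """Probability mass the classifier puts on the positive label."""
    result = classifier(text)[0]
    return result["score"] if result["label"] == "POSITIVE" else 1.0 - result["score"]

for attribute, groups in GROUPS.items():
    per_group = {
        g: mean(positive_score(t.format(group=g)) for t in TEMPLATES)
        for g in groups
    }
    # A larger spread across protected groups signals more bias; comparing
    # this spread between training regimes mirrors the paper's
    # multilingual-vs-monolingual amplification analysis.
    print(attribute, per_group, "spread:", pstdev(per_group.values()))
```
Repeating the loop per language with classifiers finetuned on monolingual versus multilingual data gives the comparison the abstract reports.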
Related papers
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
- The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments [57.273662221547056]
In this study, we investigate a counterintuitive, novel driver of cross-lingual generalisation: language imbalance.
We observe that the existence of a predominant language during training boosts the performance of less frequent languages.
As we extend our analysis to real languages, we find that infrequent languages still benefit from frequent ones, yet whether language imbalance causes cross-lingual generalisation in this setting remains inconclusive.
arXiv Detail & Related papers (2024-04-11T17:58:05Z)
- Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z)
- Global Voices, Local Biases: Socio-Cultural Prejudices across Languages [22.92083941222383]
Human biases are ubiquitous but not uniform; disparities exist across linguistic, cultural, and societal borders.
In this work, we scale the Word Embedding Association Test (WEAT) to 24 languages, enabling broader studies.
To encompass more widely prevalent societal biases, we examine new bias dimensions across toxicity, ableism, and more.
arXiv Detail & Related papers (2023-10-26T17:07:50Z)
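The entry above scales WEAT to 24 languages; its core statistic is the association effect size between two target word sets and two attribute word sets. A minimal NumPy sketch, with random placeholder vectors where per-language pretrained embeddings would go:
```python
# WEAT effect size: how differently target sets X and Y associate with
# attribute sets A and B. Vectors are placeholders for real embeddings.
import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def assoc(w, A, B):
    # s(w, A, B): mean similarity to A minus mean similarity to B.
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    sx = [assoc(x, A, B) for x in X]
    sy = [assoc(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

rng = np.random.default_rng(0)
X, Y, A, B = (list(rng.normal(size=(5, 50))) for _ in range(4))
print(weat_effect_size(X, Y, A, B))  # ~0 for random vectors; large |d| = bias
```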
- Cross-lingual Transfer Can Worsen Bias in Sentiment Analysis [12.767209085664247]
We study whether gender or racial biases are imported when using cross-lingual transfer.
We find that systems using cross-lingual transfer usually become more biased than their monolingual counterparts.
We also find racial biases to be much more prevalent than gender biases.
arXiv Detail & Related papers (2023-05-22T04:37:49Z)
- An Analysis of Social Biases Present in BERT Variants Across Multiple Languages [0.0]
We investigate the bias present in monolingual BERT models across a diverse set of languages.
We propose a template-based method to measure any kind of bias, based on sentence pseudo-likelihood.
We conclude that current methods of probing for bias are highly language-dependent.
arXiv Detail & Related papers (2022-11-25T23:38:08Z)
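The BERT-variants entry above scores templates by sentence pseudo-likelihood. For a masked LM this is commonly computed by masking one token at a time and summing the log-probabilities of the original tokens; the sketch below assumes bert-base-uncased and illustrative sentences rather than the paper's actual templates.
```python
# Pseudo-log-likelihood of a sentence under a masked LM (sketch).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "bert-base-uncased"  # assumption: swap in each monolingual BERT variant
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name).eval()

def pseudo_log_likelihood(sentence: str) -> float:
    ids = tok(sentence, return_tensors="pt").input_ids[0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tok.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

# Comparing scores of group-substituted variants of one template shows which
# group the model finds more "natural" in the same context.
print(pseudo_log_likelihood("The christian engineer solved the problem."))
print(pseudo_log_likelihood("The muslim engineer solved the problem."))
```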
- Mitigating Language-Dependent Ethnic Bias in BERT [11.977810781738603]
We study ethnic bias and how it varies across languages by analyzing and mitigating ethnic bias in monolingual BERT.
To observe and quantify ethnic bias, we develop a novel metric called Categorical Bias score.
We propose two methods for mitigation; first using a multilingual model, and second using contextual word alignment of two monolingual models.
arXiv Detail & Related papers (2021-09-13T04:52:41Z)
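The Categorical Bias score above quantifies how much a masked LM's predictions swing across ethnic or national groups within a template. A rough variance-based sketch follows; the paper's exact normalization and target lists differ, so treat this only as the idea.
```python
# Variance of a masked LM's log-probabilities across groups in one template.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "bert-base-uncased"  # assumption; the paper probes monolingual BERTs
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name).eval()

def log_prob_of(template: str, target: str) -> float:
    text = template.format(mask=tok.mask_token)
    enc = tok(text, return_tensors="pt")
    pos = (enc.input_ids[0] == tok.mask_token_id).nonzero()[0].item()
    with torch.no_grad():
        logits = model(**enc).logits[0, pos]
    return torch.log_softmax(logits, dim=-1)[tok.convert_tokens_to_ids(target)].item()

template = "People from {mask} are hardworking."
groups = ["france", "china", "nigeria"]  # single-token targets, for simplicity
logps = torch.tensor([log_prob_of(template, g) for g in groups])
print("variance across groups:", logps.var().item())  # higher = more bias
```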
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representations from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
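The clustering behind a "representation sprachbund" can be approximated by embedding text in each language with a multilingual encoder and grouping languages by representation similarity. The sentences, encoder, and cluster count below are placeholder assumptions (one sentence per language; the paper uses far more data).
```python
# Cluster languages by their mean-pooled encoder representations (sketch).
import numpy as np
import torch
from sklearn.cluster import KMeans
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base").eval()

sentences = {
    "en": "The weather is nice today.",
    "es": "El clima está agradable hoy.",
    "it": "Oggi il tempo è bello.",
    "zh": "今天天气很好。",
    "he": "מזג האוויר נעים היום.",
}

reps = []
for text in sentences.values():
    enc = tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    reps.append(hidden.mean(dim=0).numpy())  # mean-pool token states

labels = KMeans(n_clusters=2, n_init=10).fit_predict(np.stack(reps))
print(dict(zip(sentences, labels)))  # languages sharing a label form a group
```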
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
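One standard way to quantify gender bias in (multilingual) word embeddings, in the spirit of this entry, is to project occupation vectors onto a gender direction. The vectors below are random placeholders standing in for aligned multilingual embeddings such as MUSE; with real embeddings the sign indicates which gendered word an occupation sits closer to.
```python
# Project occupation vectors onto a "he - she" gender direction (sketch).
import numpy as np

rng = np.random.default_rng(7)  # placeholder vectors; use real embeddings
emb = {w: rng.normal(size=100) for w in ["he", "she", "doctor", "nurse"]}

direction = emb["he"] - emb["she"]
direction /= np.linalg.norm(direction)

for occupation in ["doctor", "nurse"]:
    v = emb[occupation] / np.linalg.norm(emb[occupation])
    # Positive = closer to "he", negative = closer to "she".
    print(occupation, float(v @ direction))
```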
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.