Cross-lingual Transfer Can Worsen Bias in Sentiment Analysis
- URL: http://arxiv.org/abs/2305.12709v1
- Date: Mon, 22 May 2023 04:37:49 GMT
- Title: Cross-lingual Transfer Can Worsen Bias in Sentiment Analysis
- Authors: Seraphina Goldfarb-Tarrant, Björn Ross, Adam Lopez
- Abstract summary: We study whether gender or racial biases are imported when using cross-lingual transfer.
We find that systems using cross-lingual transfer usually become more biased than their monolingual counterparts.
We also find racial biases to be much more prevalent than gender biases.
- Score: 12.767209085664247
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sentiment analysis (SA) systems are widely deployed in many of the world's
languages, and there is well-documented evidence of demographic bias in these
systems. In languages beyond English, scarcer training data is often
supplemented with transfer learning using pre-trained models, including
multilingual models trained on other languages. In some cases, even supervision
data comes from other languages. Does cross-lingual transfer also import new
biases? To answer this question, we use counterfactual evaluation to test
whether gender or racial biases are imported when using cross-lingual transfer,
compared to a monolingual transfer setting. Across five languages, we find that
systems using cross-lingual transfer usually become more biased than their
monolingual counterparts. We also find racial biases to be much more prevalent
than gender biases. To spur further research on this topic, we release the
sentiment models we used for this study, and the intermediate checkpoints
throughout training, yielding 1,525 distinct models; we also release our
evaluation code.
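As a rough illustration of the counterfactual evaluation protocol described above, the sketch below holds a sentence template fixed, swaps in identity terms associated with different demographic groups, and compares the sentiment scores a model assigns to each variant. The model checkpoint, templates, and identity terms are illustrative assumptions, not the artifacts released with this paper.

```python
# Minimal sketch of counterfactual bias evaluation for sentiment analysis.
# Templates, identity terms, and the model checkpoint are illustrative only;
# the paper's released models and evaluation code may differ.
from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

templates = [
    "{name} made me feel angry.",
    "The food from the {group} restaurant was wonderful.",
]

# Counterfactual variants: only the demographic term changes between them.
counterfactual_fills = {
    "{name}": ["Emily", "Lakisha"],      # names associated with different groups
    "{group}": ["Italian", "Mexican"],   # group/origin terms
}

def score(text):
    """Return a signed sentiment score in [-1, 1]."""
    out = sentiment(text)[0]
    sign = 1.0 if out["label"] == "POSITIVE" else -1.0
    return sign * out["score"]

for template in templates:
    for slot, fills in counterfactual_fills.items():
        if slot not in template:
            continue
        scores = {fill: score(template.replace(slot, fill)) for fill in fills}
        # A large gap between variants of the same sentence indicates bias:
        # the prediction changed even though only the identity term did.
        gap = max(scores.values()) - min(scores.values())
        print(template, scores, f"gap={gap:.3f}")
```

Running the same protocol on a monolingually trained system and on one initialized from a cross-lingually transferred checkpoint, then comparing the gaps, is roughly how the question posed in the abstract is operationalized.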
Related papers
- Revisiting Machine Translation for Cross-lingual Classification [91.43729067874503]
Most research in the area focuses on the multilingual models rather than the Machine Translation component.
We show that, by using a stronger MT system and mitigating the mismatch between training on original text and running inference on machine translated text, translate-test can do substantially better than previously assumed.
arXiv Detail & Related papers (2023-05-23T16:56:10Z)
- Bias Beyond English: Counterfactual Tests for Bias in Sentiment Analysis in Four Languages [13.694445396757162]
Sentiment analysis systems are used in many products and hundreds of languages.
Gender and racial biases are well-studied in English SA systems, but understudied in other languages.
We build a counterfactual evaluation corpus for gender and racial/migrant bias in four languages.
arXiv Detail & Related papers (2023-05-19T13:38:53Z)
- Comparing Biases and the Impact of Multilingual Training across Multiple Languages [70.84047257764405]
We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task.
We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender.
Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture.
arXiv Detail & Related papers (2023-05-18T18:15:07Z)
- An Analysis of Social Biases Present in BERT Variants Across Multiple Languages [0.0]
We investigate the bias present in monolingual BERT models across a diverse set of languages.
We propose a template-based method to measure any kind of bias, based on sentence pseudo-likelihood (see the sketch after this list).
We conclude that current methods of probing for bias are highly language-dependent.
arXiv Detail & Related papers (2022-11-25T23:38:08Z)
- Analyzing Gender Representation in Multilingual Models [59.21915055702203]
We focus on the representation of gender distinctions as a practical case study.
We examine the extent to which the gender concept is encoded in shared subspaces across different languages.
arXiv Detail & Related papers (2022-04-20T00:13:01Z)
- Language Contamination Explains the Cross-lingual Capabilities of English Pretrained Models [79.38278330678965]
We find that common English pretraining corpora contain significant amounts of non-English text.
This leads to hundreds of millions of foreign language tokens in large-scale datasets.
We then demonstrate that this non-English text, despite making up only a small fraction of the corpora, facilitates cross-lingual transfer for models trained on them.
arXiv Detail & Related papers (2022-04-17T23:56:54Z)
- Revisiting the Primacy of English in Zero-shot Cross-lingual Transfer [39.360667403003745]
Zero-shot cross-lingual transfer is emerging as a practical solution.
English is the dominant source language for transfer, a choice reinforced by popular zero-shot benchmarks.
We find that other high-resource languages such as German and Russian often transfer more effectively.
arXiv Detail & Related papers (2021-06-30T16:05:57Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact on existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
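For the template-based, pseudo-likelihood bias probing mentioned in "An Analysis of Social Biases Present in BERT Variants Across Multiple Languages" above, a minimal sketch in the CrowS-Pairs style is given below: each sentence of a minimally different pair is scored by masking one token at a time and summing the log-probability of the masked token, and a consistent preference for one variant across many pairs suggests bias. The model name and the example pair are assumptions for illustration, not that paper's exact setup.

```python
# Minimal sketch of masked-LM pseudo-log-likelihood scoring for bias probing,
# in the spirit of CrowS-Pairs; the actual method in the cited paper may differ.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "bert-base-multilingual-cased"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum log p(token | rest of sentence), masking one position at a time."""
    enc = tokenizer(sentence, return_tensors="pt")
    input_ids = enc["input_ids"][0]
    total = 0.0
    for i in range(1, input_ids.size(0) - 1):  # skip [CLS] and [SEP]
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_probs = torch.log_softmax(logits, dim=-1)
        total += log_probs[input_ids[i]].item()
    return total

# A minimally different pair: only the demographic term changes.
pair = ("The women were good at maths.", "The men were good at maths.")
scores = {s: pseudo_log_likelihood(s) for s in pair}
print(scores)  # a consistent preference across many such pairs suggests bias
```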
This list is automatically generated from the titles and abstracts of the papers listed on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.