Investigating Bias in Multilingual Language Models: Cross-Lingual
Transfer of Debiasing Techniques
- URL: http://arxiv.org/abs/2310.10310v1
- Date: Mon, 16 Oct 2023 11:43:30 GMT
- Title: Investigating Bias in Multilingual Language Models: Cross-Lingual
Transfer of Debiasing Techniques
- Authors: Manon Reusens, Philipp Borchert, Margot Mieskes, Jochen De Weerdt,
Bart Baesens
- Abstract summary: Cross-lingual transfer of debiasing techniques is not only feasible but also yields promising results.
Using translations of the CrowS-Pairs dataset, our analysis identifies SentenceDebias as the best technique across different languages.
- Score: 3.9673530817103333
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates the transferability of debiasing techniques across
different languages within multilingual models. We examine the applicability of
these techniques in English, French, German, and Dutch. Using multilingual BERT
(mBERT), we demonstrate that cross-lingual transfer of debiasing techniques is
not only feasible but also yields promising results. Surprisingly, our findings
reveal no performance disadvantages when applying these techniques to
non-English languages. Using translations of the CrowS-Pairs dataset, our
analysis identifies SentenceDebias as the best technique across different
languages, reducing bias in mBERT by an average of 13%. We also find that
debiasing techniques with additional pretraining exhibit enhanced cross-lingual
effectiveness for the languages included in the analyses, particularly in
lower-resource languages. These novel insights contribute to a deeper
understanding of bias mitigation in multilingual language models and provide
practical guidance for debiasing techniques in different language contexts.
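The headline technique, SentenceDebias, is a projection-based method: a bias subspace is estimated from counterfactual sentence pairs (the same sentence with a bias attribute term swapped), and sentence representations are then projected off that subspace at inference time. The sketch below illustrates that idea on mBERT; it is a minimal illustration under stated assumptions, not the authors' implementation. The checkpoint name, the toy sentence pairs, mean pooling, and the single-component subspace are all assumptions made for brevity.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Toy counterfactual pairs (hypothetical examples): identical sentences with a
# bias attribute term swapped. A real setup would use per-language word lists.
PAIRS = [
    ("He is a doctor.", "She is a doctor."),
    ("The man is ambitious.", "The woman is ambitious."),
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-uncased")
model = AutoModel.from_pretrained("bert-base-multilingual-uncased")
model.eval()

def embed(sentence: str) -> torch.Tensor:
    """Mean-pooled mBERT sentence embedding (one common pooling choice)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state      # (1, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)        # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1).squeeze(0) / mask.sum()

# 1) Estimate the bias direction: top principal component of the difference
#    vectors between counterfactual pairs (single component for simplicity).
diffs = torch.stack([embed(a) - embed(b) for a, b in PAIRS])  # (num_pairs, 768)
_, _, v = torch.pca_lowrank(diffs, q=1)
bias_dir = v[:, 0] / v[:, 0].norm()                      # unit bias direction

# 2) Debias at inference: remove the projection onto the bias direction.
def debias(embedding: torch.Tensor) -> torch.Tensor:
    return embedding - (embedding @ bias_dir) * bias_dir

debiased = debias(embed("She is a nurse."))
print(debiased.shape)  # torch.Size([768])
```

On CrowS-Pairs, bias would then be scored as the fraction of sentence pairs for which the (debiased) model assigns a higher pseudo-likelihood to the stereotypical sentence than to its anti-stereotypical counterpart, with 50% as the unbiased ideal.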
Related papers
- A Comparative Study of Translation Bias and Accuracy in Multilingual Large Language Models for Cross-Language Claim Verification [1.566834021297545]
This study systematically evaluates translation bias and the effectiveness of Large Language Models for cross-lingual claim verification.
We investigate two distinct translation methods: pre-translation and self-translation.
Our findings reveal that low-resource languages exhibit significantly lower accuracy in direct inference due to underrepresentation.
arXiv Detail & Related papers (2024-10-14T09:02:42Z) - On Evaluating and Mitigating Gender Biases in Multilingual Settings [5.248564173595024]
We investigate some of the challenges with evaluating and mitigating biases in multilingual settings.
We first create a benchmark for evaluating gender biases in pre-trained masked language models.
We extend various debiasing methods to work beyond English and evaluate their effectiveness for SOTA massively multilingual models.
arXiv Detail & Related papers (2023-07-04T06:23:04Z) - Cross-Lingual Transfer Learning for Phrase Break Prediction with
Multilingual Language Model [13.730152819942445]
Cross-lingual transfer learning can be particularly effective for improving performance in low-resource languages.
This suggests that cross-lingual transfer can be inexpensive and effective for developing TTS front-ends in resource-poor languages.
arXiv Detail & Related papers (2023-06-05T04:10:04Z) - Revisiting Machine Translation for Cross-lingual Classification [91.43729067874503]
Most research in the area focuses on multilingual models rather than the Machine Translation component.
We show that, by using a stronger MT system and mitigating the mismatch between training on original text and running inference on machine translated text, translate-test can do substantially better than previously assumed.
arXiv Detail & Related papers (2023-05-23T16:56:10Z) - Data-adaptive Transfer Learning for Translation: A Case Study in Haitian
and Jamaican [4.4096464238164295]
We show that transfer effectiveness is correlated with the amount of training data and the relationships between languages.
We contribute a rule-based French-Haitian orthographic and syntactic engine and a novel method for phonological embedding.
In very low-resource Jamaican MT, code-switching with a transfer language for orthographic resemblance yields a 6.63 BLEU point advantage.
arXiv Detail & Related papers (2022-09-13T20:58:46Z) - High-resource Language-specific Training for Multilingual Neural Machine
Translation [109.31892935605192]
We propose a multilingual translation model with high-resource language-specific training (HLT-MT) to alleviate negative interference.
Specifically, we first train the multilingual model only with the high-resource pairs and select the language-specific modules at the top of the decoder.
HLT-MT is further trained on all available corpora to transfer knowledge from high-resource languages to low-resource languages.
arXiv Detail & Related papers (2022-07-11T14:33:13Z) - On the Language Coverage Bias for Neural Machine Translation [81.81456880770762]
Language coverage bias is important for neural machine translation (NMT) because the target-original training data is not well exploited in current practice.
By carefully designing experiments, we provide comprehensive analyses of the language coverage bias in the training data.
We propose two simple and effective approaches to alleviate the language coverage bias problem.
arXiv Detail & Related papers (2021-06-07T01:55:34Z) - Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z) - A Study of Cross-Lingual Ability and Language-specific Information in
Multilingual BERT [60.9051207862378]
Multilingual BERT works remarkably well on cross-lingual transfer tasks.
Data size and context window size are crucial factors for transferability.
There is a computationally cheap but effective approach to improving the cross-lingual ability of multilingual BERT.
arXiv Detail & Related papers (2020-04-20T11:13:16Z) - Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact on existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)