Do Multilingual Large Language Models Mitigate Stereotype Bias?
- URL: http://arxiv.org/abs/2407.05740v2
- Date: Tue, 9 Jul 2024 08:16:24 GMT
- Title: Do Multilingual Large Language Models Mitigate Stereotype Bias?
- Authors: Shangrui Nie, Michael Fromm, Charles Welch, Rebekka Görge, Akbar Karimi, Joan Plepi, Nazia Afsan Mowmita, Nicolas Flores-Herr, Mehdi Ali, Lucie Flek,
- Abstract summary: This study systematically trains six LLMs of identical size and architecture in English, German, French, Italian, and Spanish.
We observe that multilingual models achieve not only lower bias but also superior prediction accuracy when compared to monolingual models.
- Score: 9.31741279000585
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While preliminary findings indicate that multilingual LLMs exhibit reduced bias compared to monolingual ones, a comprehensive understanding of the effect of multilingual training on bias mitigation, is lacking. This study addresses this gap by systematically training six LLMs of identical size (2.6B parameters) and architecture: five monolingual models (English, German, French, Italian, and Spanish) and one multilingual model trained on an equal distribution of data across these languages, all using publicly available data. To ensure robust evaluation, standard bias benchmarks were automatically translated into the five target languages and verified for both translation quality and bias preservation by human annotators. Our results consistently demonstrate that multilingual training effectively mitigates bias. Moreover, we observe that multilingual models achieve not only lower bias but also superior prediction accuracy when compared to monolingual models with the same amount of training data, model architecture, and size.
Related papers
- MiLMo:Minority Multilingual Pre-trained Language Model [1.6409017540235764]
This paper constructs a multilingual pre-trained model named MiLMo that performs better on minority language tasks.
By comparing the word2vec model and the pre-trained model in the text classification task, this paper provides an optimal scheme for the downstream task research of minority languages.
arXiv Detail & Related papers (2022-12-04T09:28:17Z) - Cross-lingual Transfer Learning for Check-worthy Claim Identification
over Twitter [7.601937548486356]
Misinformation spread over social media has become an undeniable infodemic.
We present a systematic study of six approaches for cross-lingual check-worthiness estimation across pairs of five diverse languages with the help of Multilingual BERT (mBERT) model.
Our results show that for some language pairs, zero-shot cross-lingual transfer is possible and can perform as good as monolingual models that are trained on the target language.
arXiv Detail & Related papers (2022-11-09T18:18:53Z) - Are Pretrained Multilingual Models Equally Fair Across Languages? [0.0]
This work investigates the group fairness of multilingual models, asking whether these models are equally fair across languages.
We evaluate three multilingual models on MozArt -- mBERT, XLM-R, and mT5 -- and show that across the four target languages, the three models exhibit different levels of group disparity.
arXiv Detail & Related papers (2022-10-11T13:59:19Z) - MonoByte: A Pool of Monolingual Byte-level Language Models [4.491765479948667]
We release 10 monolingual byte-level models rigorously pretrained under the same configuration.
Because they are tokenizer-free, the problem of unseen token embeddings is eliminated.
Experiments on QA and NLI tasks show that our monolingual models achieve competitive performance to the multilingual one.
arXiv Detail & Related papers (2022-09-22T14:32:48Z) - High-resource Language-specific Training for Multilingual Neural Machine
Translation [109.31892935605192]
We propose the multilingual translation model with the high-resource language-specific training (HLT-MT) to alleviate the negative interference.
Specifically, we first train the multilingual model only with the high-resource pairs and select the language-specific modules at the top of the decoder.
HLT-MT is further trained on all available corpora to transfer knowledge from high-resource languages to low-resource languages.
arXiv Detail & Related papers (2022-07-11T14:33:13Z) - Language Contamination Explains the Cross-lingual Capabilities of
English Pretrained Models [79.38278330678965]
We find that common English pretraining corpora contain significant amounts of non-English text.
This leads to hundreds of millions of foreign language tokens in large-scale datasets.
We then demonstrate that even these small percentages of non-English data facilitate cross-lingual transfer for models trained on them.
arXiv Detail & Related papers (2022-04-17T23:56:54Z) - Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for Multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
arXiv Detail & Related papers (2021-09-09T03:48:35Z) - How Good is Your Tokenizer? On the Monolingual Performance of
Multilingual Language Models [96.32118305166412]
We study a set of nine typologically diverse languages with readily available pretrained monolingual models on a set of five diverse monolingual downstream tasks.
We find that languages which are adequately represented in the multilingual model's vocabulary exhibit negligible performance decreases over their monolingual counterparts.
arXiv Detail & Related papers (2020-12-31T14:11:00Z) - Multilingual Translation with Extensible Multilingual Pretraining and
Finetuning [77.33262578776291]
Previous work has demonstrated that machine translation systems can be created by finetuning on bitext.
We show that multilingual translation models can be created through multilingual finetuning.
We demonstrate that pretrained models can be extended to incorporate additional languages without loss of performance.
arXiv Detail & Related papers (2020-08-02T05:36:55Z) - Structure-Level Knowledge Distillation For Multilingual Sequence
Labeling [73.40368222437912]
We propose to reduce the gap between monolingual models and the unified multilingual model by distilling the structural knowledge of several monolingual models to the unified multilingual model (student)
Our experiments on 4 multilingual tasks with 25 datasets show that our approaches outperform several strong baselines and have stronger zero-shot generalizability than both the baseline model and teacher models.
arXiv Detail & Related papers (2020-04-08T07:14:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.