Preference Tuning For Toxicity Mitigation Generalizes Across Languages
- URL: http://arxiv.org/abs/2406.16235v1
- Date: Sun, 23 Jun 2024 22:53:47 GMT
- Title: Preference Tuning For Toxicity Mitigation Generalizes Across Languages
- Authors: Xiaochen Li, Zheng-Xin Yong, Stephen H. Bach
- Abstract summary: This work explores zero-shot cross-lingual generalization of preference tuning in multilingual Large Language Models.
We demonstrate that Direct Preference Optimization training with only English data can significantly reduce toxicity in multilingual open-ended generations.
- Score: 17.784213168942117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detoxifying multilingual Large Language Models (LLMs) has become crucial due to their increasing global use. In this work, we explore zero-shot cross-lingual generalization of preference tuning in detoxifying LLMs. Unlike previous studies that show limited cross-lingual generalization for other safety tasks, we demonstrate that Direct Preference Optimization (DPO) training with only English data can significantly reduce toxicity in multilingual open-ended generations. For example, the probability of mGPT-1.3B generating toxic continuations drops from 46.8% to 3.9% across 17 different languages after training. Our results also extend to other multilingual LLMs, such as BLOOM, Llama3, and Aya-23. Using mechanistic interpretability tools like causal intervention and activation analysis, we identified the dual multilinguality property of MLP layers in LLMs, which explains the cross-lingual generalization of DPO. Finally, we show that bilingual sentence retrieval can predict the cross-lingual transferability of DPO preference tuning.
Related papers
- RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs [13.563021984882704]
We introduce a novel, scalable method for generating high-quality multilingual feedback data.
Our preference-trained model achieves a 54.4% win-rate against Aya 23 8B.
As a result of our study, we expand the frontier of alignment techniques to 23 languages covering half of the world's population.
arXiv Detail & Related papers (2024-07-02T17:42:30Z)
- The Model Arena for Cross-lingual Sentiment Analysis: A Comparative Study in the Era of Large Language Models [0.4821250031784094]
This study compares the cross-lingual transfer capability of public Small Multilingual Language Models (SMLMs) and Large Language Models (LLMs).
Our findings reveal that among public models, SMLMs exhibit superior zero-shot cross-lingual performance.
In few-shot cross-lingual settings, public LLMs demonstrate an enhanced adaptive potential.
arXiv Detail & Related papers (2024-06-27T17:38:45Z)
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
- Analyzing and Adapting Large Language Models for Few-Shot Multilingual NLU: Are We There Yet? [82.02076369811402]
Supervised fine-tuning (SFT), supervised instruction tuning (SIT) and in-context learning (ICL) are three alternative, de facto standard approaches to few-shot learning.
We present an extensive and systematic comparison of the three approaches, testing them on 6 high- and low-resource languages, three different NLU tasks, and a myriad of language and domain setups.
Our observations show that supervised instruction tuning has the best trade-off between performance and resource requirements.
arXiv Detail & Related papers (2024-03-04T10:48:13Z)
- Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models [117.20416338476856]
Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora.
We propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs.
Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons.
arXiv Detail & Related papers (2024-02-26T09:36:05Z)
- Unveiling Linguistic Regions in Large Language Models [49.298360366468934]
Large Language Models (LLMs) have demonstrated considerable cross-lingual alignment and generalization ability.
This paper conducts several investigations on the linguistic competence of LLMs.
arXiv Detail & Related papers (2024-02-22T16:56:13Z)
- MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization [65.31411639849516]
We propose a Multilingual-Alignment-as-Preference Optimization framework (MAPO) to align the reasoning processes in other languages with the dominant language.
Specifically, we harness an off-the-shelf translation model for the consistency between answers in non-dominant and dominant languages.
Experiments show that MAPO stably achieves significant improvements in the multilingual reasoning of various models.
arXiv Detail & Related papers (2024-01-12T18:03:54Z)
- Vicinal Risk Minimization for Few-Shot Cross-lingual Transfer in Abusive Language Detection [19.399281609371258]
Cross-lingual transfer learning from high-resource to medium and low-resource languages has shown encouraging results.
We resort to data augmentation and continual pre-training for domain adaptation to improve cross-lingual abusive language detection.
arXiv Detail & Related papers (2023-11-03T16:51:07Z)
- Tokenizer Choice For LLM Training: Negligible or Crucial? [30.33170936148845]
We study the influence of tokenizer choice on Large Language Models (LLMs) downstream performance by training 24 mono- and multilingual LLMs.
We find that the tokenizer choice can significantly impact the model's downstream performance and training costs.
We show that multilingual tokenizers trained on the five most frequent European languages require vocabulary size increases of factor three in comparison to English.
arXiv Detail & Related papers (2023-10-12T22:44:19Z)
- Exploring Human-Like Translation Strategy with Large Language Models [93.49333173279508]
Large language models (LLMs) have demonstrated impressive capabilities in general scenarios.
This work proposes the MAPS framework, which stands for Multi-Aspect Prompting and Selection.
We employ a selection mechanism based on quality estimation to filter out noisy and unhelpful knowledge.
arXiv Detail & Related papers (2023-05-06T19:03:12Z)