Does Cross-Cultural Alignment Change the Commonsense Morality of Language Models?
- URL: http://arxiv.org/abs/2406.16316v1
- Date: Mon, 24 Jun 2024 04:50:12 GMT
- Title: Does Cross-Cultural Alignment Change the Commonsense Morality of Language Models?
- Authors: Yuu Jinnai,
- Abstract summary: alignment of language models with human preferences is a common approach to making a language model useful to end users.
Most alignment work is done in English, and human preference datasets are dominated by English.
We investigate whether aligning Japanese language models with English resources marginalizes the preference of non-English speaking users.
- Score: 3.48097307252416
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Alignment of the language model with human preferences is a common approach to making a language model useful to end users. However, most alignment work is done in English, and human preference datasets are dominated by English, reflecting only the preferences of English-speaking annotators. Nevertheless, it is common practice to use the English preference data, either directly or by translating it into the target language, when aligning a multilingual language model. The question is whether such an alignment strategy marginalizes the preference of non-English speaking users. To this end, we investigate the effect of aligning Japanese language models with (mostly) English resources. In particular, we focus on evaluating whether the commonsense morality of the resulting fine-tuned models is aligned with Japanese culture using the JCommonsenseMorality (JCM) and ETHICS datasets. The experimental results show that the fine-tuned model outperforms the SFT model. However, it does not demonstrate the same level of improvement as a model fine-tuned using the JCM, suggesting that while some aspects of commonsense morality are transferable, others may not be.
Related papers
- ComPO: Community Preferences for Language Model Personalization [122.54846260663922]
ComPO is a method to personalize preference optimization in language models.
We collect and release ComPRed, a question answering dataset with community-level preferences from Reddit.
arXiv Detail & Related papers (2024-10-21T14:02:40Z) - Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment [39.94156255629528]
We evaluate a simple approach for zero-shot cross-lingual alignment.
Cross-lingually aligned models are preferred by humans over unaligned models.
A different-language reward model sometimes yields better aligned models than a same-language reward model.
arXiv Detail & Related papers (2024-04-18T16:52:36Z) - Unintended Impacts of LLM Alignment on Global Representation [62.6579934112071]
We show that developers align Large Language Models (LLMs) to user preferences through a variety of procedures, such as Reinforcement Learning From Human Feedback (RLHF) and Direct Preference Optimization (DPO)
We explore how alignment impacts performance along three axes of global representation: English dialects, multilingualism, and opinions from and about countries worldwide.
We conclude by discussing design decisions that led to these unintended impacts and recommendations for more equitable preference tuning.
arXiv Detail & Related papers (2024-02-22T23:31:22Z) - Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z) - FineDeb: A Debiasing Framework for Language Models [3.7698299781999376]
We propose FineDeb, a two-phase debiasing framework for language models.
Our results show that FineDeb offers stronger debiasing in comparison to other methods.
Our framework is generalizable for demographics with multiple classes.
arXiv Detail & Related papers (2023-02-05T18:35:21Z) - Speaking Multiple Languages Affects the Moral Bias of Language Models [70.94372902010232]
Pre-trained multilingual language models (PMLMs) are commonly used when dealing with data from multiple languages and cross-lingual transfer.
Do the models capture moral norms from English and impose them on other languages?
Our experiments demonstrate that, indeed, PMLMs encode differing moral biases, but these do not necessarily correspond to cultural differences or commonalities in human opinions.
arXiv Detail & Related papers (2022-11-14T20:08:54Z) - Compositional Evaluation on Japanese Textual Entailment and Similarity [20.864082353441685]
Natural Language Inference (NLI) and Semantic Textual Similarity (STS) are widely used benchmark tasks for compositional evaluation of pre-trained language models.
Despite growing interest in linguistic universals, most NLI/STS studies have focused almost exclusively on English.
There are no available multilingual NLI/STS datasets in Japanese, which is typologically different from English.
arXiv Detail & Related papers (2022-08-09T15:10:56Z) - Understanding by Understanding Not: Modeling Negation in Language Models [81.21351681735973]
Negation is a core construction in natural language.
We propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences.
We reduce the mean top1 error rate to 4% on the negated LAMA dataset.
arXiv Detail & Related papers (2021-05-07T21:58:35Z) - Comparison of Interactive Knowledge Base Spelling Correction Models for
Low-Resource Languages [81.90356787324481]
Spelling normalization for low resource languages is a challenging task because the patterns are hard to predict.
This work shows a comparison of a neural model and character language models with varying amounts on target language data.
Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.