Whose Morality Do They Speak? Unraveling Cultural Bias in Multilingual Language Models
- URL: http://arxiv.org/abs/2412.18863v1
- Date: Wed, 25 Dec 2024 10:17:15 GMT
- Title: Whose Morality Do They Speak? Unraveling Cultural Bias in Multilingual Language Models
- Authors: Meltem Aksoy
- Abstract summary: Large language models (LLMs) have become integral tools in diverse domains, yet their moral reasoning capabilities remain underexplored.
This study investigates whether multilingual LLMs, such as GPT-3.5-Turbo, reflect culturally specific moral values or impose dominant moral norms.
Using the updated Moral Foundations Questionnaire (MFQ-2) in eight languages, the study analyzes the models' adherence to six core moral foundations.
- Score: 0.0
- License:
- Abstract: Large language models (LLMs) have become integral tools in diverse domains, yet their moral reasoning capabilities across cultural and linguistic contexts remain underexplored. This study investigates whether multilingual LLMs, such as GPT-3.5-Turbo, GPT-4o-mini, Llama 3.1, and MistralNeMo, reflect culturally specific moral values or impose dominant moral norms, particularly those rooted in English. Using the updated Moral Foundations Questionnaire (MFQ-2) in eight languages, Arabic, Farsi, English, Spanish, Japanese, Chinese, French, and Russian, the study analyzes the models' adherence to six core moral foundations: care, equality, proportionality, loyalty, authority, and purity. The results reveal significant cultural and linguistic variability, challenging the assumption of universal moral consistency in LLMs. Although some models demonstrate adaptability to diverse contexts, others exhibit biases influenced by the composition of the training data. These findings underscore the need for culturally inclusive model development to improve fairness and trust in multilingual AI systems.
Related papers
- Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UniMoral [17.46198411148926]
Moral reasoning is a complex cognitive process shaped by individual experiences and cultural contexts.
We bridge this gap with UniMoral, a unified dataset integrating psychologically grounded and social-media-derived moral dilemmas.
We demonstrate UniMoral's utility through benchmark evaluations of three large language models (LLMs) across four tasks.
arXiv Detail & Related papers (2025-02-19T20:13:24Z)
- Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation [71.59208664920452]
Cultural biases in multilingual datasets pose significant challenges for their effectiveness as global benchmarks.
We show that progress on MMLU depends heavily on learning Western-centric concepts, with 28% of all questions requiring culturally sensitive knowledge.
We release Global MMLU, an improved MMLU with evaluation coverage across 42 languages.
arXiv Detail & Related papers (2024-12-04T13:27:09Z)
- Large Language Models as Mirrors of Societal Moral Standards [0.5852077003870417]
Language models can, to a limited extent, represent moral norms in a variety of cultural contexts.
This study evaluates the effectiveness of these models using information from two surveys, the WVS and the PEW, that encompass moral perspectives from over 40 countries.
The results show that biases exist in both monolingual and multilingual models, and they typically fall short of accurately capturing the moral intricacies of diverse cultures.
arXiv Detail & Related papers (2024-12-01T20:20:35Z)
- Large Language Models Reflect the Ideology of their Creators [71.65505524599888]
Large language models (LLMs) are trained on vast amounts of data to generate natural language.
This paper shows that the ideological stance of an LLM appears to reflect the worldview of its creators.
arXiv Detail & Related papers (2024-10-24T04:02:30Z)
- Decoding Multilingual Moral Preferences: Unveiling LLM's Biases Through the Moral Machine Experiment [11.82100047858478]
This paper builds on the moral machine experiment (MME) to investigate the moral preferences of five large language models in a multilingual setting.
We generate 6500 scenarios of the MME and prompt the models in ten languages on which action to take.
Our analysis reveals that all LLMs exhibit different moral biases to some degree and that their preferences differ not only from human preferences but also across languages within the same model.
arXiv Detail & Related papers (2024-07-21T14:48:13Z)
- Language Model Alignment in Multilingual Trolley Problems [138.5684081822807]
Building on the Moral Machine experiment, we develop a cross-lingual corpus of moral dilemma vignettes in over 100 languages called MultiTP.
Our analysis explores the alignment of 19 different LLMs with human judgments, capturing preferences across six moral dimensions.
We discover significant variance in alignment across languages, challenging the assumption of uniform moral reasoning in AI systems.
arXiv Detail & Related papers (2024-07-02T14:02:53Z)
- CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models [59.22460740026037]
The "CIVICS: Culturally-Informed & Values-Inclusive Corpus for Societal impacts" dataset is designed to evaluate the social and cultural variation of Large Language Models (LLMs).
We create a hand-crafted, multilingual dataset of value-laden prompts which address specific socially sensitive topics, including LGBTQI rights, social welfare, immigration, disability rights, and surrogacy.
arXiv Detail & Related papers (2024-05-22T20:19:10Z)
- Investigating Cultural Alignment of Large Language Models [10.738300803676655]
We show that Large Language Models (LLMs) genuinely encapsulate the diverse knowledge adopted by different cultures.
We quantify cultural alignment by simulating sociological surveys, comparing model responses to those of actual survey participants as references.
We introduce Anthropological Prompting, a novel method leveraging anthropological reasoning to enhance cultural alignment.
arXiv Detail & Related papers (2024-02-20T18:47:28Z)
- Not All Countries Celebrate Thanksgiving: On the Cultural Dominance in Large Language Models [89.94270049334479]
This paper identifies a cultural dominance issue within large language models (LLMs).
LLMs often provide inappropriate English-culture-related answers that are not relevant to the expected culture when users ask in non-English languages.
arXiv Detail & Related papers (2023-10-19T05:38:23Z)
- Knowledge of cultural moral norms in large language models [3.475552182166427]
We investigate the extent to which monolingual English language models contain knowledge about moral norms in different countries.
We perform our analyses with two public datasets from the World Values Survey and PEW global surveys on morality.
We find that pre-trained English language models predict empirical moral norms across countries worse than the English moral norms reported previously.
arXiv Detail & Related papers (2023-06-02T18:23:35Z)
- Do Multilingual Language Models Capture Differing Moral Norms? [71.52261949766101]
Massively multilingual sentence representations are trained on large corpora of uncurated data.
This may cause the models to grasp cultural values including moral judgments from the high-resource languages.
The lack of data in certain languages can also lead to developing random and thus potentially harmful beliefs.
arXiv Detail & Related papers (2022-03-18T12:26:37Z)