One Model, Many Morals: Uncovering Cross-Linguistic Misalignments in Computational Moral Reasoning
- URL: http://arxiv.org/abs/2509.21443v1
- Date: Thu, 25 Sep 2025 19:14:17 GMT
- Title: One Model, Many Morals: Uncovering Cross-Linguistic Misalignments in Computational Moral Reasoning
- Authors: Sualeha Farid, Jayden Lin, Zean Chen, Shivani Kumar, David Jurgens
- Abstract summary: We investigate how language mediates moral decision-making in Large Language Models (LLMs). Our analysis reveals significant inconsistencies in LLMs' moral judgments across languages, often reflecting cultural misalignment. We distill our insights into a structured typology of moral reasoning errors that calls for more culturally-aware AI.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) are increasingly deployed in multilingual and multicultural environments where moral reasoning is essential for generating ethically appropriate responses. Yet, the dominant pretraining of LLMs on English-language data raises critical concerns about their ability to generalize judgments across diverse linguistic and cultural contexts. In this work, we systematically investigate how language mediates moral decision-making in LLMs. We translate two established moral reasoning benchmarks into five culturally and typologically diverse languages, enabling multilingual zero-shot evaluation. Our analysis reveals significant inconsistencies in LLMs' moral judgments across languages, often reflecting cultural misalignment. Through a combination of carefully constructed research questions, we uncover the underlying drivers of these disparities, ranging from disagreements to reasoning strategies employed by LLMs. Finally, through a case study, we examine the role of pretraining data in shaping an LLM's moral compass. Through this work, we distill our insights into a structured typology of moral reasoning errors that calls for more culturally-aware AI.
Related papers
- Do Large Language Models Understand Morality Across Cultures? [0.5356944479760104]
This study investigates the extent to which large language models capture cross-cultural differences and similarities in moral perspectives. Our results reveal that current LLMs often fail to reproduce the full spectrum of cross-cultural moral variation. These findings highlight a pressing need for more robust approaches to mitigate biases and improve cultural representativeness in LLMs.
arXiv Detail & Related papers (2025-07-28T20:25:36Z)
- MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs [56.87573414161703]
We introduce the Multilingual Native Reasoning Challenge (MultiNRC), a benchmark to assess Large Language Models (LLMs). MultiNRC covers four core reasoning categories: language-specific linguistic reasoning, wordplay & riddles, cultural/tradition reasoning, and math reasoning with cultural relevance. For cultural/tradition reasoning and math reasoning with cultural relevance, we also provide English equivalent translations of the multilingual questions, manually translated by native speakers fluent in English.
arXiv Detail & Related papers (2025-07-23T12:56:31Z)
- MFTCXplain: A Multilingual Benchmark Dataset for Evaluating the Moral Reasoning of LLMs through Hate Speech Multi-hop Explanations [1.7655314759504603]
This paper introduces a benchmark dataset for evaluating the moral reasoning of Large Language Models (LLMs). The dataset comprises 3,000 tweets across Portuguese, Italian, Persian, and English, annotated with binary hate speech labels, moral categories, and text span-level rationales. Empirical results highlight a misalignment between LLM outputs and human annotations in moral reasoning tasks.
arXiv Detail & Related papers (2025-06-23T19:44:21Z)
- Are Language Models Consequentialist or Deontological Moral Reasoners? [75.6788742799773]
We focus on a large-scale analysis of the moral reasoning traces provided by large language models (LLMs). We introduce and test a taxonomy of moral rationales to systematically classify reasoning traces according to two main normative ethical theories: consequentialism and deontology.
arXiv Detail & Related papers (2025-05-27T17:51:18Z)
- Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English [66.97110551643722]
We investigate dialectal disparities in Large Language Models' (LLMs) reasoning tasks. We find that LLMs produce less accurate responses and simpler reasoning chains and explanations for African American English (AAE) inputs. These findings highlight systematic differences in how LLMs process and reason about different language varieties.
arXiv Detail & Related papers (2025-03-06T05:15:34Z)
- Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UniMoral [17.46198411148926]
Moral reasoning is a complex cognitive process shaped by individual experiences and cultural contexts. We bridge this gap with UniMoral, a unified dataset integrating psychologically grounded and social-media-derived moral dilemmas. We demonstrate UniMoral's utility through benchmark evaluations of three large language models (LLMs) across four tasks.
arXiv Detail & Related papers (2025-02-19T20:13:24Z)
- Whose Morality Do They Speak? Unraveling Cultural Bias in Multilingual Language Models [0.0]
Large language models (LLMs) have become integral tools in diverse domains, yet their moral reasoning capabilities remain underexplored. This study investigates whether multilingual LLMs, such as GPT-3.5-Turbo, reflect culturally specific moral values or impose dominant moral norms. Using the updated Moral Foundations Questionnaire (MFQ-2) in eight languages, the study analyzes the models' adherence to six core moral foundations.
arXiv Detail & Related papers (2024-12-25T10:17:15Z)
- Large Language Models Reflect the Ideology of their Creators [71.65505524599888]
Large language models (LLMs) are trained on vast amounts of data to generate natural language. This paper shows that the ideological stance of an LLM appears to reflect the worldview of its creators.
arXiv Detail & Related papers (2024-10-24T04:02:30Z)
- Language Model Alignment in Multilingual Trolley Problems [138.5684081822807]
Building on the Moral Machine experiment, we develop a cross-lingual corpus of moral dilemma vignettes in over 100 languages called MultiTP. Our analysis explores the alignment of 19 different LLMs with human judgments, capturing preferences across six moral dimensions. We discover significant variance in alignment across languages, challenging the assumption of uniform moral reasoning in AI systems.
arXiv Detail & Related papers (2024-07-02T14:02:53Z)
- Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense [98.09670425244462]
Large language models (LLMs) have demonstrated substantial commonsense understanding.
This paper examines the capabilities and limitations of several state-of-the-art LLMs in the context of cultural commonsense tasks.
arXiv Detail & Related papers (2024-05-07T20:28:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.