One Model, Many Morals: Uncovering Cross-Linguistic Misalignments in Computational Moral Reasoning
- URL: http://arxiv.org/abs/2509.21443v1
- Date: Thu, 25 Sep 2025 19:14:17 GMT
- Title: One Model, Many Morals: Uncovering Cross-Linguistic Misalignments in Computational Moral Reasoning
- Authors: Sualeha Farid, Jayden Lin, Zean Chen, Shivani Kumar, David Jurgens
- Abstract summary: We investigate how language mediates moral decision-making in Large Language Models (LLMs). Our analysis reveals significant inconsistencies in LLMs' moral judgments across languages, often reflecting cultural misalignment. We distill our insights into a structured typology of moral reasoning errors that calls for more culturally-aware AI.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) are increasingly deployed in multilingual and multicultural environments where moral reasoning is essential for generating ethically appropriate responses. Yet, the dominant pretraining of LLMs on English-language data raises critical concerns about their ability to generalize judgments across diverse linguistic and cultural contexts. In this work, we systematically investigate how language mediates moral decision-making in LLMs. We translate two established moral reasoning benchmarks into five culturally and typologically diverse languages, enabling multilingual zero-shot evaluation. Our analysis reveals significant inconsistencies in LLMs' moral judgments across languages, often reflecting cultural misalignment. Through a combination of carefully constructed research questions, we uncover the underlying drivers of these disparities, ranging from disagreements to reasoning strategies employed by LLMs. Finally, through a case study, we examine the role of pretraining data in shaping an LLM's moral compass. Through this work, we distill our insights into a structured typology of moral reasoning errors that calls for more culturally-aware AI.
Related papers
- Do Large Language Models Understand Morality Across Cultures? [0.5356944479760104]
This study investigates the extent to which large language models capture cross-cultural differences and similarities in moral perspectives. Our results reveal that current LLMs often fail to reproduce the full spectrum of cross-cultural moral variation. These findings highlight a pressing need for more robust approaches to mitigate biases and improve cultural representativeness in LLMs.
arXiv Detail & Related papers (2025-07-28T20:25:36Z)
- MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs [56.87573414161703]
We introduce the Multilingual Native Reasoning Challenge (MultiNRC), a benchmark to assess Large Language Models (LLMs). MultiNRC covers four core reasoning categories: language-specific linguistic reasoning, wordplay & riddles, cultural/tradition reasoning, and math reasoning with cultural relevance. For cultural/tradition reasoning and math reasoning with cultural relevance, we also provide English equivalent translations of the multilingual questions, manually translated by native speakers fluent in English.
arXiv Detail & Related papers (2025-07-23T12:56:31Z)
- MFTCXplain: A Multilingual Benchmark Dataset for Evaluating the Moral Reasoning of LLMs through Hate Speech Multi-hop Explanations [1.7655314759504603]
This paper introduces a benchmark dataset for evaluating the moral reasoning of Large Language Models (LLMs). The dataset comprises 3,000 tweets across Portuguese, Italian, Persian, and English, annotated with binary hate speech labels, moral categories, and text span-level rationales. Empirical results highlight a misalignment between LLM outputs and human annotations in moral reasoning tasks.
arXiv Detail & Related papers (2025-06-23T19:44:21Z)
- Are Language Models Consequentialist or Deontological Moral Reasoners? [75.6788742799773]
We focus on a large-scale analysis of the moral reasoning traces provided by large language models (LLMs). We introduce and test a taxonomy of moral rationales to systematically classify reasoning traces according to two main normative ethical theories: consequentialism and deontology.
arXiv Detail & Related papers (2025-05-27T17:51:18Z)
- Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English [66.97110551643722]
We investigate dialectal disparities in Large Language Models' (LLMs) reasoning tasks. We find that LLMs produce less accurate responses and simpler reasoning chains and explanations for African American English (AAE) inputs. These findings highlight systematic differences in how LLMs process and reason about different language varieties.
arXiv Detail & Related papers (2025-03-06T05:15:34Z)
- Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UniMoral [17.46198411148926]
Moral reasoning is a complex cognitive process shaped by individual experiences and cultural contexts. We bridge this gap with UniMoral, a unified dataset integrating psychologically grounded and social-media-derived moral dilemmas. We demonstrate UniMoral's utility through benchmark evaluations of three large language models (LLMs) across four tasks.
arXiv Detail & Related papers (2025-02-19T20:13:24Z)
- Whose Morality Do They Speak? Unraveling Cultural Bias in Multilingual Language Models [0.0]
Large language models (LLMs) have become integral tools in diverse domains, yet their moral reasoning capabilities remain underexplored. This study investigates whether multilingual LLMs, such as GPT-3.5-Turbo, reflect culturally specific moral values or impose dominant moral norms. Using the updated Moral Foundations Questionnaire (MFQ-2) in eight languages, the study analyzes the models' adherence to six core moral foundations.
arXiv Detail & Related papers (2024-12-25T10:17:15Z)
- Large Language Models Reflect the Ideology of their Creators [71.65505524599888]
Large language models (LLMs) are trained on vast amounts of data to generate natural language. This paper shows that the ideological stance of an LLM appears to reflect the worldview of its creators.
arXiv Detail & Related papers (2024-10-24T04:02:30Z)
- Language Model Alignment in Multilingual Trolley Problems [138.5684081822807]
Building on the Moral Machine experiment, we develop a cross-lingual corpus of moral dilemma vignettes in over 100 languages called MultiTP. Our analysis explores the alignment of 19 different LLMs with human judgments, capturing preferences across six moral dimensions. We discover significant variance in alignment across languages, challenging the assumption of uniform moral reasoning in AI systems.
arXiv Detail & Related papers (2024-07-02T14:02:53Z)
- Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense [98.09670425244462]
Large language models (LLMs) have demonstrated substantial commonsense understanding.
This paper examines the capabilities and limitations of several state-of-the-art LLMs in the context of cultural commonsense tasks.
arXiv Detail & Related papers (2024-05-07T20:28:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.