MFTCXplain: A Multilingual Benchmark Dataset for Evaluating the Moral Reasoning of LLMs through Hate Speech Multi-hop Explanations
- URL: http://arxiv.org/abs/2506.19073v3
- Date: Wed, 20 Aug 2025 23:30:40 GMT
- Title: MFTCXplain: A Multilingual Benchmark Dataset for Evaluating the Moral Reasoning of LLMs through Hate Speech Multi-hop Explanations
- Authors: Jackson Trager, Francielle Vargas, Diego Alves, Matteo Guida, Mikel K. Ngueajio, Flor Plaza-del-Arco, Yalda Daryanai, Farzan Karimi-Malekabadi, Ameeta Agrawal
- Abstract summary: This paper introduces a benchmark dataset for evaluating the moral reasoning of Large Language Models (LLMs). The dataset comprises 3,000 tweets across Portuguese, Italian, Persian, and English, annotated with binary hate speech labels, moral categories, and text span-level rationales. Empirical results highlight a misalignment between LLM outputs and human annotations in moral reasoning tasks.
- Score: 1.7655314759504603
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensuring the moral reasoning capabilities of Large Language Models (LLMs) is a growing concern as these systems are used in socially sensitive tasks. Nevertheless, current evaluation benchmarks present two major shortcomings: a lack of annotations that justify moral classifications, which limits transparency and interpretability; and a predominant focus on English, which constrains the assessment of moral reasoning across diverse cultural settings. In this paper, we introduce MFTCXplain, a multilingual benchmark dataset for evaluating the moral reasoning of LLMs via hate speech multi-hop explanation using Moral Foundation Theory (MFT). The dataset comprises 3,000 tweets across Portuguese, Italian, Persian, and English, annotated with binary hate speech labels, moral categories, and text span-level rationales. Empirical results highlight a misalignment between LLM outputs and human annotations in moral reasoning tasks. While LLMs perform well in hate speech detection (F1 up to 0.836), their ability to predict moral sentiments is notably weak (F1 < 0.35). Furthermore, rationale alignment remains limited, particularly in underrepresented languages. These findings show the limited capacity of current LLMs to internalize and reflect human moral reasoning.
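The abstract's third evaluation axis, span-level rationale alignment, is typically scored by comparing the token spans a model highlights against human-annotated spans. The exact metric used in the paper is not stated here; below is a minimal sketch of one common choice, token-level F1 over rationale spans, where the function name and token-index-set representation are illustrative assumptions:

```python
def rationale_f1(pred_tokens, gold_tokens):
    """Token-level F1 between a model's predicted rationale span and the
    human-annotated gold span, each given as a set of token indices.
    This is an illustrative metric, not necessarily the paper's exact one."""
    pred, gold = set(pred_tokens), set(gold_tokens)
    if not pred and not gold:
        return 1.0  # both empty: vacuous agreement
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

# Example: model highlights tokens 2-5, annotator marked tokens 3-6
print(rationale_f1({2, 3, 4, 5}, {3, 4, 5, 6}))  # 0.75
```

Averaging this score over a test set gives a single alignment number per language, which is one way the reported gap between high-resource and underrepresented languages could be quantified.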
Related papers
- Self-Explaining Hate Speech Detection with Moral Rationales [11.165386773222934]
We propose Supervised Moral Rationale Attention (SMRA), the first self-explaining hate speech detection framework to incorporate moral rationales as direct supervision for attention alignment. Based on Moral Foundations Theory, SMRA aligns token-level attention with expert-annotated moral rationales, guiding models to attend to morally salient spans rather than spurious lexical patterns.
arXiv Detail & Related papers (2026-01-07T00:17:16Z) - One Model, Many Morals: Uncovering Cross-Linguistic Misalignments in Computational Moral Reasoning [23.56514813420256]
We investigate how language mediates moral decision-making in Large Language Models (LLMs). Our analysis reveals significant inconsistencies in LLMs' moral judgments across languages, often reflecting cultural misalignment. We distill our insights into a structured typology of moral reasoning errors that calls for more culturally-aware AI.
arXiv Detail & Related papers (2025-09-25T19:14:17Z) - MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs [56.87573414161703]
We introduce the Multilingual Native Reasoning Challenge (MultiNRC), a benchmark to assess Large Language Models (LLMs). MultiNRC covers four core reasoning categories: language-specific linguistic reasoning, wordplay & riddles, cultural/tradition reasoning, and math reasoning with cultural relevance. For the latter two categories, we also provide English translations of the multilingual questions, produced manually by native speakers fluent in English.
arXiv Detail & Related papers (2025-07-23T12:56:31Z) - XToM: Exploring the Multilingual Theory of Mind for Large Language Models [57.9821865189077]
Existing evaluations of Theory of Mind in LLMs are largely limited to English. We present XToM, a rigorously validated multilingual benchmark that evaluates ToM across five languages. Our findings expose limitations in LLMs' ability to replicate human-like mentalizing across linguistic contexts.
arXiv Detail & Related papers (2025-06-03T05:23:25Z) - Are Language Models Consequentialist or Deontological Moral Reasoners? [69.85385952436044]
We focus on a large-scale analysis of the moral reasoning traces provided by large language models (LLMs). We introduce and test a taxonomy of moral rationales to systematically classify reasoning traces according to two main normative ethical theories: consequentialism and deontology.
arXiv Detail & Related papers (2025-05-27T17:51:18Z) - On the Thinking-Language Modeling Gap in Large Language Models [68.83670974539108]
We show that there is a significant gap between the modeling of languages and thoughts. We propose a new prompt technique termed Language-of-Thoughts (LoT) to demonstrate and alleviate this gap.
arXiv Detail & Related papers (2025-05-19T09:31:52Z) - Moral Reasoning Across Languages: The Critical Role of Low-Resource Languages in LLMs [0.3760401651114107]
We introduce the Multilingual Moral Reasoning Benchmark (MMRB) to evaluate the moral reasoning abilities of large language models (LLMs). Our results show moral reasoning performance degrades with increasing context complexity, particularly for low-resource languages such as Vietnamese. Surprisingly, low-resource languages have a stronger impact on multilingual reasoning than high-resource ones, highlighting their critical role in multilingual NLP.
arXiv Detail & Related papers (2025-04-28T12:56:36Z) - PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts [79.84059473102778]
PolyMath is a multilingual mathematical reasoning benchmark covering 18 languages and 4 easy-to-hard difficulty levels. Our benchmark ensures difficulty comprehensiveness, language diversity, and high-quality translation.
arXiv Detail & Related papers (2025-04-25T15:39:04Z) - M$^3$oralBench: A MultiModal Moral Benchmark for LVLMs [66.78407469042642]
We introduce M$^3$oralBench, the first MultiModal Moral Benchmark for LVLMs. M$^3$oralBench expands the everyday moral scenarios in Moral Foundations Vignettes (MFVs) and employs the text-to-image diffusion model, SD3.0, to create corresponding scenario images. It conducts moral evaluation across six moral foundations of Moral Foundations Theory (MFT) and encompasses tasks in moral judgement, moral classification, and moral response.
arXiv Detail & Related papers (2024-12-30T05:18:55Z) - Towards Robust Knowledge Representations in Multilingual LLMs for Equivalence and Inheritance based Consistent Reasoning [5.656040546546711]
Reasoning and linguistic skills form the cornerstone of human intelligence, facilitating problem-solving and decision-making.
Recent advances in Large Language Models (LLMs) have led to impressive linguistic capabilities and emergent reasoning behaviors, fueling widespread adoption across application domains.
We focus on evaluating whether LLMs have the requisite representations to reason using two foundational relationships: "equivalence" and "inheritance"
arXiv Detail & Related papers (2024-10-18T07:34:21Z) - Language Model Alignment in Multilingual Trolley Problems [138.5684081822807]
Building on the Moral Machine experiment, we develop a cross-lingual corpus of moral dilemma vignettes in over 100 languages called MultiTP. Our analysis explores the alignment of 19 different LLMs with human judgments, capturing preferences across six moral dimensions. We discover significant variance in alignment across languages, challenging the assumption of uniform moral reasoning in AI systems.
arXiv Detail & Related papers (2024-07-02T14:02:53Z) - Harnessing Artificial Intelligence to Combat Online Hate: Exploring the Challenges and Opportunities of Large Language Models in Hate Speech Detection [4.653571633477755]
Large language models (LLMs) excel in many diverse applications beyond language generation, e.g., translation, summarization, and sentiment analysis.
This becomes pertinent in the realm of identifying hateful or toxic speech -- a domain fraught with challenges and ethical dilemmas.
arXiv Detail & Related papers (2024-03-12T19:12:28Z) - CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z) - Large Language Models: The Need for Nuance in Current Debates and a Pragmatic Perspective on Understanding [1.3654846342364308]
Large Language Models (LLMs) are unparalleled in their ability to generate grammatically correct, fluent text.
This position paper critically assesses three points recurring in critiques of LLM capacities.
We outline a pragmatic perspective on the issue of 'real' understanding and intentionality in LLMs.
arXiv Detail & Related papers (2023-10-30T15:51:04Z) - Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations more fully reveal language models' proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z) - Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis [103.89753784762445]
Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT).
This paper systematically investigates the advantages and challenges of LLMs for MMT.
We thoroughly evaluate eight popular LLMs, including ChatGPT and GPT-4.
arXiv Detail & Related papers (2023-04-10T15:51:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.