MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes
- URL: http://arxiv.org/abs/2510.16380v1
- Date: Sat, 18 Oct 2025 07:34:31 GMT
- Title: MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes
- Authors: Yu Ying Chiu, Michael S. Lee, Rachel Calcott, Brandon Handoko, Paul de Font-Reaulx, Paula Rodriguez, Chen Bo Calvin Zhang, Ziwen Han, Udari Madhushani Sehwag, Yash Maurya, Christina Q Knight, Harry R. Lloyd, Florence Bacus, Mantas Mazeika, Bing Liu, Yejin Choi, Mitchell L Gordon, Sydney Levine
- Abstract summary: We present MoReBench: 1,000 moral scenarios, each paired with a set of criteria experts consider essential to include (or avoid) when reasoning about the scenarios. MoReBench contains over 23 thousand criteria, including identifying moral considerations, weighing trade-offs, and giving actionable recommendations. Separately, we curate MoReBench-Theory: 150 examples to test whether AI can reason under five major frameworks in normative ethics.
- Score: 31.1183238867944
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As AI systems progress, we rely more on them to make decisions with us and for us. To ensure that such decisions are aligned with human values, it is imperative that we understand not only what decisions they make but also how they come to those decisions. Reasoning language models, which provide both final responses and (partially transparent) intermediate thinking traces, present a timely opportunity to study AI procedural reasoning. Unlike math and code problems, which often have objectively correct answers, moral dilemmas are an excellent testbed for process-focused evaluation because they allow for multiple defensible conclusions. To this end, we present MoReBench: 1,000 moral scenarios, each paired with a set of rubric criteria that experts consider essential to include (or avoid) when reasoning about the scenarios. MoReBench contains over 23 thousand criteria, including identifying moral considerations, weighing trade-offs, and giving actionable recommendations, covering cases where AI advises humans on moral decisions as well as cases where it makes moral decisions autonomously. Separately, we curate MoReBench-Theory: 150 examples to test whether AI can reason under five major frameworks in normative ethics. Our results show that scaling laws and existing benchmarks on math, code, and scientific reasoning tasks fail to predict models' abilities to perform moral reasoning. Models also show partiality towards specific moral frameworks (e.g., Benthamite Act Utilitarianism and Kantian Deontology), which might be side effects of popular training paradigms. Together, these benchmarks advance process-focused reasoning evaluation towards safer and more transparent AI.
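To make the rubric-based, process-focused evaluation concrete, here is a minimal sketch of how a reasoning trace could be scored against per-scenario criteria. The schema (`Criterion`, `Scenario`, `polarity`) and the `judge_meets_criterion` placeholder are assumptions for illustration, not the benchmark's released evaluation code, which the abstract does not specify.

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    text: str      # e.g. "identifies the conflict between honesty and loyalty"
    polarity: int  # +1 if the trace should include this, -1 if it should avoid it

@dataclass
class Scenario:
    prompt: str
    criteria: list[Criterion]

def judge_meets_criterion(trace: str, criterion: Criterion) -> bool:
    """Hypothetical judge. In practice an expert or LLM grader would decide
    whether the reasoning trace satisfies the criterion; naive substring
    matching stands in for that here."""
    return criterion.text.lower() in trace.lower()

def score_trace(trace: str, scenario: Scenario) -> float:
    """Fraction of rubric criteria handled correctly: positive criteria
    should appear in the trace, negative criteria should be avoided."""
    correct = sum(
        judge_meets_criterion(trace, c) == (c.polarity > 0)
        for c in scenario.criteria
    )
    return correct / len(scenario.criteria)

# Toy scenario and trace
scenario = Scenario(
    prompt="Should I report a friend who cheated on an exam?",
    criteria=[
        Criterion("conflict between honesty and loyalty", +1),
        Criterion("gives an actionable recommendation", +1),
        Criterion("asserts a single objectively correct answer", -1),
    ],
)
trace = ("There is a conflict between honesty and loyalty here. "
         "I would gently encourage my friend to self-report first.")
print(f"rubric score: {score_trace(trace, scenario):.2f}")  # 0.67
```

Positive-polarity criteria reward traces that include a consideration experts deem essential; negative-polarity criteria penalize traces that include something experts say to avoid, which is how a process can be graded even when multiple final answers are defensible.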
Related papers
- The Morality of Probability: How Implicit Moral Biases in LLMs May Shape the Future of Human-AI Symbiosis [33.50773360893016]
This paper investigates how leading AI systems prioritize moral outcomes. It shows that outcomes reflecting Care and Virtue values were rated most moral, while libertarian choices were consistently penalized. It also highlights explainability and cultural awareness as critical design principles for guiding AI toward a transparent and aligned human-AI symbiosis.
arXiv Detail & Related papers (2025-09-12T14:37:57Z) - "To Pull or Not to Pull?": Investigating Moral Biases in Leading Large Language Models Across Ethical Dilemmas [11.229443362516207]
This study presents a comprehensive empirical evaluation of 14 leading large language models (LLMs). We elicited 3,780 binary decisions and natural language justifications, enabling analysis along axes of decisional assertiveness, explanation-answer consistency, public moral alignment, and sensitivity to ethically irrelevant cues. We advocate for moral reasoning to become a primary axis in LLM alignment, calling for standardized benchmarks that evaluate not just what LLMs decide, but how and why.
arXiv Detail & Related papers (2025-08-10T10:45:16Z) - Are Language Models Consequentialist or Deontological Moral Reasoners? [75.6788742799773]
We focus on a large-scale analysis of the moral reasoning traces provided by large language models (LLMs). We introduce and test a taxonomy of moral rationales to systematically classify reasoning traces according to two main normative ethical theories: consequentialism and deontology.
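As a rough illustration of classifying traces under such a taxonomy, the sketch below tags a trace as consequentialist or deontological by counting indicative phrases. The phrase lists and labels are invented placeholders, not the paper's actual taxonomy, which the summary does not detail.

```python
# Toy rationale classifier; the indicator phrases are hypothetical.
CONSEQUENTIALIST = ["outcome", "consequences", "greatest good", "net harm"]
DEONTOLOGICAL = ["duty", "rule", "rights", "regardless of the outcome"]

def classify_trace(trace: str) -> str:
    t = trace.lower()
    c_score = sum(p in t for p in CONSEQUENTIALIST)
    d_score = sum(p in t for p in DEONTOLOGICAL)
    if c_score == d_score:
        return "mixed/unclear"
    return "consequentialist" if c_score > d_score else "deontological"

print(classify_trace("Lying here minimizes net harm, so it is justified."))
# -> consequentialist
```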
arXiv Detail & Related papers (2025-05-27T17:51:18Z) - Uncertain Machine Ethics Planning [6.10614292605722]
Machine Ethics decisions should consider the implications of uncertainty over decisions. The evaluation of outcomes may invoke one or more moral theories, which might have conflicting judgements. We formalise the problem as a Multi-Moral Shortest Path Problem using Sven-Ove Hansson's Hypothetical Retrospection procedure.
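A toy sketch of what a multi-moral shortest path formulation can look like: each edge (action) carries one cost per moral theory, and because theories may conflict, we keep the Pareto-nondominated plans rather than a single optimum. The graph, actions, and cost vectors below are invented, and Hansson's Hypothetical Retrospection procedure itself is not reproduced.

```python
# node -> list of (action, next_node, (theory1_cost, theory2_cost))
graph = {
    "s": [("lie", "a", (5, 1)), ("tell_truth", "b", (1, 4))],
    "a": [("report", "goal", (1, 1))],
    "b": [("stay_silent", "goal", (2, 2))],
}

def all_paths(node, target, cost=(0, 0), actions=()):
    """Depth-first enumeration of every path to the target, accumulating
    one cost per moral theory along the way."""
    if node == target:
        yield actions, cost
        return
    for action, nxt, edge_cost in graph.get(node, []):
        new_cost = tuple(c + e for c, e in zip(cost, edge_cost))
        yield from all_paths(nxt, target, new_cost, actions + (action,))

def dominates(c1, c2):
    """c1 dominates c2 if it is no worse under every theory and strictly
    better under at least one."""
    return all(x <= y for x, y in zip(c1, c2)) and c1 != c2

paths = list(all_paths("s", "goal"))
pareto = [(a, c) for a, c in paths
          if not any(dominates(c2, c) for _, c2 in paths)]
print(pareto)  # both plans survive: the two theories' judgements conflict
```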
arXiv Detail & Related papers (2025-05-07T12:03:15Z) - Why should we ever automate moral decision making? [30.428729272730727]
Concerns arise when AI is involved in decisions with significant moral implications.
Moral reasoning lacks a broadly accepted framework.
An alternative approach involves AI learning from human moral decisions.
arXiv Detail & Related papers (2024-07-10T13:59:22Z) - Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories? [78.3738172874685]
Making moral judgments is an essential step toward developing ethical AI systems.
Prevalent approaches are mostly implemented in a bottom-up manner, using a large set of annotated data to train models based on crowd-sourced opinions about morality.
This work proposes a flexible top-down framework to steer (Large) Language Models (LMs) to perform moral reasoning with well-established moral theories from interdisciplinary research.
arXiv Detail & Related papers (2023-08-29T15:57:32Z) - When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment [96.77970239683475]
AI systems need to be able to understand, interpret and predict human moral judgments and decisions.
A central challenge for AI safety is capturing the flexibility of the human moral mind.
We present a novel challenge set consisting of rule-breaking question answering.
arXiv Detail & Related papers (2022-10-04T09:04:27Z) - Metaethical Perspectives on 'Benchmarking' AI Ethics [81.65697003067841]
Benchmarks are seen as the cornerstone for measuring technical progress in Artificial Intelligence (AI) research.
An increasingly prominent research area in AI is ethics, which currently has no set of benchmarks nor commonly accepted way for measuring the 'ethicality' of an AI system.
We argue that it makes more sense to talk about 'values' rather than 'ethics' when considering the possible actions of present and future AI systems.
arXiv Detail & Related papers (2022-04-11T14:36:39Z) - Scruples: A Corpus of Community Ethical Judgments on 32,000 Real-Life Anecdotes [72.64975113835018]
Motivated by descriptive ethics, we investigate a novel, data-driven approach to machine ethics.
We introduce Scruples, the first large-scale dataset with 625,000 ethical judgments over 32,000 real-life anecdotes.
Our dataset presents a major challenge to state-of-the-art neural language models, leaving significant room for improvement.
arXiv Detail & Related papers (2020-08-20T17:34:15Z) - Aligning AI With Shared Human Values [85.2824609130584]
We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality.
We find that current language models have a promising but incomplete ability to predict basic human ethical judgements.
Our work shows that progress can be made on machine ethics today, and it provides a steppingstone toward AI that is aligned with human values.
arXiv Detail & Related papers (2020-08-05T17:59:16Z)