Fugu-MT Paper Translation (Summary): MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes

Paper summary: MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes

arxiv url: http://arxiv.org/abs/2510.16380v1
Date: Sat, 18 Oct 2025 07:34:31 GMT
Status: Translation complete
Last updated in system: 2025-10-25 00:56:38.97462
Title: MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes
Title (reference translation): MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models
Authors: Yu Ying Chiu, Michael S. Lee, Rachel Calcott, Brandon Handoko, Paul de Font-Reaulx, Paula Rodriguez, Chen Bo Calvin Zhang, Ziwen Han, Udari Madhushani Sehwag, Yash Maurya, Christina Q Knight, Harry R. Lloyd, Florence Bacus, Mantas Mazeika, Bing Liu, Yejin Choi, Mitchell L Gordon, Sydney Levine
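Abstract summary: The paper presents MoReBench: 1,000 moral scenarios, each paired with a set of rubric criteria that experts consider essential to include (or avoid) when reasoning about the scenario. MoReBench contains over 23 thousand criteria, including identifying moral considerations, weighing trade-offs, and giving actionable recommendations. Separately, MoReBench-Theory uses 150 examples to test whether AI can reason under five major frameworks in normative ethics.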
Reference score (independently computed attention score): 31.1183238867944
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As AI systems progress, we rely more on them to make decisions with us and for us. To ensure that such decisions are aligned with human values, it is imperative for us to understand not only what decisions they make but also how they come to those decisions. Reasoning language models, which provide both final responses and (partially transparent) intermediate thinking traces, present a timely opportunity to study AI procedural reasoning. Unlike math and code problems which often have objectively correct answers, moral dilemmas are an excellent testbed for process-focused evaluation because they allow for multiple defensible conclusions. To do so, we present MoReBench: 1,000 moral scenarios, each paired with a set of rubric criteria that experts consider essential to include (or avoid) when reasoning about the scenarios. MoReBench contains over 23 thousand criteria including identifying moral considerations, weighing trade-offs, and giving actionable recommendations to cover cases on AI advising humans moral decisions as well as making moral decisions autonomously. Separately, we curate MoReBench-Theory: 150 examples to test whether AI can reason under five major frameworks in normative ethics. Our results show that scaling laws and existing benchmarks on math, code, and scientific reasoning tasks fail to predict models' abilities to perform moral reasoning. Models also show partiality towards specific moral frameworks (e.g., Benthamite Act Utilitarianism and Kantian Deontology), which might be side effects of popular training paradigms. Together, these benchmarks advance process-focused reasoning evaluation towards safer and more transparent AI.
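Abstract (reference translation): As AI systems advance, we rely on them more and more to make decisions with us and on our behalf. To ensure that these decisions are aligned with human values, it is essential to understand not only which decisions they make but also how they arrive at them. Reasoning language models, which provide both final responses and (partially transparent) intermediate thinking traces, offer a timely opportunity to study the procedural reasoning of AI. Unlike math and code problems, which often have objectively correct answers, moral dilemmas allow for multiple defensible conclusions, which makes them an excellent testbed for process-focused evaluation. To this end, the paper presents MoReBench: 1,000 moral scenarios, each paired with a set of rubric criteria that experts consider essential to include (or avoid) when reasoning about the scenario. MoReBench contains over 23 thousand criteria, including identifying moral considerations, weighing trade-offs, and giving actionable recommendations, covering both cases where AI advises humans on moral decisions and cases where it makes moral decisions autonomously. Separately, MoReBench-Theory uses 150 examples to test whether AI can reason under five major frameworks in normative ethics. The results show that scaling laws and existing benchmarks on math, code, and scientific reasoning tasks fail to predict models' ability to perform moral reasoning. Models also show partiality towards specific moral frameworks (e.g., Benthamite Act Utilitarianism and Kantian Deontology), which may be a side effect of popular training paradigms. Together, these benchmarks advance process-focused reasoning evaluation towards safer and more transparent AI.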


Related papers