Fugu-MT Paper Translation (Abstract): EvalMORAAL: Interpretable Chain-of-Thought and LLM-as-Judge Evaluation for Moral Alignment in Large Language Models

Paper summary: EvalMORAAL: Interpretable Chain-of-Thought and LLM-as-Judge Evaluation for Moral Alignment in Large Language Models

arxiv url: http://arxiv.org/abs/2510.05942v2
Date: Wed, 08 Oct 2025 08:03:38 GMT
Status: Translation complete
Last updated in system: 2025-10-09 14:21:18.205031
Title: EvalMORAAL: Interpretable Chain-of-Thought and LLM-as-Judge Evaluation for Moral Alignment in Large Language Models
Title (reference translation): EvalMORAAL: Interpretable Chain-of-Thought and LLM-as-Judge Evaluation for Moral Alignment in Large Language Models
Authors: Hadi Mohammadi, Anastasia Giachanou, Ayoub Bagheri
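Abstract summary: EvalMORAAL is a transparent chain-of-thought framework that evaluates moral alignment in 20 large language models. It assesses the models on the World Values Survey (55 countries, 19 topics) and the PEW Global Attitudes Survey (39 countries, 8 topics).
Reference score (independently computed attention score): 1.141545154221656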
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present EvalMORAAL, a transparent chain-of-thought (CoT) framework that uses two scoring methods (log-probabilities and direct ratings) plus a model-as-judge peer review to evaluate moral alignment in 20 large language models. We assess models on the World Values Survey (55 countries, 19 topics) and the PEW Global Attitudes Survey (39 countries, 8 topics). With EvalMORAAL, top models align closely with survey responses (Pearson's r approximately 0.90 on WVS). Yet we find a clear regional difference: Western regions average r=0.82 while non-Western regions average r=0.61 (a 0.21 absolute gap), indicating consistent regional bias. Our framework adds three parts: (1) two scoring methods for all models to enable fair comparison, (2) a structured chain-of-thought protocol with self-consistency checks, and (3) a model-as-judge peer review that flags 348 conflicts using a data-driven threshold. Peer agreement relates to survey alignment (WVS r=0.74, PEW r=0.39, both p<.001), supporting automated quality checks. These results show real progress toward culture-aware AI while highlighting open challenges for use across regions.
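Abstract (reference translation): EvalMORAAL is a transparent chain-of-thought (CoT) framework that evaluates moral alignment in 20 large language models using two scoring methods (log-probabilities and direct ratings) and a model-as-judge peer review. We evaluate the models on the World Values Survey (55 countries, 19 topics) and the PEW Global Attitudes Survey (39 countries, 8 topics). With EvalMORAAL, top models align closely with survey responses (Pearson's r of approximately 0.90 on WVS). However, there is a clear regional difference: Western regions average r=0.82, while non-Western regions average r=0.61 (an absolute gap of 0.21), indicating a consistent regional bias. The framework adds three parts: (1) two scoring methods applied to all models to enable fair comparison, (2) a structured chain-of-thought protocol with self-consistency checks, and (3) a model-as-judge peer review that flags 348 conflicts using a data-driven threshold. Peer agreement is related to survey alignment (WVS r=0.74, PEW r=0.39, both p<.001), which supports automated quality checks. These results show real progress toward culture-aware AI while highlighting open challenges for use across regions.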
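The alignment figures in the abstract are Pearson correlations between model scores and survey responses, and the regional gap is the difference between two regional averages (0.82 - 0.61 = 0.21). The following is a minimal sketch of that arithmetic only, not the authors' implementation; the function names `pearson_r` and `regional_gap` and the region labels are hypothetical and chosen for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def regional_gap(r_by_region, western_regions):
    """Mean r over Western regions minus mean r over all other regions.

    r_by_region: dict mapping a region label to its correlation value.
    western_regions: set of labels treated as Western (an assumption here;
    the abstract does not list which regions fall in each group).
    """
    west = [r for region, r in r_by_region.items() if region in western_regions]
    rest = [r for region, r in r_by_region.items() if region not in western_regions]
    if not west or not rest:
        raise ValueError("both groups must contain at least one region")
    return sum(west) / len(west) - sum(rest) / len(rest)


# The two regional averages reported in the abstract, under placeholder labels.
reported = {"western": 0.82, "non_western": 0.61}
gap = regional_gap(reported, {"western"})
```

With the two averages from the abstract, `gap` rounds to 0.21, matching the reported absolute gap.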


Related papers