Fugu-MT Paper Translation (Abstract): Explainable Ethical Assessment on Human Behaviors by Generating Conflicting Social Norms

Paper overview: Explainable Ethical Assessment on Human Behaviors by Generating Conflicting Social Norms

arxiv url: http://arxiv.org/abs/2512.15793v1
Date: Tue, 16 Dec 2025 09:04:42 GMT
Status: Translation complete
Last updated in system: 2025-12-19 18:10:31.722313
Title: Explainable Ethical Assessment on Human Behaviors by Generating Conflicting Social Norms
Title (reference translation): Explainable Ethical Assessment of Human Behavior by Generating Conflicting Social Norms
Authors: Yuxi Sun, Wei Gao, Hongzhan Lin, Jing Ma, Wenxuan Zhang
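Abstract summary: We introduce ClarityEthic, a novel ethical assessment approach that enhances valence prediction and explanation. The proposed method outperforms strong baseline approaches, and human evaluations confirm that the generated social norms provide plausible explanations.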
Reference score (independently computed attention score): 25.931377041506455
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Human behaviors are often guided or constrained by social norms, which are defined as shared, commonsense rules. For example, underlying an action "report a witnessed crime" are social norms that inform our conduct, such as "It is expected to be brave to report crimes". Current AI systems that assess valence (i.e., support or oppose) of human actions by leveraging large-scale data training not grounded on explicit norms may be difficult to explain, and thus untrustworthy. Emulating human assessors by considering social norms can help AI models better understand and predict valence. While multiple norms come into play, conflicting norms can create tension and directly influence human behavior. For example, when deciding whether to "report a witnessed crime", one may balance bravery against self-protection. In this paper, we introduce ClarityEthic, a novel ethical assessment approach, to enhance valence prediction and explanation by generating conflicting social norms behind human actions, which strengthens the moral reasoning capabilities of language models by using a contrastive learning strategy. Extensive experiments demonstrate that our method outperforms strong baseline approaches, and human evaluations confirm that the generated social norms provide plausible explanations for the assessment of human behaviors.
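Abstract (reference translation): Human behavior is often guided or constrained by social norms, which are defined as shared commonsense rules. For example, behind the action "report a witnessed crime" lie social norms that inform our conduct, such as "It is expected to be brave to report crimes". Current AI systems that assess the valence (i.e., support or oppose) of human actions rely on large-scale data training that is not grounded on explicit norms, so their assessments may be difficult to explain and therefore untrustworthy. Emulating human assessors by taking social norms into account can help AI models better understand and predict valence. While multiple norms come into play, conflicting norms can create tension and directly influence human behavior. For example, when deciding whether to "report a witnessed crime", one may balance bravery against self-protection. This paper introduces ClarityEthic, a novel ethical assessment approach that enhances valence prediction and explanation by generating the conflicting social norms behind human actions, and that strengthens the moral reasoning capabilities of language models through a contrastive learning strategy. Extensive experiments show that the method outperforms strong baseline approaches, and human evaluations confirm that the generated social norms provide plausible explanations for the assessment of human behaviors.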

Related papers list