Fugu-MT Paper Translation (Summary): Advancing Automated Ethical Profiling in SE: a Zero-Shot Evaluation of LLM Reasoning

Paper summary: Advancing Automated Ethical Profiling in SE: a Zero-Shot Evaluation of LLM Reasoning

arxiv url: http://arxiv.org/abs/2510.00881v1
Date: Wed, 01 Oct 2025 13:28:26 GMT
Status: Translation complete
Last updated in system: 2025-10-03 16:59:20.581029
Title: Advancing Automated Ethical Profiling in SE: a Zero-Shot Evaluation of LLM Reasoning
Title (reference translation): Strengthening Automated Ethical Profiling in SE: A Zero-Shot Evaluation of LLM Reasoning
Authors: Patrizio Migliarini, Mashal Afzal Memon, Marco Autili, Paola Inverardi
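Abstract summary: Large Language Models (LLMs) are increasingly integrated into software engineering (SE) tools for tasks that extend beyond code synthesis. The paper proposes a fully automated framework for assessing the ethical reasoning capabilities of 16 LLMs in a zero-shot setting.
Reference score (site-computed attention measure): 1.389448546196977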
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large Language Models (LLMs) are increasingly integrated into software engineering (SE) tools for tasks that extend beyond code synthesis, including judgment under uncertainty and reasoning in ethically significant contexts. We present a fully automated framework for assessing ethical reasoning capabilities across 16 LLMs in a zero-shot setting, using 30 real-world ethically charged scenarios. Each model is prompted to identify the ethical theory most applicable to an action, assess the action's moral acceptability, and explain the reasoning behind its choice. Responses are compared against expert ethicists' choices using inter-model agreement metrics. Our results show that LLMs achieve an average Theory Consistency Rate (TCR) of 73.3% and a Binary Agreement Rate (BAR) on moral acceptability of 86.7%, with interpretable divergences concentrated in ethically ambiguous cases. A qualitative analysis of free-text explanations reveals strong conceptual convergence across models despite surface-level lexical diversity. These findings support the potential viability of LLMs as ethical inference engines within SE pipelines, enabling scalable, auditable, and adaptive integration of user-aligned ethical reasoning. Our focus is the Ethical Interpreter component of a broader profiling pipeline: we evaluate whether current LLMs exhibit sufficient interpretive stability and theory-consistent reasoning to support automated profiling.
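Abstract (reference translation): Large language models (LLMs) are increasingly being integrated into software engineering (SE) tools for tasks beyond code synthesis, including judgment under uncertainty and reasoning in ethically significant contexts. The paper proposes a fully automated framework that uses 30 real-world, ethically charged scenarios to evaluate the ethical reasoning capabilities of 16 LLMs in a zero-shot setting. Each model is prompted to identify the ethical theory most applicable to an action, to assess the action's moral acceptability, and to explain the reasoning behind its choice. The responses are compared against the choices of expert ethicists using inter-model agreement metrics. The results show that the LLMs achieve an average Theory Consistency Rate (TCR) of 73.3% and a Binary Agreement Rate (BAR) of 86.7% on moral acceptability, with interpretable divergences concentrated in ethically ambiguous cases. A qualitative analysis of the free-text explanations shows strong conceptual convergence across models despite surface-level lexical diversity. These findings support the potential of LLMs as ethical inference engines within SE pipelines, enabling scalable, auditable, and adaptive integration of user-aligned ethical reasoning. The focus is the Ethical Interpreter component of a broader profiling pipeline: the paper evaluates whether current LLMs show enough interpretive stability and theory-consistent reasoning to support automated profiling.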
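
The abstract names two agreement metrics, TCR and BAR, without giving their formulas. The sketch below is one plausible reading, not the authors' implementation: it assumes TCR is the fraction of scenarios in which a model picks the same ethical theory as the expert, and BAR is the fraction in which its acceptable/unacceptable verdict matches the expert's. The `Judgment` type, the `agreement_rates` function, and the theory labels are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgment:
    """One answer for one scenario (hypothetical structure, not from the paper)."""
    theory: str        # illustrative label, e.g. "utilitarianism"
    acceptable: bool   # binary verdict on moral acceptability


def agreement_rates(model: list[Judgment], expert: list[Judgment]) -> tuple[float, float]:
    """Return (TCR, BAR) under the assumed definitions described above.

    TCR: share of scenarios where the model's theory equals the expert's.
    BAR: share of scenarios where the binary verdict equals the expert's.
    """
    if not expert or len(model) != len(expert):
        raise ValueError("model and expert need the same non-zero number of judgments")
    n = len(expert)
    theory_hits = sum(m.theory == e.theory for m, e in zip(model, expert))
    verdict_hits = sum(m.acceptable == e.acceptable for m, e in zip(model, expert))
    return theory_hits / n, verdict_hits / n


# Toy data: four made-up scenarios, not taken from the paper.
expert = [
    Judgment("utilitarianism", True),
    Judgment("deontology", False),
    Judgment("virtue ethics", True),
    Judgment("deontology", False),
]
model = [
    Judgment("utilitarianism", True),
    Judgment("deontology", False),
    Judgment("utilitarianism", True),   # theory differs, verdict agrees
    Judgment("deontology", True),       # theory agrees, verdict differs
]

tcr, bar = agreement_rates(model, expert)
print(f"TCR={tcr:.1%}  BAR={bar:.1%}")
```

Under this reading, averaging the per-model rates over all 16 models would yield figures of the kind reported in the abstract. As plain arithmetic, 22 matches out of 30 scenarios is about 73.3% and 26 out of 30 is about 86.7%, though the abstract does not say how its averages were formed.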


Related papers