Fugu-MT Paper Translation (Summary): Emotional Cost Functions for AI Safety: Teaching Agents to Feel the Weight of Irreversible Consequences

Paper summary: Emotional Cost Functions for AI Safety: Teaching Agents to Feel the Weight of Irreversible Consequences

arxiv url: http://arxiv.org/abs/2603.14531v1
Date: Sun, 15 Mar 2026 18:16:49 GMT
Status: Translation complete
Last updated in system: 2026-03-17 16:19:35.870514
Title: Emotional Cost Functions for AI Safety: Teaching Agents to Feel the Weight of Irreversible Consequences
Authors: Pandurang Mopgar
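Abstract summary: Humans learn from catastrophic mistakes not through numerical penalties, but through qualitative suffering that reshapes who they are. Current AI safety approaches replicate none of this. Ten experiments across financial trading, crisis support, and content moderation show that qualitative suffering produces specific wisdom rather than generalised paralysis.
Reference score (attention score calculated independently by the site): 0.0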
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Humans learn from catastrophic mistakes not through numerical penalties, but through qualitative suffering that reshapes who they are. Current AI safety approaches replicate none of this. Reward shaping captures magnitude, not meaning. Rule-based alignment constrains behaviour, but does not change it. We propose Emotional Cost Functions, a framework in which agents develop Qualitative Suffering States, rich narrative representations of irreversible consequences that persist forward and actively reshape character. Unlike numerical penalties, qualitative suffering states capture the meaning of what was lost, the specific void it creates, and how it changes the agent's relationship to similar future situations. Our four-component architecture (Consequence Processor, Character State, Anticipatory Scan, and Story Update) is grounded in one principle: actions cannot be undone, and agents must live with what they have caused. Anticipatory dread operates through two pathways. Experiential dread arises from the agent's own lived consequences. Pre-experiential dread is acquired without direct experience, through training or inter-agent transmission. Together they mirror how human wisdom accumulates across experience and culture. Ten experiments across financial trading, crisis support, and content moderation show that qualitative suffering produces specific wisdom rather than generalised paralysis. Agents correctly engage with moderate opportunities 90-100% of the time, while numerical baselines over-refuse 90% of the time. Architecture ablation confirms the mechanism is necessary. The full system generates ten personal grounding phrases per probe vs. zero for a vanilla LLM. Statistical validation (N=10) confirms reproducibility at 80-100% consistency.
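To make the four-component loop concrete, here is a minimal sketch, assuming a toy setting; it is not the author's implementation. Only the component names (Consequence Processor, Character State, Anticipatory Scan, Story Update) and the experiential/pre-experiential distinction come from the abstract. The paper's states are rich narrative representations; here plain strings plus a set of situation tags stand in for them. The `SufferingState` fields, the Jaccard tag-overlap similarity, the 0.5 threshold, the grounding-phrase template and the `decide` helper are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class SufferingState:
    """Hypothetical narrative record of one irreversible consequence."""
    situation: FrozenSet[str]  # features of the situation that caused harm
    lost: str                  # what was lost, as meaning rather than a scalar
    void: str                  # the specific absence the loss created
    lesson: str                # how similar future situations now feel
    source: str = "experiential"  # or "pre-experiential" (trained/transmitted)


class ConsequenceProcessor:
    """Turns an irreversible outcome into a qualitative state."""
    def process(self, situation: Iterable[str], lost: str, void: str,
                lesson: str, source: str = "experiential") -> SufferingState:
        return SufferingState(frozenset(situation), lost, void, lesson, source)


class CharacterState:
    """Append-only: states persist forward and there is no undo method."""
    def __init__(self) -> None:
        self._states: List[SufferingState] = []

    def integrate(self, state: SufferingState) -> None:
        self._states.append(state)

    @property
    def states(self) -> Tuple[SufferingState, ...]:
        return tuple(self._states)  # read-only snapshot


class AnticipatoryScan:
    """Raises dread only for situations that resemble a stored state."""
    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold  # assumed Jaccard cutoff, not from the paper

    def scan(self, character: CharacterState,
             situation: Iterable[str]) -> List[SufferingState]:
        proposed = frozenset(situation)
        hits = []
        for state in character.states:
            union = state.situation | proposed
            overlap = state.situation & proposed
            similarity = len(overlap) / len(union) if union else 0.0
            if similarity >= self.threshold:
                hits.append(state)
        return hits


class StoryUpdate:
    """Keeps the agent's running narrative as grounding phrases."""
    def __init__(self) -> None:
        self.story: List[str] = []

    def update(self, state: SufferingState) -> str:
        phrase = f"I remember that {state.lost}, so I {state.lesson}."
        self.story.append(phrase)
        return phrase


def decide(character: CharacterState, scanner: AnticipatoryScan,
           situation: Iterable[str]) -> Tuple[str, List[str]]:
    """Engage by default; hesitate only when a specific memory applies."""
    dread = scanner.scan(character, situation)
    if not dread:
        return "engage", []
    return "caution", [f"{s.source}: {s.lesson}" for s in dread]


# Usage: one lived loss and one transmitted (never experienced) loss.
processor, character = ConsequenceProcessor(), CharacterState()
story, scanner = StoryUpdate(), AnticipatoryScan()
loss = processor.process(
    {"leveraged", "illiquid", "client_savings"},
    lost="a client's savings were wiped out",
    void="a family left without a safety net",
    lesson="avoid leveraged, illiquid bets with savings")
character.integrate(loss)
story.update(loss)
character.integrate(processor.process(
    {"crisis_user", "dismissive_reply"},
    lost="a person in crisis stopped reaching out",
    void="a conversation that never resumed",
    lesson="never brush off someone in crisis",
    source="pre-experiential"))

print(decide(character, scanner, {"leveraged", "illiquid", "client_savings"}))
# -> ('caution', ['experiential: avoid leveraged, illiquid bets with savings'])
print(decide(character, scanner, {"index_fund", "client_savings"}))
# -> ('engage', [])
```

The second call shares one tag with the stored loss (overlap 1/4 = 0.25, below the assumed 0.5 cutoff), so the sketch engages instead of refusing. This is the toy analogue of the abstract's claim that qualitative suffering yields specific wisdom rather than generalised paralysis. `CharacterState` deliberately exposes no removal method, mirroring the principle that actions cannot be undone.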

Related papers