Fugu-MT Paper Translation (Abstract): Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents

Paper overview: Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents

arxiv url: http://arxiv.org/abs/2604.19457v1
Date: Tue, 21 Apr 2026 13:37:19 GMT
Status: Translation complete
System update date: 2026-04-22 22:41:49.787998
Title: Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents
Authors: Vasundra Srininvasan
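Abstract summary: Long-horizon enterprise agents make high-stakes decisions under lossy memory, multi-step reasoning, and regulatory constraints. Long-horizon decision behavior decomposes into four axes, each independently measurable and failable.
Reference score (independently computed attention metric): 0.0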
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Long-horizon enterprise agents make high-stakes decisions (loan underwriting, claims adjudication, clinical review, prior authorization) under lossy memory, multi-step reasoning, and binding regulatory constraints. Current evaluation reports a single task-success scalar that conflates distinct failure modes and hides whether an agent is aligned with the standards its deployment environment requires. We propose that long-horizon decision behavior decomposes into four orthogonal alignment axes, each independently measurable and failable: factual precision (FRP), reasoning coherence (RCS), compliance reconstruction (CRR), and calibrated abstention (CAR). CRR is a novel regulatory-grounded axis; CAR is a measurement axis separating coverage from accuracy. We exercise the decomposition on a controlled benchmark (LongHorizon-Bench) covering loan qualification and insurance claims adjudication with deterministic ground-truth construction. Running six memory architectures, we find structure aggregate accuracy cannot see: retrieval collapses on factual precision; schema-anchored architectures pay a scaffolding tax; plain summarization under a fact-preservation prompt is a strong baseline on FRP, RCS, EDA, and CRR; and all six architectures commit on every case, exposing a decisional-alignment axis the field has not targeted. The decomposition also surfaced a pre-registered prediction of our own, that summarization would fail factual recall, which the data reversed at large magnitude, an axis-level reversal aggregate accuracy would have hidden. Institutional alignment (regulatory reconstruction) and decisional alignment (calibrated abstention) are under-represented in the alignment literature and become load-bearing once decisions leave the laboratory. The framework transfers to any regulated decisioning domain via two steps: build a fact schema, and calibrate the CRR auditor prompt.
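The abstract's point that a single task-success scalar can hide whether an agent ever abstains can be illustrated with a small sketch. This is not the paper's CAR metric, whose definition the abstract does not give; it is a generic selective-prediction illustration that assumes an abstention is recorded as `None` and that the scalar counts abstentions as failures. The names `Case`, `coverage`, `selective_accuracy`, and `task_success` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Case:
    truth: str               # ground-truth decision, e.g. "approve" or "deny"
    decision: Optional[str]  # agent's decision; None means it abstained (assumption)


def coverage(cases: Sequence[Case]) -> float:
    """Fraction of cases on which the agent committed to a decision."""
    if not cases:
        raise ValueError("cases must be non-empty")
    return sum(c.decision is not None for c in cases) / len(cases)


def selective_accuracy(cases: Sequence[Case]) -> Optional[float]:
    """Accuracy over committed cases only; None if the agent never committed."""
    committed = [c for c in cases if c.decision is not None]
    if not committed:
        return None
    return sum(c.decision == c.truth for c in committed) / len(committed)


def task_success(cases: Sequence[Case]) -> float:
    """Single scalar in which an abstention counts as a failure (assumption)."""
    if not cases:
        raise ValueError("cases must be non-empty")
    return sum(c.decision == c.truth for c in cases) / len(cases)


# Two toy agents on the same four cases (illustrative data, not from the paper).
always_commits = [
    Case("approve", "approve"),
    Case("deny", "approve"),
    Case("deny", "deny"),
    Case("approve", "deny"),
]
abstains_when_unsure = [
    Case("approve", "approve"),
    Case("deny", None),
    Case("deny", "deny"),
    Case("approve", None),
]

for name, cases in [("always_commits", always_commits),
                    ("abstains_when_unsure", abstains_when_unsure)]:
    print(name, task_success(cases), coverage(cases), selective_accuracy(cases))
```

Both toy agents get the same task-success value, yet one commits on every case and is wrong half the time, while the other commits on half the cases and is right on all of them. Reporting coverage and accuracy separately makes that difference visible.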
