Fugu-MT Paper Translation (Summary): Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents

Paper summary: Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents

arxiv url: http://arxiv.org/abs/2605.09863v1
Date: Mon, 11 May 2026 01:49:17 GMT
Status: Translation complete
Last updated in system: 2026-05-12 23:28:50.461882
Title: Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents
Title (reference translation): Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents
Authors: Chunxiao Wang
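Abstract summary: Nautilus Compass is a black-box drift detector and agent memory layer for production coding agents. The system ships a Claude Code plugin, an MCP 2024-11-05 A2A server, a CLI, and an API on one daemon.
Reference score (independently computed attention score): 2.417342411475111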
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Production LLM coding agents drift over long sessions: they forget user-specified constraints, slip into mistakes the user already flagged, and confabulate prior agreements. White-box approaches such as persona vectors require model weights and so cannot be applied to closed APIs (Claude, GPT-4) that most users actually interact with. We present Nautilus Compass, a black-box persona drift detector and agent memory layer for production coding agents. The method operates entirely at the prompt-text layer: cosine similarity between user prompts and behavioral anchor texts, aggregated by a weighted top-k mean using BGE-m3 embeddings. Compass is, to our knowledge, the only public agent memory layer (among Mem0, Letta, Cognee, Zep, MemOS, smrti verified May 2026) that does not call an LLM at index time to extract facts or build a graph; raw conversation text is embedded directly. The system ships as a Claude Code plugin, an MCP 2024-11-05 A2A server (Cursor, Cline, Hermes), a CLI, and a REST API on one daemon, with a Merkle-chained audit log for tamper-evident anchor updates. On a held-out test set built from real Claude Code session traces and labeled by an independent LLM judge, Compass reaches ROC AUC 0.83 for drift detection. The embedded retrieval pipeline scores 56.6% on LongMemEval-S v0.8 and 44.4% on EverMemBench-Dynamic (n=500), topping the four published EverMemBench Table 4 baselines. LongMemEval-S 56.6% is ~30 points below recent white-box leaders (90+%); we treat that as the architectural ceiling of the no-extraction design. End-to-end reproduction cost is $3.50 (~14x cheaper than GPT-4o-judged stacks). A paired cross-vendor behavior A/B accompanies these numbers as preliminary system-level evidence. Code, anchors, frozen test data, and audit-log tooling are MIT-licensed at github.com/chunxiaoxx/nautilus-compass.
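The scoring step described in the abstract (cosine similarity between a prompt and behavioral anchor texts, aggregated by a weighted top-k mean) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the default `k`, and the rank-based weights `1/(rank+1)` are assumptions, and plain lists of floats stand in for BGE-m3 embeddings so the example runs without a model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def weighted_topk_mean(prompt_vec, anchor_vecs, k=3, rank_weights=None):
    """Aggregate prompt-to-anchor similarities with a weighted mean of the top k.

    rank_weights[i] is the weight of the i-th most similar anchor.
    The default 1/(rank+1) decay is an assumption made for this sketch.
    """
    sims = sorted((cosine(prompt_vec, a) for a in anchor_vecs), reverse=True)[:k]
    if not sims:
        return 0.0
    if rank_weights is None:
        rank_weights = [1.0 / (i + 1) for i in range(len(sims))]
    weights = list(rank_weights[: len(sims)])
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * s for w, s in zip(weights, sims)) / total


# Toy vectors standing in for embeddings of one prompt and three anchors.
prompt = [1.0, 0.0]
anchors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
score = weighted_topk_mean(prompt, anchors, k=2)
```

In a real deployment the vectors would come from an embedding model, and a threshold on the score (chosen on held-out data) would decide whether a prompt counts as drift; neither the threshold nor the weighting scheme is given in the abstract.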
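Abstract (reference translation): Production LLM coding agents drift over long sessions: they forget user-specified constraints, fall into mistakes the user has already flagged, and confabulate prior agreements. White-box approaches such as persona vectors require model weights, so they cannot be applied to the closed APIs (Claude, GPT-4) that most users actually interact with. We introduce Nautilus Compass, a black-box persona drift detector and agent memory layer for production coding agents. The method operates entirely at the prompt-text layer, aggregating the cosine similarity between user prompts and behavioral anchor texts with a weighted top-k mean over BGE-m3 embeddings. Compass is, to our knowledge, the only public agent memory layer (among Mem0, Letta, Cognee, Zep, MemOS, smrti) that does not call an LLM at index time to extract facts or build a graph; raw conversation text is embedded directly. The system ships as a Claude Code plugin, an MCP 2024-11-05 A2A server (Cursor, Cline, Hermes), a CLI, and a REST API on one daemon, with a Merkle-chained audit log that makes anchor updates tamper-evident. On a held-out test set built from real Claude Code session traces and labeled by an independent LLM judge, Compass reaches ROC AUC 0.83 for drift detection. The embedded retrieval pipeline scores 56.6% on LongMemEval-S v0.8 and 44.4% on EverMemBench-Dynamic (n=500), beating the four published EverMemBench Table 4 baselines. The LongMemEval-S score of 56.6% is about 30 points below recent white-box leaders (90+%), which the authors treat as the architectural ceiling of the no-extraction design. End-to-end reproduction cost is $3.50 (about 14x cheaper than GPT-4o-judged stacks). A paired cross-vendor behavior A/B accompanies these numbers as preliminary system-level evidence. Code, anchors, frozen test data, and audit-log tooling are MIT-licensed at github.com/chunxiaoxx/nautilus-compass.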
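The abstract mentions a Merkle-chained audit log for tamper-evident anchor updates but does not describe its format. As a rough illustration of the general idea only, the sketch below uses a simple linear hash chain, in which each entry commits to the hash of the previous one. The entry layout, the `GENESIS` value, and the function names are all hypothetical and are not taken from the Nautilus Compass repository.

```python
import hashlib
import json

# Hypothetical starting value for the chain (an assumption of this sketch).
GENESIS = "0" * 64


def _digest(prev_hash, payload):
    """SHA-256 over a canonical JSON encoding of the previous hash and payload."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(log, payload):
    """Append an entry whose hash commits to the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev_hash, "payload": payload, "hash": _digest(prev_hash, payload)}
    log.append(entry)
    return entry


def verify(log):
    """Return True only if every entry links to its predecessor and matches its hash."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


# Example: record two anchor updates, then check the chain.
log = []
append_entry(log, {"anchor": "a1", "text": "never force-push to main"})
append_entry(log, {"anchor": "a1", "text": "never force-push to any shared branch"})
chain_ok = verify(log)
```

Editing any earlier payload changes its digest, so `verify` fails from that entry onward; this is the tamper-evidence property the abstract refers to, shown here in its simplest form.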


Related papers