Fugu-MT Paper Translation (Abstract): Learning When to Remember: Risk-Sensitive Contextual Bandits for Abstention-Aware Memory Retrieval in LLM-Based Coding Agents

Paper overview: Learning When to Remember: Risk-Sensitive Contextual Bandits for Abstention-Aware Memory Retrieval in LLM-Based Coding Agents

arxiv url: http://arxiv.org/abs/2604.27283v1
Date: Thu, 30 Apr 2026 00:32:53 GMT
Status: Translation complete
Last updated in system: 2026-05-01 16:31:53.852603
Title: Learning When to Remember: Risk-Sensitive Contextual Bandits for Abstention-Aware Memory Retrieval in LLM-Based Coding Agents
Authors: Mehmet Iscan
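Abstract summary: Coding agents increasingly rely on external memory to reuse prior debugging experience, repair traces, and repository-local operational knowledge. This paper reframes issue-memory use as a selective, risk-sensitive control problem rather than a pure top-k retrieval problem. It introduces RSCB-MC, a risk-sensitive contextual bandit memory controller that decides whether an agent should use no memory, inject the top resolution, summarize multiple candidates, perform high-precision or high-recall retrieval, abstain, or ask for feedback.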
Reference score (independently computed attention score): 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language model (LLM)-based coding agents increasingly rely on external memory to reuse prior debugging experience, repair traces, and repository-local operational knowledge. However, retrieved memory is useful only when the current failure is genuinely compatible with a previous one; superficial similarity in stack traces, terminal errors, paths, or configuration symptoms can lead to unsafe memory injection. This paper reframes issue-memory use as a selective, risk-sensitive control problem rather than a pure top-k retrieval problem. We introduce RSCB-MC, a risk-sensitive contextual bandit memory controller that decides whether an agent should use no memory, inject the top resolution, summarize multiple candidates, perform high-precision or high-recall retrieval, abstain, or ask for feedback. The system stores reusable issue knowledge through a pattern-variant-episode schema and converts retrieval evidence into a fixed 16-feature contextual state capturing relevance, uncertainty, structural compatibility, feedback history, false-positive risk, latency, and token cost. Its reward design penalizes false-positive memory injection more strongly than missed reuse, making non-injection and abstention first-class safety actions. In deterministic smoke-scale artifacts, RSCB-MC obtains the strongest non-oracle offline replay success rate, 62.5%, while maintaining a 0.0% false-positive rate. In a bounded 200-case hot-path validation, it reaches 60.5% proxy success with 0.0% false positives and a 331.466 microseconds p95 decision latency. The results show that, for coding-agent memory, the key question is not only which memory is most similar, but whether any retrieved memory is safe enough to influence the debugging trajectory.
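The asymmetric reward described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation. The seven action names follow the abstract, but the numeric reward constants, the choice of which actions count as "injecting", and the single `p_compatible` estimate (standing in for the 16-feature state) are all assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    # The seven controller actions named in the abstract.
    NO_MEMORY = "no_memory"
    INJECT_TOP = "inject_top_resolution"
    SUMMARIZE = "summarize_candidates"
    HIGH_PRECISION = "high_precision_retrieval"
    HIGH_RECALL = "high_recall_retrieval"
    ABSTAIN = "abstain"
    ASK_FEEDBACK = "ask_feedback"


# Hypothetical constants: the abstract states only that a false-positive
# injection is penalized more strongly than a missed reuse.
SUCCESS_REWARD = 1.0
MISSED_REUSE_PENALTY = -1.0
FALSE_POSITIVE_PENALTY = -5.0

# Assumption: only these two actions place memory into the agent's context.
INJECTING = {Action.INJECT_TOP, Action.SUMMARIZE}


def reward(action: Action, memory_is_compatible: bool) -> float:
    """Asymmetric reward for one decision, given ground-truth compatibility."""
    injects = action in INJECTING
    if injects:
        return SUCCESS_REWARD if memory_is_compatible else FALSE_POSITIVE_PENALTY
    # Memory withheld: penalize only if it would actually have helped.
    return MISSED_REUSE_PENALTY if memory_is_compatible else 0.0


def choose_action(p_compatible: float) -> Action:
    """Pick injection only when its expected reward beats withholding."""
    expected_inject = (
        p_compatible * SUCCESS_REWARD
        + (1.0 - p_compatible) * FALSE_POSITIVE_PENALTY
    )
    expected_withhold = p_compatible * MISSED_REUSE_PENALTY
    if expected_inject > expected_withhold:
        return Action.INJECT_TOP
    return Action.ABSTAIN


if __name__ == "__main__":
    for p in (0.5, 0.7, 0.8, 0.95):
        print(p, choose_action(p).value)
```

With these assumed constants, injection wins only when the estimated compatibility exceeds 5/7 (about 0.71), so a moderately similar memory is still withheld. Raising the false-positive penalty pushes that break-even point higher, which is one way abstention becomes the default under uncertainty.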


Related papers