Fugu-MT Paper Translation (Summary): AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems

Paper summary: AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems

arxiv url: http://arxiv.org/abs/2603.04443v1
Date: Sun, 22 Feb 2026 00:11:20 GMT
Status: Translation complete
Last updated in system: 2026-03-09 01:20:08.21866
Title: AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems
Title (reference translation): AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems
Authors: Emmanuel Bamidele
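Abstract summary: This paper proposes AMV-L, a memory-management framework that treats agent memory as a managed systems resource. Relative to a TTL baseline, AMV-L improves throughput by 3.1x and reduces latency by 4.2x (median), 4.7x (p95), and 4.4x (p99).
Reference score (independently computed attention score): 0.0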
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Long-running LLM agents require persistent memory to preserve state across interactions, yet most deployed systems manage memory with age-based retention (e.g., TTL). While TTL bounds item lifetime, it does not bound the computational footprint of memory on the request path: as retained items accumulate, retrieval candidate sets and vector similarity scans can grow unpredictably, yielding heavy-tailed latency and unstable throughput. We present AMV-L (Adaptive Memory Value Lifecycle), a memory-management framework that treats agent memory as a managed systems resource. AMV-L assigns each memory item a continuously updated utility score and uses value-driven promotion, demotion, and eviction to maintain lifecycle tiers; retrieval is restricted to a bounded, tier-aware candidate set that decouples the request-path working set from total retained memory. We implement AMV-L in a full-stack LLM serving system and evaluate it under identical long-running workloads against two baselines: TTL and an LRU working-set policy, with fixed prompt-injection caps. Relative to TTL, AMV-L improves throughput by 3.1x and reduces latency by 4.2x (median), 4.7x (p95), and 4.4x (p99), while reducing the fraction of requests exceeding 2s from 13.8% to 0.007%. Compared to LRU, AMV-L trades a small regression in median/p95 latency (+26% / +3%) for improved extreme-tail behavior (-15% p99; -98% >2s) and lower token overhead (approximately 6% fewer tokens/request), while matching retrieval quality (value means within approximately 0-2%). The gains arise primarily from bounding retrieval-set size and vector-search work, not from shortening prompts. Our results show that predictable performance for long-running LLM agents requires explicit control of memory working-set size and value-driven lifecycle management, rather than retention time alone.
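Abstract (reference translation): Long-running LLM agents need persistent memory to maintain state across interactions, yet most deployed systems manage memory with age-based retention (e.g., TTL). TTL bounds how long an item lives, but it does not bound the computational footprint of memory on the request path: as retained items accumulate, retrieval candidate sets and vector similarity scans grow unpredictably, producing heavy-tailed latency and unstable throughput. This paper proposes AMV-L (Adaptive Memory Value Lifecycle), a memory-management framework that treats agent memory as a managed systems resource. AMV-L assigns each memory item a continuously updated utility score and uses value-driven promotion, demotion, and eviction to maintain lifecycle tiers. We implement AMV-L in a full-stack LLM serving system and evaluate it under identical long-running workloads against two baselines, TTL and an LRU working-set policy. Relative to TTL, AMV-L improves throughput by 3.1x and reduces latency by 4.2x (median), 4.7x (p95), and 4.4x (p99). Compared to LRU, AMV-L trades a small regression in median/p95 latency (+26% / +3%) for improved extreme-tail behavior (-15% p99; -98% >2s) and lower token overhead (approximately 6% fewer tokens/request), while matching retrieval quality (value means within approximately 0-2%). The gains come mainly from bounding retrieval-set size and vector-search work, not from shortening prompts. These results suggest that predictable performance for long-running LLM agents requires explicit control of memory working-set size and value-driven lifecycle management, rather than retention time alone.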
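The lifecycle idea in the abstract (a per-item utility score, value-driven promotion, demotion, and eviction across tiers, and retrieval limited to a bounded, tier-aware candidate set) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the class and method names, the three tier labels, the threshold values, the exponential-moving-average update rule, and the cosine-similarity scoring are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
from dataclasses import dataclass

# Hypothetical tier labels; the abstract only says "lifecycle tiers".
HOT, WARM, COLD = "hot", "warm", "cold"


@dataclass
class MemoryItem:
    key: str
    embedding: list[float]
    utility: float = 0.5
    tier: str = WARM


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TieredMemory:
    """Toy lifecycle-managed store. All thresholds are assumed values."""

    def __init__(self, promote_at=0.7, demote_at=0.3, evict_at=0.05,
                 decay=0.9, max_candidates=4):
        self.items: dict[str, MemoryItem] = {}
        self.promote_at = promote_at
        self.demote_at = demote_at
        self.evict_at = evict_at
        self.decay = decay
        self.max_candidates = max_candidates

    def add(self, key, embedding, utility=0.5):
        item = MemoryItem(key, list(embedding), utility)
        self.items[key] = item
        self._retier(item)

    def _retier(self, item):
        # Value-driven eviction, promotion, and demotion.
        if item.utility < self.evict_at:
            self.items.pop(item.key, None)
        elif item.utility >= self.promote_at:
            item.tier = HOT
        elif item.utility < self.demote_at:
            item.tier = COLD
        else:
            item.tier = WARM

    def update_utility(self, key, reward):
        # Assumed update rule: exponential moving average of a reward signal.
        item = self.items[key]
        item.utility = self.decay * item.utility + (1 - self.decay) * reward
        self._retier(item)

    def candidates(self):
        # Bounded, tier-aware candidate set: hot items first, then warm,
        # capped at max_candidates. Cold items stay off the request path.
        eligible = [i for i in self.items.values() if i.tier != COLD]
        eligible.sort(key=lambda i: (i.tier != HOT, -i.utility))
        return eligible[: self.max_candidates]

    def retrieve(self, query, k=2):
        # Similarity is computed over the bounded set only, so the scan
        # cost does not grow with the total number of retained items.
        scored = [(cosine(query, i.embedding), i.key) for i in self.candidates()]
        scored.sort(reverse=True)
        return [key for _, key in scored[:k]]
```

In this sketch the similarity scan touches at most `max_candidates` items per request no matter how many items are retained, which is the decoupling of request-path working set from total memory that the abstract describes. A TTL-only store, by contrast, would scan every unexpired item.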

