Fugu-MT Paper Translation (Abstract): MemFail: Stress-Testing Failure Modes of LLM Memory Systems

Paper overview: MemFail: Stress-Testing Failure Modes of LLM Memory Systems

arxiv url: http://arxiv.org/abs/2605.26667v1
Date: Tue, 26 May 2026 08:03:55 GMT
Status: Translation complete
Last updated in system: 2026-05-27 17:51:41.741643
Title: MemFail: Stress-Testing Failure Modes of LLM Memory Systems
Authors: Ishir Garg, Neel Kolhe, Dawn Song, Xuandong Zhao
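Abstract summary: Large language model (LLM) agents rely on external memory systems to remain consistent across long-horizon interactions. Existing benchmarks report aggregate question-answering accuracy and treat memory systems as black boxes. This paper introduces MemFail, a diagnostic benchmark that isolates the failure modes of modern LLM memory systems.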
Reference score (independently computed attention metric): 69.80981631587501
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language model (LLM) agents increasingly rely on external memory systems to remain consistent across long-horizon interactions, but little empirical work has been done to understand the specific failure modes and design choices that these systems present. Existing benchmarks report aggregate question-answering accuracy and treat memory systems as black boxes, making it impossible to attribute an incorrect answer to a particular failure mode of the system. We introduce MemFail, a diagnostic benchmark that isolates the failure modes of modern LLM memory systems. We begin by formalizing memory systems as the composition of three canonical operations -- summarization, storage, and retrieval -- and identify the potential failure modes induced by each. Based on these hypothesized failure modes, we construct five datasets spanning four tasks, each adversarially designed to test a specific operation of a memory system. Using these datasets, we evaluate four state-of-the-art memory systems on MemFail and demonstrate how MemFail can be used to empirically understand the tradeoffs induced by differences in memory system architectures.
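The decomposition described in the abstract (summarization, storage, retrieval) can be illustrated with a minimal sketch. Everything below is a hypothetical toy, not the paper's implementation, datasets, or evaluated systems: the `ToyMemory` class, its `max_words` and `capacity` parameters, the word-overlap retriever, and the `diagnose` helper are all assumptions made for illustration. The point is only to show how checking each stage separately lets a wrong answer be attributed to one operation rather than to the system as a black box.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class ToyMemory:
    """Hypothetical memory system composed of summarize -> store -> retrieve."""
    max_words: int = 8   # assumed summarization budget
    capacity: int = 2    # assumed number of notes the store can hold
    notes: List[str] = field(default_factory=list)

    def summarize(self, message: str) -> str:
        # Lossy summarization: keep only the first max_words words.
        return " ".join(message.split()[: self.max_words])

    def store(self, note: str) -> None:
        # Bounded storage: evict the oldest note once over capacity.
        self.notes.append(note)
        if len(self.notes) > self.capacity:
            self.notes.pop(0)

    def retrieve(self, query: str, k: int = 1) -> List[str]:
        # Rank stored notes by word overlap with the query.
        q = set(query.lower().split())
        ranked = sorted(
            self.notes,
            key=lambda n: len(q & set(n.lower().split())),
            reverse=True,
        )
        return ranked[:k]


def diagnose(mem: ToyMemory, message: str, query: str, fact: str,
             later: Sequence[str] = (), k: int = 1) -> str:
    """Return the first stage at which `fact` is lost, or "ok"."""
    note = mem.summarize(message)
    if fact not in note:
        return "summarization"
    mem.store(note)
    for m in later:  # messages written after the target one
        mem.store(mem.summarize(m))
    if not any(fact in n for n in mem.notes):
        return "storage"
    if not any(fact in n for n in mem.retrieve(query, k)):
        return "retrieval"
    return "ok"
```

For example, `diagnose(ToyMemory(max_words=3), "My favorite color is teal", "favorite color", "teal")` attributes the loss to summarization, because the truncated note no longer contains the fact. A real diagnostic benchmark would need far more careful, adversarially constructed inputs than this substring check.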

Related papers