Fugu-MT Paper Translation (Abstract): Memory in Large Language Models: Mechanisms, Evaluation and Evolution

Paper overview: Memory in Large Language Models: Mechanisms, Evaluation and Evolution

arxiv url: http://arxiv.org/abs/2509.18868v1
Date: Tue, 23 Sep 2025 10:06:58 GMT
Status: Translation complete
Last updated in system: 2025-09-24 20:41:27.809133
Title: Memory in Large Language Models: Mechanisms, Evaluation and Evolution
Title (reference translation): Memory in Large Language Models: Mechanisms, Evaluation, and Evolution
Authors: Dianxing Zhang, Wendong Li, Kani Song, Jiaye Lu, Gang Li, Liuchun Yang, Sheng Li
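Abstract summary: We propose a four-part taxonomy (parametric, contextual, external, procedural/episodic) and a memory quadruple (location, persistence, write/access path, controllability). DMM Gov coordinates DAPT/TAPT, PEFT, model editing (ROME, MEND, MEMIT, SERAC), and RAG to form an auditable loop. This yields a reproducible, comparable, and governable coordinate system for research and deployment.
Reference score (independently computed attention metric): 8.158439933515131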
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Under a unified operational definition, we define LLM memory as a persistent state written during pretraining, finetuning, or inference that can later be addressed and that stably influences outputs. We propose a four-part taxonomy (parametric, contextual, external, procedural/episodic) and a memory quadruple (location, persistence, write/access path, controllability). We link mechanism, evaluation, and governance via the chain write -> read -> inhibit/update. To avoid distorted comparisons across heterogeneous setups, we adopt a three-setting protocol (parametric only, offline retrieval, online retrieval) that decouples capability from information availability on the same data and timeline. On this basis we build a layered evaluation: parametric (closed-book recall, edit differential, memorization/privacy), contextual (position curves and the mid-sequence drop), external (answer correctness vs snippet attribution/faithfulness), and procedural/episodic (cross-session consistency and timeline replay, E MARS+). The framework integrates temporal governance and leakage auditing (freshness hits, outdated answers, refusal slices) and uncertainty reporting via inter-rater agreement plus paired tests with multiple-comparison correction. For updating and forgetting, we present DMM Gov: coordinating DAPT/TAPT, PEFT, model editing (ROME, MEND, MEMIT, SERAC), and RAG to form an auditable loop covering admission thresholds, rollout, monitoring, rollback, and change audits, with specs for timeliness, conflict handling, and long-horizon consistency. Finally, we give four testable propositions: minimum identifiability; a minimal evaluation card; causally constrained editing with verifiable forgetting; and when retrieval with small-window replay outperforms ultra-long-context reading. This yields a reproducible, comparable, and governable coordinate system for research and deployment.
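Abstract (reference translation): Under a unified operational definition, we define LLM memory as a persistent state written during pretraining, finetuning, or inference. We propose a four-part taxonomy (parametric, contextual, external, procedural/episodic) and a memory quadruple (location, persistence, write/access path, controllability). We link mechanism, evaluation, and governance through the chain write -> read -> inhibit/update. To avoid distorted comparisons across heterogeneous setups, we adopt a three-setting protocol (parametric only, offline retrieval, online retrieval) that decouples capability from information availability on the same data and timeline. The layered evaluation covers parametric (closed-book recall, edit differential, memorization/privacy), contextual (position curves and the mid-sequence drop), external (snippet attribution/faithfulness), and procedural/episodic (cross-session consistency and timeline replay, E MARS+) memory. The framework integrates temporal governance and leakage auditing (freshness hits, outdated answers, refusal slices) with uncertainty reporting via inter-rater agreement and paired tests with multiple-comparison correction. DMM Gov coordinates DAPT/TAPT, PEFT, model editing (ROME, MEND, MEMIT, SERAC), and RAG to form an auditable loop covering admission thresholds, rollout, monitoring, rollback, and change audits. Finally, we present four testable propositions: minimum identifiability; a minimal evaluation card; causally constrained editing with verifiable forgetting; and the conditions under which retrieval with small-window replay outperforms ultra-long-context reading. This yields a reproducible, comparable, and governable coordinate system for research and deployment.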
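The abstract calls for uncertainty reporting via paired tests with multiple-comparison correction, but it does not name a specific test or correction. As one possible reading, the sketch below compares the three settings (parametric only, offline retrieval, online retrieval) on the same items using an exact sign-flip permutation test, then applies a Holm-Bonferroni adjustment. Both choices, the function names, and the toy scores are assumptions made for illustration and are not taken from the paper.

```python
from itertools import product
from typing import Dict, Sequence


def paired_permutation_pvalue(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact two-sided sign-flip permutation test on paired differences.

    Enumerates all 2**n sign assignments, so it is only practical for
    small n (roughly n <= 20). Assumed test; the paper names none.
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        # Small tolerance guards against floating-point round-off.
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            hits += 1
    return hits / total


def holm_adjust(pvalues: Dict[str, float]) -> Dict[str, float]:
    """Holm-Bonferroni step-down adjusted p-values (assumed correction)."""
    ordered = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(ordered)
    adjusted: Dict[str, float] = {}
    running_max = 0.0
    for rank, (name, p) in enumerate(ordered):
        # Multiply the k-th smallest p-value by (m - k), cap at 1,
        # and enforce monotonicity with a running maximum.
        running_max = max(running_max, min(1.0, (m - rank) * p))
        adjusted[name] = running_max
    return adjusted


# Toy per-question correctness (1 = correct) on the same items and timeline.
# These numbers are made up purely to exercise the functions.
scores = {
    "parametric_only":   [1, 0, 0, 1, 0, 0, 1, 0],
    "offline_retrieval": [1, 1, 0, 1, 1, 0, 1, 1],
    "online_retrieval":  [1, 1, 1, 1, 1, 0, 1, 1],
}
baseline = scores["parametric_only"]
raw = {
    name: paired_permutation_pvalue(vals, baseline)
    for name, vals in scores.items()
    if name != "parametric_only"
}
print(holm_adjust(raw))
```

Pairing on identical items is what lets the comparison separate capability from information availability; the correction matters once several settings or metrics are tested against the same baseline.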

Related papers