Fugu-MT Paper Translation (Abstract): Evaluating Long-Term Memory for Long-Context Question Answering

Paper overview: Evaluating Long-Term Memory for Long-Context Question Answering

arxiv url: http://arxiv.org/abs/2510.23730v1
Date: Mon, 27 Oct 2025 18:03:50 GMT
Status: Translation complete
Last updated in system: 2025-10-29 15:35:36.405658
Title: Evaluating Long-Term Memory for Long-Context Question Answering
Authors: Alessandra Terranova, Björn Ross, Alexandra Birch
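Abstract summary: The paper presents a systematic evaluation of memory-augmented methods using LoCoMo, a benchmark of synthetic long-context dialogues annotated for question-answering tasks. The findings show that memory-augmented approaches reduce token usage by over 90% while maintaining competitive accuracy.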
Reference score (independently calculated attention metric): 100.1267054069757
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In order for large language models to achieve true conversational continuity and benefit from experiential learning, they need memory. While research has focused on the development of complex memory systems, it remains unclear which types of memory are most effective for long-context conversational tasks. We present a systematic evaluation of memory-augmented methods using LoCoMo, a benchmark of synthetic long-context dialogues annotated for question-answering tasks that require diverse reasoning strategies. We analyse full-context prompting, semantic memory through retrieval-augmented generation and agentic memory, episodic memory through in-context learning, and procedural memory through prompt optimization. Our findings show that memory-augmented approaches reduce token usage by over 90% while maintaining competitive accuracy. Memory architecture complexity should scale with model capability, with small foundation models benefitting most from RAG, and strong instruction-tuned reasoning models gaining from episodic learning through reflections and more complex agentic semantic memory. In particular, episodic memory can help LLMs recognise the limits of their own knowledge.

Related papers