Fugu-MT Paper Translation (Abstract): Episodic-Semantic Memory Architecture for Long-Horizon Scientific Agents

Paper overview: Episodic-Semantic Memory Architecture for Long-Horizon Scientific Agents

arxiv url: http://arxiv.org/abs/2605.17625v1
Date: Sun, 17 May 2026 19:44:24 GMT
Status: Translation complete
Last updated in system: 2026-05-19 17:57:48.234459
Title: Episodic-Semantic Memory Architecture for Long-Horizon Scientific Agents
Title (reference translation): Episodic-Semantic Memory Architecture for Long-Horizon Scientific Agents
Authors: Nikola Milosevic
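Abstract summary: We evaluate a dual-process memory architecture that decouples immediate episodic needs (a constant 10-message window) from long-term consolidated knowledge (growing at about 3 tokens/message). The system maintains 70-85% accuracy at 1-2 second latency while using fewer tokens (45,434 vs. a 120,000+ limit). We identify a "Sim-to-Real" gap in which synthetic tests maintain constant memory but realistic workflows exhibit linear growth (about 3 tokens/message).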
Reference score (independently computed attention score): 0.0
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: As Large Language Models (LLMs) evolve into persistent scientific collaborators, context window saturation has emerged as a critical bottleneck. Scientific workflows involving iterative data analysis and hypothesis refinement rapidly saturate even extended contexts with dense technical content, while monolithic approaches suffer from quadratic cost scaling and cognitive degradation. We evaluate a Dual Process Memory Architecture that decouples immediate episodic needs (constant 10-message window) from long-term consolidated knowledge (growing at approximately 3 tokens/message). Unlike prior social agent memory systems, our domain-specific consolidation addresses contradictory parameter evolution, multi-hop reasoning across experimental phases, and precise technical fact retention. Through large-scale evaluation spanning 15,000 messages with cross-model validation across six LLMs from three families (OpenAI, Anthropic, Google), totaling 1,440 queries, we establish three key findings. First, while full-context models fail at 10,000 messages due to context overflow, our system maintains 70-85% accuracy with 1-2 second latency using 62% fewer tokens (45,434 vs 120,000+ limit). Second, cross-model validation reveals architecture-level trade-offs independent of specific LLMs: Dual Process excels at numeric/temporal queries (65-90% accuracy) while RAG excels at historical retrieval (60-85%), suggesting complementary deployment strategies. Third, we identify a "Sim-to-Real" gap where synthetic tests maintain constant memory but realistic workflows exhibit linear growth (about 3 tokens/message), with consolidation quality emerging as the primary scalability bottleneck. The architecture successfully manages profiles with 14,000+ scientific facts (125k tokens), demonstrating that domain-specific memory consolidation enables sustained operation beyond full-context limits.
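Abstract (reference translation): As large language models (LLMs) evolve into persistent scientific collaborators, context window saturation has emerged as a critical bottleneck. Scientific workflows that involve iterative data analysis and hypothesis refinement quickly saturate even extended contexts with dense technical content, while monolithic approaches suffer from quadratic cost scaling and cognitive degradation. We evaluate a Dual Process Memory Architecture that decouples immediate episodic needs (a constant 10-message window) from long-term consolidated knowledge (growing at about 3 tokens/message). Unlike prior memory systems for social agents, the domain-specific consolidation addresses contradictory parameter evolution, multi-hop reasoning across experimental phases, and precise retention of technical facts. A large-scale evaluation spanning 15,000 messages, with cross-model validation across six LLMs from three families (OpenAI, Anthropic, Google) and 1,440 queries in total, yields three key findings. First, while full-context models fail at 10,000 messages because of context overflow, the system maintains 70-85% accuracy at 1-2 second latency using 62% fewer tokens (45,434 vs. a 120,000+ limit). Second, cross-model validation reveals architecture-level trade-offs that do not depend on the specific LLM: Dual Process excels at numeric/temporal queries (65-90% accuracy) while RAG excels at historical retrieval (60-85%), which suggests complementary deployment strategies. Third, we identify a "Sim-to-Real" gap: synthetic tests maintain constant memory, but realistic workflows exhibit linear growth (about 3 tokens/message), and consolidation quality emerges as the primary scalability bottleneck. The architecture successfully manages profiles with 14,000+ scientific facts (125k tokens), showing that domain-specific memory consolidation enables sustained operation beyond full-context limits.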
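
The separation described in the abstract, a fixed-size episodic window alongside a growing store of consolidated knowledge, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `DualProcessMemory`, its methods, and the `token_reduction` helper are hypothetical, and consolidation is reduced to the caller passing in already-extracted facts, whereas the paper presumably uses a more elaborate domain-specific consolidation step. The only figures taken from the abstract are the 10-message window and the 45,434 vs. 120,000 token counts.

```python
from collections import deque
from typing import Dict, Optional


class DualProcessMemory:
    """Hypothetical sketch: constant episodic window plus a growing fact store."""

    def __init__(self, window_size: int = 10):
        # Episodic memory: only the most recent `window_size` messages are kept.
        self.episodic = deque(maxlen=window_size)
        # Semantic memory: consolidated facts, keyed by parameter name (assumption).
        self.semantic: Dict[str, str] = {}

    def add_message(self, message: str, facts: Optional[Dict[str, str]] = None) -> None:
        self.episodic.append(message)
        if facts:
            # A newer value overwrites an older one, a crude stand-in for
            # resolving contradictory parameter evolution.
            self.semantic.update(facts)

    def build_context(self) -> str:
        facts = "\n".join(f"{key}: {value}" for key, value in self.semantic.items())
        recent = "\n".join(self.episodic)
        return f"Known facts:\n{facts}\n\nRecent messages:\n{recent}"


def token_reduction(used_tokens: int, limit_tokens: int) -> float:
    """Fraction of tokens saved relative to a reference budget."""
    return 1.0 - used_tokens / limit_tokens


memory = DualProcessMemory()
for step in range(25):
    memory.add_message(f"message {step}", {"last_step": str(step)})

print(len(memory.episodic))                   # window stays at 10 messages
print(memory.semantic["last_step"])           # only the latest value survives
print(round(token_reduction(45_434, 120_000) * 100))  # percentage quoted in the abstract
```

In this sketch the prompt size is bounded by the window plus the fact store, so any growth comes only from the consolidated facts. That matches the abstract's framing of consolidation quality as the scalability bottleneck.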

Related papers