Fugu-MT Paper Translation (Summary): ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL

Paper summary: ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL

arxiv url: http://arxiv.org/abs/2510.07151v1
Date: Wed, 08 Oct 2025 15:50:34 GMT
Status: Translation complete
System update date: 2025-10-09 16:41:20.602933
Title: ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL
Title (reference translation): ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL
Authors: Egor Cherepanov, Alexey K. Kovalev, Aleksandr I. Panov
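Abstract summary: This work proposes ELMUR, a transformer architecture with structured external memory. ELMUR extends effective horizons up to 100,000 times beyond the attention window and achieves a 100% success rate on a synthetic T-Maze task with corridors up to one million steps.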
Reference score (independently computed attention metric): 48.214881182054164
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Real-world robotic agents must act under partial observability and long horizons, where key cues may appear long before they affect decision making. However, most modern approaches rely solely on instantaneous information, without incorporating insights from the past. Standard recurrent or transformer models struggle with retaining and leveraging long-term dependencies: context windows truncate history, while naive memory extensions fail under scale and sparsity. We propose ELMUR (External Layer Memory with Update/Rewrite), a transformer architecture with structured external memory. Each layer maintains memory embeddings, interacts with them via bidirectional cross-attention, and updates them through a Least Recently Used (LRU) memory module using replacement or convex blending. ELMUR extends effective horizons up to 100,000 times beyond the attention window and achieves a 100% success rate on a synthetic T-Maze task with corridors up to one million steps. In POPGym, it outperforms baselines on more than half of the tasks. On MIKASA-Robo sparse-reward manipulation tasks with visual observations, it nearly doubles the performance of strong baselines. These results demonstrate that structured, layer-local external memory offers a simple and scalable approach to decision making under partial observability.
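Abstract (reference translation): Real-world robotic agents must act under partial observability and long horizons. However, most modern approaches rely solely on instantaneous information, without incorporating insights from the past. Standard recurrent and transformer models struggle to retain and leverage long-term dependencies. We propose ELMUR (External Layer Memory with Update/Rewrite), a transformer architecture with structured external memory. Each layer maintains memory embeddings, interacts with them via bidirectional cross-attention, and updates them through a Least Recently Used (LRU) memory module using replacement or convex blending. ELMUR extends effective horizons up to 100,000 times beyond the attention window and achieves a 100% success rate on a synthetic T-Maze task with corridors up to one million steps. In POPGym, it outperforms baselines on more than half of the tasks. On MIKASA-Robo sparse-reward manipulation tasks with visual observations, it nearly doubles the performance of strong baselines. These results show that structured, layer-local external memory offers a simple and scalable approach to decision making under partial observability.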
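Illustrative sketch (not from the paper): the abstract says memory is updated through an LRU module using replacement or convex blending, but gives no further detail. The toy Python class below shows one way such a rule could look. The class name `LRUMemory`, the `blend` coefficient, the use of plain float lists as embeddings, and the choice to count only writes as "use" are all assumptions made for illustration; the authors' actual update rule may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LRUMemory:
    """Toy fixed-size memory; a hypothetical sketch, not the ELMUR implementation."""
    num_slots: int
    # None -> overwrite the chosen slot; a value lam in [0, 1] -> convex blending.
    blend: Optional[float] = None
    slots: List[Optional[List[float]]] = field(default_factory=list)
    last_used: List[int] = field(default_factory=list)
    clock: int = 0

    def __post_init__(self) -> None:
        self.slots = [None] * self.num_slots
        # -1 marks a never-used slot, so empty slots are always picked first.
        self.last_used = [-1] * self.num_slots

    def write(self, vec: List[float]) -> int:
        """Write vec into the least recently used slot and return its index."""
        self.clock += 1
        idx = min(range(self.num_slots), key=lambda i: self.last_used[i])
        old = self.slots[idx]
        if old is None or self.blend is None:
            # Empty slot or replacement mode: store the new vector as-is.
            self.slots[idx] = list(vec)
        else:
            # Convex blending: lam * new + (1 - lam) * old, element-wise.
            lam = self.blend
            self.slots[idx] = [lam * n + (1.0 - lam) * o for n, o in zip(vec, old)]
        self.last_used[idx] = self.clock
        return idx
```

With two slots and `blend=0.5`, the first two writes fill the empty slots and a third write blends into slot 0, the least recently used one. Setting `blend=None` makes the same write overwrite that slot instead.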

Related papers