Fugu-MT Paper Translation (Abstract): MemCtrl: Using MLLMs as Active Memory Controllers on Embodied Agents

Paper overview: MemCtrl: Using MLLMs as Active Memory Controllers on Embodied Agents

arxiv url: http://arxiv.org/abs/2601.20831v1
Date: Wed, 28 Jan 2026 18:31:17 GMT
Status: Translation complete
Last updated in system: 2026-01-29 15:46:07.094323
Title: MemCtrl: Using MLLMs as Active Memory Controllers on Embodied Agents
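Title (reference translation): MemCtrl: Using MLLMs as Active Memory Controllers for Embodied Agents
Authors: Vishnu Sashank Dorbala, Dinesh Manocha
Abstract summary: This paper proposes MemCtrl, a framework for online memory pruning that uses Multimodal Large Language Models (MLLMs). The augmented MLLMs improve by around 16% on average, and by over 20% on specific instruction subsets.
Reference score (independently computed attention score): 53.44122827359892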
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Foundation models rely on in-context learning for personalized decision making. The limited size of the context window necessitates memory compression and retrieval systems such as RAG. These systems, however, often treat memory as a large offline storage space, which is ill-suited to embodied agents that are expected to operate online under strict memory and compute constraints. In this work, we propose MemCtrl, a novel framework that uses Multimodal Large Language Models (MLLMs) to prune memory online. MemCtrl augments MLLMs with a trainable memory head μ that acts as a gate, determining which observations or reflections to retain, update, or discard during exploration. We evaluate two ways of training μ, 1) via an offline expert and 2) via online RL, and observe significant improvement in overall embodied task completion ability with μ-augmented MLLMs. In particular, when augmenting two low-performing MLLMs with MemCtrl on multiple subsets of the EmbodiedBench benchmark, we observe that μ-augmented MLLMs improve by around 16% on average, and by over 20% on specific instruction subsets. Finally, we present a qualitative analysis of the memory fragments collected by μ, noting the superior performance of μ-augmented MLLMs on long and complex instruction types.
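Abstract (reference translation): Foundation models depend on in-context learning for personalized decision making. Because the context window is limited in size, memory compression and retrieval systems such as RAG are required. These systems, however, often treat memory as a large offline storage space, which is unfavorable for embodied agents expected to operate online under strict memory and compute constraints. This work proposes MemCtrl, a new framework for online memory pruning that uses Multimodal Large Language Models (MLLMs). MemCtrl extends an MLLM with a trainable memory head μ that serves as a gate deciding which observations or reflections to retain, update, or discard during exploration. Two types of μ are trained and evaluated, 1) via an offline expert and 2) via online RL, and μ-augmented MLLMs show a significant improvement in overall embodied task completion ability. In particular, when two low-performing MLLMs are augmented with MemCtrl on multiple subsets of the EmbodiedBench benchmark, the μ-augmented MLLMs improve by around 16% on average and by over 20% on specific instruction subsets. Finally, a qualitative analysis of the memory fragments collected by μ is presented, noting the superior performance of μ-augmented MLLMs on long and complex instruction types.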
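
The gating idea in the abstract, where μ decides whether each observation is retained, updated, or discarded, might be pictured with the minimal Python sketch below. It is not the authors' implementation: the abstract does not describe the architecture or training of μ, so `GatedMemory`, `Action`, and `keyword_gate` are hypothetical names, the gate is a hand-written rule standing in for a trained head, and the fixed capacity with oldest-first eviction is an assumption added only to illustrate a strict memory budget.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Action(Enum):
    RETAIN = "retain"
    UPDATE = "update"
    DISCARD = "discard"


@dataclass
class GatedMemory:
    """Bounded memory whose writes are controlled by a gate function.

    `gate` is a hypothetical stand-in for the trainable memory head mu.
    """
    gate: Callable[[str, List[str]], Action]
    capacity: int = 4  # assumed budget, not a value from the paper
    entries: List[str] = field(default_factory=list)

    def observe(self, observation: str) -> Action:
        action = self.gate(observation, self.entries)
        if action is Action.RETAIN:
            self.entries.append(observation)
            # Assumed eviction policy: drop the oldest entry when over budget.
            if len(self.entries) > self.capacity:
                self.entries.pop(0)
        elif action is Action.UPDATE:
            # Overwrite the most recent entry, or store it if memory is empty.
            if self.entries:
                self.entries[-1] = observation
            else:
                self.entries.append(observation)
        # DISCARD leaves memory unchanged.
        return action


def keyword_gate(observation: str, memory: List[str]) -> Action:
    """Toy rule-based gate; the paper trains mu rather than hand-coding rules."""
    if observation in memory:
        return Action.DISCARD  # exact repeat carries no new information
    same_object = bool(memory) and (
        observation.split(":")[0] == memory[-1].split(":")[0]
    )
    if same_object:
        return Action.UPDATE  # newer state of the object seen last
    return Action.RETAIN


if __name__ == "__main__":
    mem = GatedMemory(gate=keyword_gate, capacity=2)
    for obs in ["fridge: closed", "fridge: open", "fridge: open",
                "table: apple", "sink: empty"]:
        print(obs, "->", mem.observe(obs).value)
    print(mem.entries)
```

In this sketch the memory is pruned at write time rather than searched afterwards, which is the contrast the abstract draws with offline stores such as RAG. A learned μ would replace `keyword_gate` while the surrounding loop stays the same.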


Related papers