Fugu-MT Paper Translation (Abstract): RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies

Paper overview: RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies

arxiv url: http://arxiv.org/abs/2603.04639v1
Date: Wed, 04 Mar 2026 21:59:32 GMT
Status: Translation complete
Last updated in system: 2026-03-06 22:06:10.991199
Title: RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies
Title (reference translation): RoboMME: Benchmarking and Memory Understanding for Robotic Generalist Policies
Authors: Yinpei Dai, Hongze Fu, Jayjun Lee, Yuejiang Liu, Haoran Zhang, Jianing Yang, Chelsea Finn, Nima Fazeli, Joyce Chai
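Abstract summary: Memory is critical for long-horizon, history-dependent robotic manipulation. Recent vision-language-action (VLA) models have begun to incorporate memory mechanisms. This paper introduces RoboMME, a large-scale standardized benchmark for evaluating and advancing VLA models.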
Reference score (independently computed attention score): 54.23445842621374
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Memory is critical for long-horizon and history-dependent robotic manipulation. Such tasks often involve counting repeated actions or manipulating objects that become temporarily occluded. Recent vision-language-action (VLA) models have begun to incorporate memory mechanisms; however, their evaluations remain confined to narrow, non-standardized settings. This limits their systematic understanding, comparison, and progress measurement. To address these challenges, we introduce RoboMME: a large-scale standardized benchmark for evaluating and advancing VLA models in long-horizon, history-dependent scenarios. Our benchmark comprises 16 manipulation tasks constructed under a carefully designed taxonomy that evaluates temporal, spatial, object, and procedural memory. We further develop a suite of 14 memory-augmented VLA variants built on the π0.5 backbone to systematically explore different memory representations across multiple integration strategies. Experimental results show that the effectiveness of memory representations is highly task-dependent, with each design offering distinct advantages and limitations across different tasks. Videos and code can be found at our website https://robomme.github.io.
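Abstract (reference translation): Memory is critical for long-horizon, history-dependent robotic manipulation. Such tasks often involve counting repeated actions or manipulating objects that are temporarily occluded. Recent vision-language-action (VLA) models have begun to incorporate memory mechanisms, but their evaluation is limited to narrow, non-standardized settings, which restricts systematic understanding, comparison, and progress measurement. To address these challenges, we introduce RoboMME, a large-scale standardized benchmark for evaluating and improving VLA models in long-horizon, history-dependent scenarios. The benchmark consists of 16 manipulation tasks built under a carefully designed taxonomy that evaluates temporal, spatial, object, and procedural memory. We also develop a suite of 14 memory-augmented VLA variants built on the π0.5 backbone to systematically explore different memory representations across multiple integration strategies. Experiments show that the effectiveness of memory representations depends strongly on the task, with each design offering distinct advantages and limitations on different tasks. Videos and code are available on our website, https://robomme.github.io.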

