Fugu-MT Paper Translation (Abstract): Notes-to-Self: Scratchpad Augmented VLAs for Memory Dependent Manipulation Tasks

Paper overview: Notes-to-Self: Scratchpad Augmented VLAs for Memory Dependent Manipulation Tasks

arxiv url: http://arxiv.org/abs/2602.21013v1
Date: Tue, 24 Feb 2026 15:30:55 GMT
Status: translation complete
Last updated in system: 2026-03-23 08:17:41.674105
Title: Notes-to-Self: Scratchpad Augmented VLAs for Memory Dependent Manipulation Tasks
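Authors: Sanjay Haresh, Daniel Dijkman, Apratim Bhattacharyya, Roland Memisevic
Abstract summary: The paper shows how to impart both spatial and temporal memory to vision-language-action (VLA) models by incorporating a language scratchpad. The approach is evaluated on a split of memory-dependent tasks from the ClevrSkills environment, on MemoryBench, and on a real-world pick-and-place task.
Reference score (attention measure computed independently by this site): 9.55115186979077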
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Many dexterous manipulation tasks are non-Markovian in nature, yet little attention has been paid to this fact in the recent upsurge of the vision-language-action (VLA) paradigm. Although they are successful in bringing internet-scale semantic understanding to robotics, existing VLAs are primarily "stateless" and struggle with memory-dependent long-horizon tasks. In this work, we explore a way to impart both spatial and temporal memory to a VLA by incorporating a language scratchpad. The scratchpad makes it possible to memorize task-specific information, such as object positions, and it allows the model to keep track of a plan and progress towards subgoals within that plan. We evaluate this approach on a split of memory-dependent tasks from the ClevrSkills environment, on MemoryBench, as well as on a challenging real-world pick-and-place task. We show that incorporating a language scratchpad significantly improves generalization on these tasks for both non-recurrent and recurrent models.
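The scratchpad idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Scratchpad` class, the `rollout` loop, the string-based observations, and `toy_policy` (a hand-written stand-in for a VLA) are all hypothetical names introduced here for illustration. The sketch only shows the general pattern of a stateless policy that reads its own earlier notes at each step and appends a new one.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Scratchpad:
    """Text notes carried across control steps (hypothetical structure)."""
    notes: List[str] = field(default_factory=list)

    def write(self, note: str) -> None:
        # Ignore empty notes so the pad only grows when there is something to record.
        if note:
            self.notes.append(note)

    def render(self) -> str:
        # The policy sees the notes as plain text, one note per line.
        return "\n".join(self.notes)


# Assumed interface: (observation, instruction, scratchpad text) -> (action, new note).
Policy = Callable[[str, str, str], Tuple[str, str]]


def rollout(policy: Policy, observations: List[str], instruction: str):
    """Run a stateless policy over a sequence, feeding its own notes back in."""
    pad = Scratchpad()
    actions = []
    for obs in observations:
        action, note = policy(obs, instruction, pad.render())
        pad.write(note)
        actions.append(action)
    return actions, pad


def toy_policy(obs: str, instruction: str, notes: str) -> Tuple[str, str]:
    """Hand-written stand-in for a VLA, used only to exercise the loop."""
    prefix_obs = "cube at "
    prefix_note = "cube position: "
    if obs.startswith(prefix_obs):
        # Spatial memory: record where the object was seen.
        return "observe", prefix_note + obs[len(prefix_obs):]
    for line in notes.splitlines():
        if line.startswith(prefix_note):
            # The object is out of view, so act on the remembered position
            # and log progress on the plan (temporal memory).
            position = line[len(prefix_note):]
            return "move_to " + position, "subgoal reached: move to cube"
    # No memory of the object yet: nothing to note.
    return "search", ""


actions, pad = rollout(
    toy_policy,
    ["cube at (0.2, 0.5)", "nothing visible"],
    "pick up the cube",
)
```

In the second step the cube is no longer visible, so the only way the policy can choose `move_to (0.2, 0.5)` is by reading the note it wrote in the first step. Without the scratchpad text, the same policy falls back to `search`.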

Related papers

MemoAct: Atkinson-Shiffrin-Inspired Memory-Augmented Visuomotor Policy for Robotic Manipulation [6.490934654648497]
MemoAct is a hierarchical memory-based policy that uses distinct memory tiers to address specific bottlenecks. MemoAct outperforms both existing Markovian baselines and history-aware policies.
Paper reference translation (metadata) (2026-03-19T05:02:43Z)
RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies [54.23445842621374]
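Memory is critical for long-horizon, history-dependent robotic manipulation. Vision-language-action (VLA) models have recently begun to incorporate memory mechanisms. This paper introduces RoboMME, a large-scale standardized benchmark for evaluating and advancing VLA models.
Paper reference translation (metadata) (2026-03-04T21:59:32Z)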
MEM: Multi-Scale Embodied Memory for Vision Language Action Models [73.3883864595845]
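This paper introduces Multi-Scale Embodied Memory (MEM). MEM compresses video-based short-horizon memory with a video encoder and combines it with text-based long-horizon memory. MEM enables robots to perform tasks lasting up to 15 minutes, such as cleaning a kitchen or grilling a cheese sandwich.
Paper reference translation (metadata) (2026-03-04T00:03:02Z)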
Rethinking Progression of Memory State in Robotic Manipulation: An Object-Centric Perspective [16.541717037293278]
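The paper introduces LIBERO-Mem, a non-Markovian task suite for stress-testing robotic manipulation under object-level partial observability. It combines short- and long-horizon object tracking with temporally ordered subgoals, requiring reasoning beyond the current frame. Embodied-SlotSSM is a slot-centric VLA framework built for temporal scalability.
Paper reference translation (metadata) (2025-11-14T16:56:01Z)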
MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation [59.31354761628506]
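Temporal context is essential for robotic manipulation because such tasks are inherently non-Markovian, yet mainstream VLA models overlook it. This paper proposes MemoryVLA, a Cognition-Memory-Action framework for long-horizon robotic manipulation. It is evaluated on more than 150 simulated and real-world tasks across three robots.
Paper reference translation (metadata) (2025-08-26T17:57:16Z)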
FindingDory: A Benchmark to Evaluate Memory in Embodied Agents [49.18498389833308]
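This work introduces a new benchmark for long-horizon embodied tasks in the Habitat simulator. The benchmark evaluates memory-based capabilities across 60 tasks that require sustained engagement and contextual awareness.
Paper reference translation (metadata) (2025-06-18T17:06:28Z)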
RET-LLM: Towards a General Read-Write Memory for Large Language Models [53.288356721954514]
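RET-LLM is a novel framework that equips large language models with a general read-write memory unit. Inspired by Davidsonian semantics theory, it extracts and stores knowledge in the form of triplets. The framework shows robust performance on temporal question-answering tasks.
Paper reference translation (metadata) (2023-05-23T17:53:38Z)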

The related papers list is generated automatically from the titles and abstracts of papers on this site.
