Fugu-MT Paper Translation (Abstract): VLA-Pro: Cross-Task Procedural Memory Transfer for Vision-Language-Action Models

Paper overview: VLA-Pro: Cross-Task Procedural Memory Transfer for Vision-Language-Action Models

arxiv url: http://arxiv.org/abs/2605.29562v1
Date: Thu, 28 May 2026 08:14:08 GMT
Status: Translation complete
System update date: 2026-05-30 00:00:30.951252
Title: VLA-Pro: Cross-Task Procedural Memory Transfer for Vision-Language-Action Models
Title (reference translation): VLA-Pro: Cross-task procedural memory transfer for vision-language-action models
Authors: Shengyu Si, Yuanzhuo Lu, Ruimeng Yang, Ziyi Ye, Zuxuan Wu, Yu-Gang Jiang
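Abstract summary: VLA-Pro is a plug-and-play framework designed to strengthen cross-task generalization. It stores task-relevant procedural memories at training time and transfers them during inference. Experiments on RoboTwin, RLBench, and real-world manipulation tasks show that VLA-Pro consistently improves cross-task generalization.
Reference score (independently computed attention measure): 73.99344788183949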
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-Language-Action (VLA) models have shown strong potential for general-purpose robotic manipulation, yet they still struggle to generalize to unseen tasks that necessitate transferring relevant experience across objects, scenes, and action patterns. This paper proposes VLA-Pro, a plug-and-play framework designed to enhance cross-task generalization by storing task-relevant procedural memories at training time and transferring these memories during inference. Specifically, VLA-Pro stores task-specific LoRA adapters as parameterized procedural memories during training. At inference time, VLA-Pro retrieves relevant procedural memories based on the current multi-modal context and dynamically fuses these memories for generating the current action chunk. Experiments on RoboTwin, RLBench, and real-world manipulation tasks show that VLA-Pro consistently improves cross-task generalization across multiple backbones, achieving up to a 207% relative improvement in simulation and increasing real-world success rate from 5.8% to 65.0%. These results suggest that procedural memory retrieval and adaptation provide an effective mechanism for transferring manipulation experience to novel tasks while preserving modularity and execution stability.
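Abstract (reference translation): Vision-Language-Action (VLA) models show strong potential for general-purpose robotic manipulation, but they struggle to generalize to unseen tasks that require transferring relevant experience across objects, scenes, and action patterns. This paper proposes VLA-Pro, a plug-and-play framework that promotes cross-task generalization by storing task-relevant procedural memories at training time and transferring them during inference. Specifically, VLA-Pro stores task-specific LoRA adapters as parameterized procedural memories during training. At inference time, VLA-Pro retrieves relevant procedural memories based on the current multimodal context and dynamically fuses them to generate the current action chunk. Experiments on RoboTwin, RLBench, and real-world manipulation tasks show that VLA-Pro consistently improves cross-task generalization across multiple backbones, achieving up to a 207% relative improvement in simulation and raising the real-world success rate from 5.8% to 65.0%. These results suggest that procedural memory retrieval and adaptation provide an effective mechanism for transferring manipulation experience to new tasks while preserving modularity and execution stability.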
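
The retrieve-then-fuse idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the retrieval metric or the fusion rule, so cosine similarity over a context embedding, top-k selection, and a softmax-weighted sum of low-rank updates are all assumptions made here for illustration. The task names, the `bank` layout, and every function name are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(context_key, bank, k=2):
    """Return the k (task_name, score) pairs most similar to the context (assumed metric)."""
    scored = [(name, cosine(context_key, mem["key"])) for name, mem in bank.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def softmax(scores, temperature=0.1):
    """Numerically stable softmax; a lower temperature sharpens the weights."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def low_rank_update(B, A):
    """Compute the LoRA weight update B @ A, with B of shape (d_out, r) and A of shape (r, d_in)."""
    rank, d_in = len(A), len(A[0])
    return [[sum(B[i][r] * A[r][j] for r in range(rank)) for j in range(d_in)]
            for i in range(len(B))]

def fuse(bank, retrieved, temperature=0.1):
    """Blend the retrieved adapters' updates with softmax weights (assumed fusion rule)."""
    names = [name for name, _ in retrieved]
    weights = softmax([score for _, score in retrieved], temperature)
    updates = [low_rank_update(bank[n]["B"], bank[n]["A"]) for n in names]
    rows, cols = len(updates[0]), len(updates[0][0])
    delta = [[sum(w * u[i][j] for w, u in zip(weights, updates)) for j in range(cols)]
             for i in range(rows)]
    return delta, dict(zip(names, weights))

# Hypothetical memory bank: one rank-1 adapter per training task, keyed by an embedding.
bank = {
    "pick_cup":    {"key": [1.0, 0.0, 0.0], "A": [[1.0, 0.0]], "B": [[1.0], [0.0]]},
    "open_drawer": {"key": [0.0, 1.0, 0.0], "A": [[0.0, 1.0]], "B": [[0.0], [1.0]]},
    "push_block":  {"key": [0.0, 0.0, 1.0], "A": [[1.0, 1.0]], "B": [[1.0], [1.0]]},
}

# Stand-in for an embedding of the current multimodal context of an unseen task.
context_key = [0.9, 0.1, 0.0]
retrieved = retrieve(context_key, bank, k=2)
delta, weights = fuse(bank, retrieved)
print(retrieved)
print(weights)
print(delta)
```

In a real system the fused update `delta` would be added to a frozen backbone weight before generating the next action chunk, and retrieval would be repeated as the context changes. Keeping each adapter separate in the bank is what makes the scheme modular: adding a task adds an entry without retraining the others.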

Related papers