Fugu-MT Paper Translation (Abstract): PriorVLA: Prior-Preserving Adaptation for Vision-Language-Action Models

Paper abstract: PriorVLA: Prior-Preserving Adaptation for Vision-Language-Action Models

arxiv url: http://arxiv.org/abs/2605.10925v1
Date: Mon, 11 May 2026 17:56:02 GMT
Status: Translation complete
Last updated in system: 2026-05-12 23:28:51.058281
Title: PriorVLA: Prior-Preserving Adaptation for Vision-Language-Action Models
Title (reference translation): PriorVLA: Prior-Preserving Adaptation for Vision-Language-Action Models
Authors: Xinyu Guo, Bin Xie, Wei Chai, Xianchi Deng, Tiancai Wang, Zhengxing Wu, Xingyu Chen
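Abstract summary: We propose PriorVLA, a framework for effectively leveraging pretrained priors. With only 10 demonstrations per task, PriorVLA reaches 48% ID and 32% OOD success, surpassing pi0.5 by 24 and 22 points, respectively.
Reference score (independently computed attention score): 45.541651600761924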
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large-scale pretraining has made Vision-Language-Action (VLA) models promising foundations for generalist robot manipulation, yet adapting them to downstream tasks remains necessary. However, the common practice of full fine-tuning treats pretraining as initialization and can shift broad priors toward narrow training-distribution patterns. We propose PriorVLA, a novel framework that preserves pretrained priors and learns to leverage them for effective adaptation. PriorVLA keeps a frozen Prior Expert as a read-only prior source and trains an Adaptation Expert for downstream specialization. Expert Queries capture scene priors from the pretrained VLM and motor priors from the Prior Expert, integrating both into the Adaptation Expert to guide adaptation. Together, PriorVLA updates only 25% of the parameters updated by full fine-tuning. Across RoboTwin 2.0, LIBERO, and real-world tasks, PriorVLA achieves stronger overall performance than full fine-tuning and state-of-the-art VLA baselines, with the largest gains under out-of-distribution (OOD) and few-shot settings. PriorVLA improves over pi0.5 by 11 points on RoboTwin 2.0-Hard and achieves 99.1% average success on LIBERO. Across eight real-world tasks and two embodiments, PriorVLA reaches 81% in-distribution (ID) and 57% OOD success with standard data. With only 10 demonstrations per task, PriorVLA reaches 48% ID and 32% OOD success, surpassing pi0.5 by 24 and 22 points, respectively.
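Abstract (reference translation): Large-scale pretraining has made Vision-Language-Action (VLA) models a promising foundation for generalist robot manipulation, but they still need to be adapted to downstream tasks. The common practice of full fine-tuning, however, treats pretraining as mere initialization and can shift broad priors toward narrow training-distribution patterns. We propose PriorVLA, a new framework that preserves pretrained priors and learns to leverage them for effective adaptation. PriorVLA keeps a frozen Prior Expert as a read-only prior source and trains an Adaptation Expert for downstream specialization. Expert Queries capture scene priors from the pretrained VLM and motor priors from the Prior Expert, and integrate both into the Adaptation Expert to guide adaptation. In total, PriorVLA updates only 25% of the parameters updated by full fine-tuning. Across RoboTwin 2.0, LIBERO, and real-world tasks, PriorVLA achieves stronger overall performance than full fine-tuning and state-of-the-art VLA baselines, with the largest gains in out-of-distribution (OOD) and few-shot settings. PriorVLA improves over pi0.5 by 11 points on RoboTwin 2.0-Hard and achieves 99.1% average success on LIBERO. Across eight real-world tasks and two embodiments, PriorVLA reaches 81% in-distribution (ID) and 57% OOD success with standard data. With only 10 demonstrations per task, PriorVLA reaches 48% ID and 32% OOD success, surpassing pi0.5 by 24 and 22 points, respectively.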

Related papers