Fugu-MT Paper Translation (Summary): EXPO-FT: Sample-Efficient Reinforcement Learning Finetuning for Vision-Language-Action Models

Paper summary: EXPO-FT: Sample-Efficient Reinforcement Learning Finetuning for Vision-Language-Action Models

arxiv url: http://arxiv.org/abs/2605.25477v1
Date: Mon, 25 May 2026 06:31:03 GMT
Status: Translation complete
Last updated in system: 2026-05-26 19:50:19.352648
Title: EXPO-FT: Sample-Efficient Reinforcement Learning Finetuning for Vision-Language-Action Models
Authors: Perry Dong, Kuo-Han Hung, Tian Gao, Dorsa Sadigh, Chelsea Finn,
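Abstract summary: We propose EXPO-FT, a system for stable, sample-efficient RL fine-tuning of pretrained VLA policies. The system achieves perfect task performance (30/30 successes) on every evaluated task within an average of 19.1 minutes of online robot data. We release an open-source codebase with the aim of facilitating broader adoption of RL fine-tuning of VLA models in robotics.
Reference score (site-computed attention score): 84.73890225707264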
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The ability to efficiently and reliably learn new tasks has been a foundational challenge in robotics. Vision-Language-Action (VLA) models have demonstrated strong generalization across diverse manipulation tasks, yet pretrained policies consistently fall short of the reliability required for real-world deployment. Reinforcement learning (RL) fine-tuning offers a promising path to bridge this gap, but existing approaches either train from scratch without fully leveraging pretrained priors, or fine-tune VLAs without achieving the sample efficiency and success rates that practical deployment demands. We present EXPO-FT, a system for stable, sample-efficient RL finetuning of pretrained VLA policies that closes this gap. Our system solves a suite of challenging manipulation tasks, including routing string lights and inserting the plug to light it up, striking a pool ball into a pocket, and inserting a flower into a wine bottle, each requiring combinations of high precision, dynamic actions, and robustness to varied initial states. Our system achieves perfect task performance (30/30 successes) across all evaluated tasks within an average of 19.1 minutes of online robot data, outperforming both prior RL-from-scratch and VLA finetuning approaches. We release an open-source codebase with the aim of facilitating broader adoption of RL finetuning of VLA models in robotics.


Related papers