Fugu-MT paper translation (abstract): What Frozen VLAs Already Know About Success: A Probing Study of Value-Like Structure in Foundation Robot Policies

Paper overview: What Frozen VLAs Already Know About Success: A Probing Study of Value-Like Structure in Foundation Robot Policies

arxiv url: http://arxiv.org/abs/2605.28527v1
Date: Wed, 27 May 2026 14:23:29 GMT
Status: translation complete
Last updated in system: 2026-05-28 17:38:56.110361
Title: What Frozen VLAs Already Know About Success: A Probing Study of Value-Like Structure in Foundation Robot Policies
Authors: Jiachen Zhang, Junnan Nie, Junyi Lao, Wei Cheng, Chenghao Liu, Jiaxin Jiang, Songfang Huang
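Abstract summary: Vision-language-action (VLA) policies are trained to imitate actions. Lightweight linear probes on frozen features recover Monte-Carlo outcome targets. The gains are not universal and require additional inference compute, but the underlying finding is clean.
Reference score (independently computed attention score): 36.91260665881213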
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-language-action (VLA) policies are trained to imitate actions; their loss never asks them to estimate reward, progress, or future success. Their frozen representations nevertheless carry such information, and it can be read out and used to guide action choice without retraining the policy. From mixed successful and failed manipulation trajectories on LIBERO-Goal, we recover Monte-Carlo outcome targets using lightweight linear probes on frozen features. The targets are consistently predictable from OpenVLA, Pi0.5, DINOv2, and CLIP features, and substantially less so from baselines built on progress, time-to-go, task identity, or proprioception. To rule out task and temporal shortcuts, we evaluate the probes under same-task, same-timestep matched comparisons: Pi0.5 probes still reach roughly 92% pairwise ordering accuracy, while label-shuffled controls stay at chance. Used as a test-time selector over sampled Pi0.5 action prefixes, the same probe turns this offline finding into behavior: on push-plate, success rises from 26.7% under greedy decoding to 44.3%, with a second positive case on wine-rack. The gains are not universal and require additional inference compute, but the underlying finding is clean: frozen VLAs already encode information about success that their imitation objective never explicitly demands.
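The probing recipe in the abstract can be illustrated with a minimal sketch on synthetic data. This is not the authors' code. The feature generator, the ridge-regression probe, the undiscounted 0/1 outcome target, and every function name below (`fit_linear_probe`, `matched_pairwise_accuracy`, `select_candidate`, `synthetic_demo`) are assumptions made for illustration. Real features would come from a frozen encoder such as OpenVLA or Pi0.5, which the sketch does not load.

```python
import numpy as np

def fit_linear_probe(features, targets, l2=1.0):
    """Ridge regression on frozen features; the last weight is the bias
    (regularized together with the other weights for brevity)."""
    X = np.hstack([features, np.ones((len(features), 1))])
    return np.linalg.solve(X.T @ X + l2 * np.eye(X.shape[1]), X.T @ targets)

def probe_scores(features, weights):
    return features @ weights[:-1] + weights[-1]

def matched_pairwise_accuracy(scores, outcomes, tasks, timesteps):
    """Among (success, failure) state pairs sharing task and timestep, the
    fraction where the success state scores higher (ties count as 0.5)."""
    correct, total = 0.0, 0
    for task, step in set(zip(tasks.tolist(), timesteps.tolist())):
        group = (tasks == task) & (timesteps == step)
        pos = scores[group & (outcomes == 1)]
        neg = scores[group & (outcomes == 0)]
        diff = pos[:, None] - neg[None, :]
        correct += (diff > 0).sum() + 0.5 * (diff == 0).sum()
        total += diff.size
    return correct / total if total else float("nan")

def select_candidate(candidate_features, weights):
    """Test-time selection: index of the candidate the probe scores highest."""
    return int(np.argmax(probe_scores(candidate_features, weights)))

def synthetic_demo(seed=0, n_traj=400, horizon=5, dim=32, n_shuffles=20):
    """Hypothetical stand-in for frozen features: Gaussian noise plus an
    outcome direction, a task shortcut, and a timestep shortcut."""
    rng = np.random.default_rng(seed)
    task = rng.integers(0, 4, n_traj)
    success = (rng.random(n_traj) < 0.2 + 0.2 * task).astype(int)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)

    # One row per (trajectory, timestep) state.
    tasks = np.repeat(task, horizon)
    outcomes = np.repeat(success, horizon)
    steps = np.tile(np.arange(horizon), n_traj)
    feats = rng.normal(size=(n_traj * horizon, dim))
    feats += 1.5 * (2 * outcomes[:, None] - 1) * direction  # outcome signal
    feats[:, 0] += tasks   # task shortcut
    feats[:, 1] += steps   # timestep shortcut

    # Assumed target: undiscounted Monte-Carlo return of a 0/1 terminal reward.
    targets = outcomes.astype(float)

    # Split by trajectory so no trajectory appears in both train and test.
    split = (n_traj * 3 // 4) * horizon
    tr, te = slice(0, split), slice(split, None)

    def evaluate(weights):
        return matched_pairwise_accuracy(
            probe_scores(feats[te], weights), outcomes[te], tasks[te], steps[te])

    real = evaluate(fit_linear_probe(feats[tr], targets[tr]))
    # Label-shuffled control, averaged over several shuffles.
    shuffled = [evaluate(fit_linear_probe(feats[tr], rng.permutation(targets[tr])))
                for _ in range(n_shuffles)]
    return real, float(np.mean(shuffled))

if __name__ == "__main__":
    real, shuffled = synthetic_demo()
    print(f"matched pairwise accuracy: probe={real:.3f}, shuffled control={shuffled:.3f}")
```

Because each comparison pairs states from the same task and timestep, a probe that only reads the task or timestep shortcut gains nothing, which is the point of the matched evaluation. The shuffled control is averaged over several shuffles because a single random probe direction in a low-dimensional feature space can land noticeably above or below chance.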
