Fugu-MT Paper Translation (Abstract): CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models

Paper overview: CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models

arxiv url: http://arxiv.org/abs/2605.10903v1
Date: Mon, 11 May 2026 17:41:54 GMT
Status: Translation complete
Last updated in system: 2026-05-12 23:28:51.044641
Title: CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models
Title (reference translation): CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models
Authors: Wenxuan Song, Han Zhao, Fuhao Li, Ziyang Zhou, Xi Wang, Jing Lyu, Pengxiang Ding, Yan Wang, Donglin Wang, Haoang Li
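Abstract summary: Advanced finetuning methods with auxiliary training objectives can improve performance and reduce the number of convergence steps. This paper proposes a new approach for cases where pretrained VLA models fail to improve performance or reduce adaptation cost under standard supervised finetuning.
Reference score (independently computed attention score): 42.639416481955344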
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper proposes a novel approach to a common problem: pretrained VLA models often fail to improve performance or reduce adaptation cost under standard supervised finetuning (SFT). Some advanced finetuning methods with auxiliary training objectives can improve performance and reduce the number of steps to convergence. However, they typically incur significant computational overhead due to the additional losses from the auxiliary objectives. To combine the enhanced capabilities of auxiliary training with the simplicity of standard SFT, we decouple the two objectives of auxiliary-objective SFT within the parameter space, namely, enhancing general capabilities and fitting task-specific action distributions. To achieve this, we only need to train the model to convergence on a small-scale task set using two distinct training strategies, resulting in two finetuned models. The parameter difference between the two models can then be interpreted as capability vectors provided by the auxiliary objectives. These vectors are then merged with the pretrained parameters to form a capability-enhanced meta model. Moreover, when standard SFT is augmented with a lightweight orthogonal regularization loss, the merged model attains performance comparable to auxiliary finetuned baselines with reduced computational overhead. Internal and external experiments demonstrate that our capability vectors (1) are effective and versatile across diverse models and (2) can generalize to novel environments and embodiments out of the box.
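Abstract (reference translation): This paper proposes a new approach for cases where pretrained VLA models fail to improve performance or reduce adaptation cost under standard supervised finetuning (SFT). Advanced finetuning methods with auxiliary training objectives can improve performance and reduce the number of convergence steps. However, the additional losses from the auxiliary objectives usually increase computational overhead substantially. To obtain the enhanced capabilities of auxiliary training together with the simplicity of standard SFT, the two objectives of auxiliary-objective SFT, namely improving general capabilities and fitting task-specific action distributions, are separated within the parameter space. Reaching this goal only requires training the model to convergence on a small task set with two different training strategies. The parameter difference between the two resulting models can be interpreted as capability vectors provided by the auxiliary objectives. These vectors are merged with the pretrained parameters to form a capability-enhanced meta model. Furthermore, when standard SFT is extended with a lightweight orthogonal regularization loss, the merged model reaches performance comparable to auxiliary finetuned baselines at lower computational overhead. Internal and external experiments show that the capability vectors (1) are effective and versatile across diverse models and (2) generalize to new environments and embodiments.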
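
The parameter-space steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the scaling factor `alpha`, the flat-list parameter layout, and the specific form of the orthogonal penalty (squared inner product between the SFT update and the capability vector) are all assumptions made here for illustration; the abstract does not specify them.

```python
from typing import Dict, List

# Hypothetical layout: parameter name -> flattened list of weights.
Params = Dict[str, List[float]]


def capability_vector(theta_aux: Params, theta_sft: Params) -> Params:
    """Difference between the auxiliary-objective model and the plain-SFT model."""
    return {
        name: [a - s for a, s in zip(theta_aux[name], theta_sft[name])]
        for name in theta_aux
    }


def merge(theta_pre: Params, vec: Params, alpha: float = 1.0) -> Params:
    """Add the (optionally scaled) capability vector to the pretrained weights.

    alpha is an assumed knob; the abstract only says the vectors are merged.
    """
    return {
        name: [p + alpha * v for p, v in zip(theta_pre[name], vec[name])]
        for name in theta_pre
    }


def orthogonal_penalty(theta: Params, theta_meta: Params, vec: Params) -> float:
    """Assumed regularizer: squared inner product between the downstream SFT
    update (theta - theta_meta) and the capability vector. It is zero when the
    update is orthogonal to the vector."""
    dot = sum(
        (t - m) * v
        for name in vec
        for t, m, v in zip(theta[name], theta_meta[name], vec[name])
    )
    return dot * dot


# Toy example with a single two-element parameter.
theta_pre = {"w": [1.0, 2.0]}   # pretrained model
theta_sft = {"w": [1.5, 2.0]}   # finetuned with standard SFT
theta_aux = {"w": [1.5, 2.5]}   # finetuned with an auxiliary objective

vec = capability_vector(theta_aux, theta_sft)
theta_meta = merge(theta_pre, vec)
print(vec)         # {'w': [0.0, 0.5]}
print(theta_meta)  # {'w': [1.0, 2.5]}
```

In this toy case both finetuned models moved the first weight by the same amount, so the difference isolates only the shift in the second weight that the auxiliary objective contributed. A downstream update that changes only the first weight would incur zero penalty under the assumed regularizer.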
