Fugu-MT Paper Translation (Summary): HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness

Paper summary: HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness

arxiv url: http://arxiv.org/abs/2603.17573v1
Date: Wed, 18 Mar 2026 10:25:08 GMT
Status: Translation complete
Last updated in system: 2026-03-19 18:32:57.644894
Title: HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness
Title (reference translation): HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models Using Kinematic Awareness
Authors: Zihao Zheng, Zhihao Mao, Sicheng Tian, Maoliang Li, Jiayu Chen, Xinhao Sun, Zhaobo Zhang, Xuanzhe Liu, Donggang Cao, Hong Mei, Xiang Chen
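Abstract summary: Vision-Language-Action (VLA) models have become the mainstream solution for robot control, but their inference is slow. The authors analyze the trajectory patterns of robots controlled by a VLA model and derive a key insight: drafter-based and retrieval-based speculative decoding (SD) should be used in a hybrid manner. The paper proposes a retrieval-based SD optimization method within HeiSD.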
Reference score (independently computed attention score): 13.793127202497358
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Vision-Language-Action (VLA) models have become the mainstream solution for robot control, but suffer from slow inference speeds. Speculative Decoding (SD) is a promising acceleration method that can be divided into two categories: drafter-based SD and retrieval-based SD. Existing methods do not analyze the advantages and disadvantages of these two types of SD in VLA models, so each type is applied or optimized in isolation. In this paper, we analyze the trajectory patterns of robots controlled by the VLA model and derive a key insight: the two types of SD should be used in a hybrid manner. However, achieving hybrid SD in VLA models poses several challenges: (1) draft rejection and persistent errors in retrieval-based SD; (2) difficulty in determining the hybrid boundary. To address these, we propose the HeiSD framework. We propose a retrieval-based SD optimization method in HeiSD, which contains a verify-skip mechanism and a sequence-wise relaxed acceptance strategy. Moreover, we propose a kinematic-based fused metric in HeiSD to automatically determine the hybrid boundary. Experimental results demonstrate that HeiSD attains a speedup of up to 2.45x in simulation benchmarks and 2.06x to 2.41x in real-world scenarios, while sustaining a high task success rate.
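Abstract (reference translation): Vision-Language-Action (VLA) models have become the mainstream solution for robot control, but their inference is slow. Speculative decoding (SD) is a promising acceleration method and falls into two categories: drafter-based SD and retrieval-based SD. Existing methods do not analyze the advantages and disadvantages of these two types of SD in VLA models, so each is applied or optimized on its own. This paper analyzes the trajectory patterns of robots controlled by a VLA model and derives the key insight that the two types of SD should be used in a hybrid manner. Realizing hybrid SD in VLA models, however, raises several challenges: (1) draft rejection and persistent errors in retrieval-based SD, and (2) the difficulty of determining the hybrid boundary. To address these, the authors propose the HeiSD framework. Within HeiSD they propose a retrieval-based SD optimization method, together with a kinematic-based fused metric that determines the hybrid boundary automatically. In experiments, HeiSD achieves a speedup of up to 2.45x on simulation benchmarks and 2.06x to 2.41x in real-world scenarios while maintaining a high task success rate.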
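
Illustrative sketch (not from the paper): the abstract names draft verification and a relaxed acceptance strategy but gives no algorithmic detail, so the Python below only shows the generic draft-then-verify step of greedy speculative decoding, with an optional tolerance standing in for a relaxed acceptance rule. The function names, the `tol` parameter, the toy target model, and the assumption that action tokens are integer bin indices (so that numeric distance between tokens is meaningful) are all hypothetical and are not HeiSD's actual method.

```python
from typing import Callable, List, Sequence


def verify_draft(draft: Sequence[int], target: Sequence[int], tol: int = 0) -> int:
    """Count the leading draft tokens that are accepted.

    A draft token is accepted when it is within `tol` of the target token
    at the same position. tol=0 is strict greedy matching; tol>0 is a
    hypothetical relaxed rule for numerically ordered action tokens.
    """
    accepted = 0
    for d, t in zip(draft, target):
        if abs(d - t) > tol:
            break
        accepted += 1
    return accepted


def speculative_step(
    prefix: Sequence[int],
    draft: Sequence[int],
    target_next: Callable[[Sequence[int], Sequence[int]], List[int]],
    tol: int = 0,
) -> List[int]:
    """Run one draft-then-verify step and return the extended sequence.

    `target_next(prefix, draft)` must return len(draft) + 1 tokens: the
    target model's greedy choice at each position given prefix + draft[:i].
    """
    target = target_next(prefix, draft)
    n = verify_draft(draft, target[: len(draft)], tol)
    # Keep the accepted draft tokens, then append the target's own token:
    # the correction at the first rejection, or a bonus token if all passed.
    return list(prefix) + list(draft[:n]) + [target[n]]


def toy_target(prefix: Sequence[int], draft: Sequence[int]) -> List[int]:
    """Toy stand-in for a target model: next token = previous token + 1 (mod 256).

    Requires a non-empty prefix.
    """
    ctx = list(prefix) + list(draft)
    return [(ctx[i - 1] + 1) % 256 for i in range(len(prefix), len(ctx) + 1)]


if __name__ == "__main__":
    print(speculative_step([10], [11, 12, 20], toy_target))         # strict
    print(speculative_step([10], [11, 12, 20], toy_target, tol=8))  # relaxed
```

With strict matching, the third draft token is rejected and replaced by the target's token; with the relaxed tolerance it is kept, which trades exact agreement with the target model for a longer accepted run per verification pass.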

Related papers