Fugu-MT Paper Translation (Summary): Habilis-$β$: A Fast-Motion and Long-Lasting On-Device Vision-Language-Action Model

Paper summary: Habilis-$β$: A Fast-Motion and Long-Lasting On-Device Vision-Language-Action Model

arxiv url: http://arxiv.org/abs/2602.18813v1
Date: Sat, 21 Feb 2026 12:15:49 GMT
Status: Translation complete
Last updated in system: 2026-02-24 17:42:02.329482
Title: Habilis-$β$: A Fast-Motion and Long-Lasting On-Device Vision-Language-Action Model
Title (reference translation): Habilis-$β$: A Fast and Long-Duration On-Device Vision-Language-Action Model
Authors: Tommoro Robotics: Jesoon Kang, Taegeon Park, Jisu An, Soo Min Kimm, Jaejoon Kim, Jinu Pahk, Byungju Kim, Junseok Lee, Namheon Baek, Sungwan Ha, Hojun Baek, Eduardo Ayerve Cruz, Wontae Kim, Junghyeon Choi, Yousuk Lee, Joonmo Han, Sunghyun Cho, Sunghyun Kwon, Soyoung Lee, Jun Ki Lee, Seung-Joon Yi, Byoung-Tak Zhang, Theo Taeyeong Kim
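Abstract summary: Habilis-$β$ is a fast-motion, long-lasting on-device vision-language-action (VLA) model designed for real-world deployment. The paper introduces the Productivity-Reliability Plane (PRP), which evaluates Tasks per Hour (TPH) and Mean Time Between Intervention (MTBI) under a continuous-run protocol. In 1-hour continuous-run evaluations, Habilis-$β$ achieves strong performance on the PRP metrics compared with $π_{0.5}$ in both simulation and real-world environments.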
Reference score (independently computed attention score): 24.23805196139948
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce Habilis-$β$, a fast-motion and long-lasting on-device vision-language-action (VLA) model designed for real-world deployment. Current VLA evaluation remains largely confined to single-trial success rates under curated resets, which fails to capture the fast-motion and long-lasting capabilities essential for practical operation. To address this, we introduce the Productivity-Reliability Plane (PRP), which evaluates performance through Tasks per Hour (TPH) and Mean Time Between Intervention (MTBI) under a continuous-run protocol that demands both high-speed execution and sustained robustness. Habilis-$β$ achieves high performance by integrating language-free pre-training on large-scale play data for robust interaction priors with post-training on cyclic task demonstrations that capture state drift across consecutive task iterations. The system further employs ESPADA for phase-adaptive motion shaping to accelerate free-space transit, utilizes rectified-flow distillation to enable high-frequency control on edge devices, and incorporates classifier-free guidance (CFG) as a deployment-time knob to dynamically balance instruction adherence and learned interaction priors. In 1-hour continuous-run evaluations, Habilis-$β$ achieves strong performance under the PRP metrics, compared to $π_{0.5}$ in both simulation and real-world environments. In simulation, Habilis-$β$ achieves 572.6 TPH and 39.2 s MTBI (vs. 120.5 TPH and 30.5 s for $π_{0.5}$), while in a real-world humanoid logistics workflow it achieves 124 TPH and 137.4 s MTBI (vs. 19 TPH and 46.1 s for $π_{0.5}$). Finally, Habilis-$β$ achieves the highest reported performance on the standard RoboTwin 2.0 leaderboard across representative tasks, validating its effectiveness in complex manipulation scenarios.
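The abstract names the two PRP axes but does not spell out their formulas. As a minimal sketch, assuming TPH is completed tasks divided by run length in hours and MTBI is run length divided by the number of human interventions, the metrics and a simple Pareto-dominance comparison on the plane could look like the following. The `RunLog` type and every function name here are hypothetical illustrations, not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RunLog:
    """Summary of one continuous run (hypothetical structure)."""
    duration_s: float      # wall-clock length of the run in seconds
    tasks_completed: int   # task iterations finished during the run
    interventions: int     # number of times a human had to step in


def tasks_per_hour(log: RunLog) -> float:
    """Productivity axis: completed tasks normalized to one hour."""
    return log.tasks_completed / (log.duration_s / 3600.0)


def mean_time_between_interventions(log: RunLog) -> float:
    """Reliability axis (assumed definition): run seconds per intervention.

    With zero interventions the quantity is unbounded, so return infinity.
    """
    if log.interventions == 0:
        return math.inf
    return log.duration_s / log.interventions


def prp_point(log: RunLog) -> Tuple[float, float]:
    """Place a run on the Productivity-Reliability Plane as (TPH, MTBI)."""
    return tasks_per_hour(log), mean_time_between_interventions(log)


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if point a is at least as good as b on both axes and better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


# Hypothetical one-hour run: 120 tasks, 40 interventions.
print(prp_point(RunLog(duration_s=3600.0, tasks_completed=120, interventions=40)))
# prints (120.0, 90.0)

# (TPH, MTBI) pairs reported in the abstract: Habilis-beta vs. pi_0.5.
print(dominates((572.6, 39.2), (120.5, 30.5)))   # simulation, prints True
print(dominates((124.0, 137.4), (19.0, 46.1)))   # real-world logistics, prints True
```

Dominance is only a partial order: a faster but less reliable system is not ranked against a slower but more reliable one, which is one reason to report both numbers rather than collapse them into a single score.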
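Abstract (reference translation): We introduce Habilis-$β$, a fast-motion, long-lasting on-device vision-language-action (VLA) model designed for real-world deployment. Current VLA evaluation is largely limited to single-trial success rates under curated resets, and it fails to capture the speed and long-duration capabilities required for practical operation. To address this, we introduce the Productivity-Reliability Plane (PRP), which evaluates performance by Tasks per Hour (TPH) and Mean Time Between Intervention (MTBI) under a continuous-run protocol that demands both high-speed execution and sustained robustness. Habilis-$β$ achieves high performance by combining language-free pre-training on large-scale play data, which provides robust interaction priors, with post-training on cyclic task demonstrations that capture state drift across consecutive task iterations. The system uses ESPADA for phase-adaptive motion shaping to accelerate free-space transit, applies rectified-flow distillation to enable high-frequency control on edge devices, and incorporates classifier-free guidance (CFG) as a deployment-time knob that dynamically balances instruction adherence against learned interaction priors. In 1-hour continuous-run evaluations, Habilis-$β$ achieves strong performance on the PRP metrics compared with $π_{0.5}$ in both simulation and real-world environments. In simulation, Habilis-$β$ achieves 572.6 TPH and 39.2 s MTBI (vs. 120.5 TPH and 30.5 s for $π_{0.5}$), and in a real-world humanoid logistics workflow it achieves 124 TPH and 137.4 s MTBI (vs. 19 TPH and 46.1 s for $π_{0.5}$). Finally, Habilis-$β$ achieves the highest reported performance on the standard RoboTwin 2.0 leaderboard across representative tasks, validating its effectiveness in complex manipulation scenarios.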
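The abstract mentions rectified-flow distillation for high-frequency control and CFG as a deployment-time knob, but gives no formulas. The sketch below assumes the standard CFG combination $v = v_\varnothing + w\,(v_c - v_\varnothing)$ of a conditional velocity $v_c$ (instruction given) and an unconditional velocity $v_\varnothing$ (instruction dropped), integrated from noise to an action with a few Euler steps as in rectified-flow sampling. Treating the unconditional branch as the "learned interaction prior" is an interpretation of the abstract. `toy_velocity`, `guided_velocity`, `sample_action`, the weight `w`, and the step count are hypothetical; the distillation training is not shown, and this is not Habilis-$β$'s actual implementation.

```python
from typing import Callable, List, Optional

Vector = List[float]
# A velocity field v(x, t, instruction) -> dx/dt; instruction=None selects the
# unconditional (instruction-free) branch.
VelocityFn = Callable[[Vector, float, Optional[str]], Vector]


def toy_velocity(x: Vector, t: float, instruction: Optional[str]) -> Vector:
    """Hypothetical stand-in for a learned policy head.

    Points along a straight line toward a fixed target, as an ideal rectified
    flow would: the conditional target is [1, 2], the unconditional one [0, 0].
    """
    target = [1.0, 2.0] if instruction is not None else [0.0, 0.0]
    return [(g - xi) / (1.0 - t) for g, xi in zip(target, x)]


def guided_velocity(v: VelocityFn, x: Vector, t: float,
                    instruction: str, w: float) -> Vector:
    """Classifier-free guidance: v_uncond + w * (v_cond - v_uncond)."""
    v_cond = v(x, t, instruction)
    v_uncond = v(x, t, None)
    return [u + w * (c - u) for c, u in zip(v_cond, v_uncond)]


def sample_action(v: VelocityFn, x0: Vector, instruction: str,
                  w: float = 1.0, steps: int = 4) -> Vector:
    """Integrate from noise x0 at t=0 to an action at t=1 with Euler steps."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # always < 1, so toy_velocity never divides by zero
        g = guided_velocity(v, x, t, instruction, w)
        x = [xi + dt * gi for xi, gi in zip(x, g)]
    return x


noise = [0.0, 0.0]
for w in (0.0, 1.0, 2.0):
    action = sample_action(toy_velocity, noise, "place box on shelf", w=w)
    print(w, [round(a, 6) for a in action])
```

Because a rectified flow is trained toward straight trajectories, this toy lands on its endpoint even with `steps=1`, which is the property that makes few-step (and hence high-frequency) sampling plausible on edge hardware. Sweeping `w` shows the knob: `w = 0` yields the instruction-free endpoint `[0, 0]`, `w = 1` the conditional endpoint `[1, 2]`, and `w = 2` extrapolates past it to about `[2, 4]`, up to floating-point rounding.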


List of related papers