Fugu-MT Paper Translation (Abstract): Hand-in-the-Loop: Improving Dexterous VLA via Seamless Interventional Correction

Paper overview: Hand-in-the-Loop: Improving Dexterous VLA via Seamless Interventional Correction

arxiv url: http://arxiv.org/abs/2605.15157v1
Date: Thu, 14 May 2026 17:51:40 GMT
Status: Translation complete
Last updated in system: 2026-05-15 21:45:35.000412
Title: Hand-in-the-Loop: Improving Dexterous VLA via Seamless Interventional Correction
Title (reference translation): Hand-in-the-Loop: Improving Dexterous VLA via Seamless Interventional Correction
Authors: Zhuohang Li, Liqun Huang, Wei Xu, Zhengming Zhu, Nie Lin, Xiao Ma, Xinjun Sheng, Ruoshi Wen
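Abstract summary: Hand-in-the-Loop (HandITL) blends human corrective intent with autonomous policy execution. HandITL reduces takeover jitter by 99.8% and preserves robust post-takeover manipulation. Policies refined with its intervention data outperform those trained with standard teleoperation data by 19% on average across three long-horizon dexterous tasks.
Reference score (independently computed attention score): 15.34162270431179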
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-Language-Action (VLA) models are prone to compounding errors in dexterous manipulation, where high-dimensional action spaces and contact-rich dynamics amplify small policy deviations over long horizons. While Interactive Imitation Learning (IIL) can refine policies through human takeover data, applying it to high-degree-of-freedom (DoF) robotic hands remains challenging due to a command mismatch between human teleoperation and policy execution at the takeover moment, which causes abrupt robot-hand configuration changes, or "gesture jumps". We present Hand-in-the-Loop (HandITL), a seamless human-in-the-loop intervention method that blends human corrective intent with autonomous policy execution to avoid gesture jumps during bimanual dexterous manipulation. Compared with direct teleoperation takeover, HandITL reduces takeover jitter by 99.8% and preserves robust post-takeover manipulation, reducing grasp failures by 87.5% and mean completion time by 19.1%. We validate HandITL on tasks requiring bimanual coordination, tool use, and fine-grained long-horizon manipulation. When used to collect intervention data for policy refinement, HandITL yields policies that outperform those trained with standard teleoperation data by 19% on average across three long-horizon dexterous tasks.
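Abstract (reference translation): Vision-Language-Action (VLA) models tend to accumulate compounding errors in dexterous manipulation, where high-dimensional action spaces and contact-rich dynamics amplify small policy deviations over long horizons. Interactive Imitation Learning (IIL) can refine policies using human takeover data, but applying it to high-degree-of-freedom (DoF) robotic hands remains difficult: a command mismatch between human teleoperation and policy execution at the moment of takeover causes abrupt changes in the robot-hand configuration, known as "gesture jumps". Hand-in-the-Loop (HandITL) is a seamless human-in-the-loop intervention method that blends human corrective intent with autonomous policy execution to avoid gesture jumps during bimanual dexterous manipulation. Compared with direct teleoperation takeover, HandITL reduces takeover jitter by 99.8% and preserves robust post-takeover manipulation, cutting grasp failures by 87.5% and mean completion time by 19.1%. HandITL is validated on tasks requiring bimanual coordination, tool use, and fine-grained long-horizon manipulation. When used to collect intervention data for policy refinement, HandITL yields policies that outperform those trained with standard teleoperation data by 19% on average across three long-horizon dexterous tasks.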
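
The "gesture jump" described in the abstract can be illustrated with a toy example. The sketch below is not the HandITL algorithm, whose details the abstract does not give. It swaps in a plain linear cross-fade between a policy command and a human command, purely to show why a hard switch at takeover produces a large per-step change and why blending shrinks it. The joint values, step counts, and function names are all hypothetical.

```python
# Toy illustration only: linear cross-fade between two joint-command vectors.
# This is an assumed stand-in, not the method from the paper.

def blend(policy_cmd, human_cmd, alpha):
    """Weighted mix of two commands; alpha=0 is pure policy, alpha=1 is pure human."""
    return [(1.0 - alpha) * p + alpha * h for p, h in zip(policy_cmd, human_cmd)]

def takeover_trajectory(policy_cmd, human_cmd, steps, t_takeover, window):
    """Commands sent to the hand over `steps` ticks.

    window == 0 models a direct takeover (hard switch at t_takeover);
    window > 0 ramps alpha linearly from 0 to 1 over `window` ticks.
    """
    traj = []
    for t in range(steps):
        if window == 0:
            alpha = 1.0 if t >= t_takeover else 0.0
        else:
            alpha = min(max((t - t_takeover) / window, 0.0), 1.0)
        traj.append(blend(policy_cmd, human_cmd, alpha))
    return traj

def max_step_jump(traj):
    """Largest change of any single joint between consecutive ticks."""
    return max(
        abs(b - a)
        for prev, cur in zip(traj, traj[1:])
        for a, b in zip(prev, cur)
    )

# Hypothetical 3-joint hand: the policy and the human disagree at takeover.
policy = [0.8, 0.6, 0.4]
human = [0.2, 0.1, 0.0]

direct = takeover_trajectory(policy, human, steps=20, t_takeover=5, window=0)
blended = takeover_trajectory(policy, human, steps=20, t_takeover=5, window=10)

print("direct max jump: ", max_step_jump(direct))
print("blended max jump:", max_step_jump(blended))
```

With a linear ramp, each tick moves a joint by 1/`window` of the full policy-to-human gap, so a longer window trades responsiveness for a smoother handover. A real system would also have to handle commands that change over time and differences between the human hand and the robot hand, which this sketch ignores.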

Related papers