Fugu-MT Paper Translation (Abstract): HapticVLA: Contact-Rich Manipulation via Vision-Language-Action Model without Inference-Time Tactile Sensing

Paper overview: HapticVLA: Contact-Rich Manipulation via Vision-Language-Action Model without Inference-Time Tactile Sensing

arxiv url: http://arxiv.org/abs/2603.15257v1
Date: Mon, 16 Mar 2026 13:24:58 GMT
Status: Translation complete
Last updated in system: 2026-03-21 18:33:56.86778
Title: HapticVLA: Contact-Rich Manipulation via Vision-Language-Action Model without Inference-Time Tactile Sensing
Title (reference translation): HapticVLA: Contact-Rich Manipulation via a Vision-Language-Action Model without Inference-Time Tactile Sensing
Authors: Konstantin Gubernatorov, Mikhail Sannikov, Ilya Mikhalchuk, Egor Kuznetsov, Makar Artemov, Ogunwoye Faith Ouwatobi, Marcelino Fernando, Artem Asanov, Ziang Guo, Dzmitry Tsetserukou
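Abstract summary: We argue that tactile-aware manipulation can be learned offline and deployed without direct haptic feedback at inference. This paper presents HapticVLA, which proceeds in two tightly coupled stages: Safety-Aware Reward-Weighted Flow Matching (SA-RWFM) and Tactile Distillation (TD).
Reference score (independently computed attention metric): 1.5861606459586157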
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Tactile sensing is a crucial capability for Vision-Language-Action (VLA) architectures, as it enables dexterous and safe manipulation in contact-rich tasks. However, reliance on dedicated tactile hardware increases cost and reduces reproducibility across robotic platforms. We argue that tactile-aware manipulation can be learned offline and deployed without direct haptic feedback at inference. To this end, we present HapticVLA, which proceeds in two tightly coupled stages: Safety-Aware Reward-Weighted Flow Matching (SA-RWFM) and Tactile Distillation (TD). SA-RWFM trains a flow-matching action expert that incorporates precomputed, safety-aware tactile rewards penalizing excessive grasping force and suboptimal grasping trajectories. TD further transfers this tactile-aware capability into a conventional VLA: we distill a compact tactile token from the SA-RWFM teacher and train a student VLA to predict that token from vision and state modalities, enabling tactile-aware action generation at inference without requiring on-board tactile sensors. This design preserves contact-rich, tactile-aware reasoning within the VLA while removing the need for on-board tactile sensors during deployment. In real-world experiments, HapticVLA achieves a mean success rate of 86.7%, consistently outperforming baseline VLAs, including versions provided with direct tactile feedback during inference.
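The abstract does not give the SA-RWFM objective itself, so the following is only a minimal sketch of what a reward-weighted flow-matching loss could look like. It assumes a linear interpolation path between a Gaussian noise sample and the demonstrated action, a hinge-style force penalty as the precomputed tactile reward, and exponentiated rewards normalized to mean 1 as per-sample weights. The names `safety_reward`, `reward_weights`, `weighted_flow_matching_loss`, `force_limit`, and `beta` are hypothetical and do not come from the paper.

```python
import math
import random


def safety_reward(peak_force, force_limit=5.0):
    """Hypothetical tactile reward: 0 within the limit, negative by the excess force."""
    return min(0.0, force_limit - peak_force)


def reward_weights(rewards, beta=1.0):
    """Exponentiated rewards, normalized to mean 1 over the batch (assumed scheme)."""
    top = max(rewards)  # subtract the max for numerical stability
    exps = [math.exp(beta * (r - top)) for r in rewards]
    mean = sum(exps) / len(exps)
    return [e / mean for e in exps]


def weighted_flow_matching_loss(velocity_fn, actions, rewards, rng, beta=1.0):
    """Flow-matching regression loss, with each sample scaled by its reward weight."""
    weights = reward_weights(rewards, beta)
    total = 0.0
    for action, weight in zip(actions, weights):
        noise = [rng.gauss(0.0, 1.0) for _ in action]
        t = rng.random()
        # Point on the straight path from noise (t=0) to the demonstrated action (t=1).
        point = [(1.0 - t) * n + t * a for n, a in zip(noise, action)]
        # Target velocity along that straight path.
        target = [a - n for n, a in zip(noise, action)]
        predicted = velocity_fn(point, t)
        error = sum((p - u) ** 2 for p, u in zip(predicted, target)) / len(action)
        total += weight * error
    return total / len(actions)
```

Under these assumptions, demonstrations that exceed the force limit still contribute to training but with a smaller weight, so the learned velocity field is pulled toward gentler grasps.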
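Abstract (reference translation): Tactile sensing is an important capability for Vision-Language-Action (VLA) architectures. However, reliance on dedicated tactile hardware increases cost and reduces reproducibility across robotic platforms. We argue that tactile-aware manipulation can be learned offline and deployed without direct haptic feedback at inference. To this end, we present HapticVLA, which proceeds in two tightly coupled stages: Safety-Aware Reward-Weighted Flow Matching (SA-RWFM) and Tactile Distillation (TD). SA-RWFM trains a flow-matching action expert that incorporates precomputed, safety-aware tactile rewards, penalizing excessive grasping force and suboptimal grasping trajectories. We distill a compact tactile token from the SA-RWFM teacher and train a student VLA to predict that token from vision and state modalities, enabling tactile-aware action generation at inference without on-board tactile sensors. This design preserves contact-rich, tactile-aware reasoning within the VLA while removing the need for on-board tactile sensors during deployment. In real-world experiments, HapticVLA achieves a mean success rate of 86.7%, consistently outperforming baseline VLAs, including versions provided with direct tactile feedback during inference.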
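The tactile distillation stage can be illustrated in the same hedged way. The sketch below assumes the student is a single linear head mapping vision-and-state features to a token, and that it is trained with a mean-squared-error loss against a frozen teacher token. The actual student architecture and loss are not stated in the abstract, and `student_token`, `distill_step`, and `lr` are hypothetical names.

```python
def mse(predicted, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def student_token(weights, features):
    """Linear head: one token dimension per row of weights (assumed student form)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def distill_step(weights, features, teacher_token, lr=0.05):
    """One gradient-descent step on the MSE between student and teacher tokens.

    Returns the updated weights and the loss measured before the update.
    The teacher token is treated as a fixed target (no gradient flows to it).
    """
    predicted = student_token(weights, features)
    dims = len(teacher_token)
    updated = []
    for row, p, t in zip(weights, predicted, teacher_token):
        scale = 2.0 * (p - t) / dims  # d(MSE)/d(p) for this token dimension
        updated.append([w - lr * scale * x for w, x in zip(row, features)])
    return updated, mse(predicted, teacher_token)
```

At deployment only `student_token` would be evaluated, which is why no tactile input appears among its arguments in this sketch.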

Related papers