Fugu-MT Paper Translation (Abstract): Safe-Night VLA: Seeing the Unseen via Thermal-Perceptive Vision-Language-Action Models for Safety-Critical Manipulation

Paper summary: Safe-Night VLA: Seeing the Unseen via Thermal-Perceptive Vision-Language-Action Models for Safety-Critical Manipulation

arxiv url: http://arxiv.org/abs/2603.05754v1
Date: Thu, 05 Mar 2026 23:26:44 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:41.9651
Title: Safe-Night VLA: Seeing the Unseen via Thermal-Perceptive Vision-Language-Action Models for Safety-Critical Manipulation
Title (reference translation): Safe-Night VLA: Thermal-Perceptive Vision-Language-Action Models for Safety-Critical Manipulation
Authors: Dian Yu, Qingchuan Zhou, Bingkun Huang, Majid Khadiv, Zewen Yang
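Abstract summary: We propose Safe-Night VLA, a multimodal manipulation framework that enables robots to see the unseen. Specifically, Safe-Night VLA integrates long-wave infrared thermal perception into a pre-trained vision-language backbone. We validate our framework through real-world experiments on a Franka manipulator.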
Reference score (independently computed attention score): 9.129204825142077
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Current Vision-Language-Action (VLA) models rely primarily on RGB perception, preventing them from capturing modalities such as thermal signals that are imperceptible to conventional visual sensors. Moreover, end-to-end generative policies lack explicit safety constraints, making them fragile when encountering obstacles and novel scenarios outside the training distribution. To address these limitations, we propose Safe-Night VLA, a multimodal manipulation framework that enables robots to see the unseen while enforcing rigorous safety constraints for thermal-aware manipulation in unstructured environments. Specifically, Safe-Night VLA integrates long-wave infrared thermal perception into a pre-trained vision-language backbone, enabling semantic reasoning grounded in thermodynamic properties. To ensure safe execution under out-of-distribution conditions, we incorporate a safety filter via control barrier functions, which provide deterministic workspace constraint enforcement during policy execution. We validate our framework through real-world experiments on a Franka manipulator, introducing a novel evaluation paradigm featuring temperature-conditioned manipulation, subsurface target localization, and reflection disambiguation, while maintaining constrained execution at inference time. Results demonstrate that Safe-Night VLA outperforms RGB-only baselines and provide empirical evidence that foundation models can effectively leverage non-visible physical modalities for robust manipulation.
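Abstract (reference translation): Current Vision-Language-Action (VLA) models rely mainly on RGB perception, which prevents them from capturing modalities such as thermal signals that conventional visual sensors cannot perceive. In addition, end-to-end generative policies have no explicit safety constraints, so they are fragile when they encounter obstacles or novel scenarios outside the training distribution. To address these limitations, we propose Safe-Night VLA, a multimodal manipulation framework that lets robots see the unseen while imposing rigorous safety constraints on thermal-aware manipulation in unstructured environments. Specifically, Safe-Night VLA integrates long-wave infrared thermal perception into a pre-trained vision-language backbone, enabling semantic reasoning grounded in thermodynamic properties. To ensure safe execution under out-of-distribution conditions, we introduce a safety filter based on control barrier functions, which enforces workspace constraints deterministically during policy execution. We validate the framework through real-world experiments on a Franka manipulator, introducing a new evaluation paradigm that features temperature-conditioned manipulation, subsurface target localization, and reflection disambiguation, while maintaining constrained execution at inference time. The results show that Safe-Night VLA outperforms RGB-only baselines and provide empirical evidence that foundation models can effectively exploit non-visible physical modalities for robust manipulation.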
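
The abstract mentions a safety filter built on control barrier functions (CBFs) that enforces workspace constraints during policy execution. The sketch below is not the authors' implementation; it only illustrates the general idea under simplifying assumptions: the end effector is modeled as a single integrator (commanded velocity equals position rate), the workspace is an axis-aligned box, and the function name `cbf_box_filter` and the gain `alpha` are hypothetical. With barrier functions h = x - lower and h = upper - x per axis, the CBF condition dh/dt >= -alpha * h becomes a pair of velocity bounds, and the minimally invasive filter reduces to per-axis clipping of the nominal command.

```python
from typing import List, Sequence


def cbf_box_filter(
    position: Sequence[float],
    nominal_velocity: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    alpha: float = 2.0,
) -> List[float]:
    """Project a nominal velocity onto the CBF-safe set of a box workspace.

    Assumes single-integrator dynamics (dx/dt = u), which is an
    illustrative simplification and not taken from the paper.
    """
    safe_velocity = []
    for x, u, lo, hi in zip(position, nominal_velocity, lower, upper):
        # h_lo = x - lo >= 0  gives  u >= -alpha * (x - lo)
        u_min = -alpha * (x - lo)
        # h_hi = hi - x >= 0  gives  u <= alpha * (hi - x)
        u_max = alpha * (hi - x)
        # The constraints decouple per axis, so the closest safe
        # command to the nominal one is a simple clip.
        safe_velocity.append(min(max(u, u_min), u_max))
    return safe_velocity


if __name__ == "__main__":
    # Near the upper x-limit, the x-velocity is reduced; y is untouched.
    print(cbf_box_filter([0.9, 0.0], [1.0, 0.2], [0.0, -0.5], [1.0, 0.5]))
```

In this simplified setting the filter leaves the policy's command unchanged well inside the box and scales down only the velocity components that would approach a boundary too quickly. A real manipulator would typically need constraints expressed through its kinematics and a quadratic program rather than clipping.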

