Fugu-MT Paper Translation (Abstract): ThermoAct:Thermal-Aware Vision-Language-Action Models for Robotic Perception and Decision-Making

Paper overview: ThermoAct:Thermal-Aware Vision-Language-Action Models for Robotic Perception and Decision-Making

arxiv url: http://arxiv.org/abs/2603.25044v1
Date: Thu, 26 Mar 2026 05:26:56 GMT
Status: Translation complete
System update date: 2026-03-27 20:52:48.11842
Title: ThermoAct:Thermal-Aware Vision-Language-Action Models for Robotic Perception and Decision-Making
Title (reference translation): ThermoAct: Thermal-Aware Vision-Language-Action Models for Robot Perception and Decision-Making
Authors: Young-Chae Son, Dae-Kwan Ko, Yoon-Ji Choi, Soo-Chul Lim
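Abstract summary: This paper proposes a Vision-Language-Action framework that incorporates thermal information into robot task execution. The proposed system uses a Vision-Language Model (VLM) as a high-level planner that interprets complex natural language commands. Unlike conventional methods that rely solely on visual data, the approach integrates thermal information, enabling the robot to perceive physical properties and proactively ensure environmental safety.
Reference score (independently computed attention score): 2.86989372262348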
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In recent human-robot collaboration environments, there is a growing focus on integrating diverse sensor data beyond visual information to enable safer and more intelligent task execution. Although thermal data can be crucial for enhancing robot safety and operational efficiency, its integration has been relatively overlooked in prior research. This paper proposes a novel Vision-Language-Action (VLA) framework that incorporates thermal information for robot task execution. The proposed system leverages a Vision-Language Model (VLM) as a high-level planner to interpret complex natural language commands and decompose them into simpler sub-tasks. This approach facilitates efficient data collection and robust reasoning for complex operations. Unlike conventional methods that rely solely on visual data, our approach integrates thermal information, enabling the robot to perceive physical properties and proactively ensure environmental safety. Experimental results from real-world task scenarios validate the feasibility of our proposed framework, suggesting its potential to enhance task success rates and safety compared to existing vision-based systems.
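Abstract (reference translation): In recent human-robot collaboration environments, there is growing attention to integrating diverse sensor data beyond visual information so that tasks can be executed more safely and intelligently. Thermal data can be important for improving robot safety and operational efficiency, yet its integration has been relatively overlooked in prior research. This paper proposes a Vision-Language-Action (VLA) framework that incorporates thermal information into robot task execution. The proposed system uses a Vision-Language Model (VLM) as a high-level planner that interprets complex natural language commands and decomposes them into simpler sub-tasks. This approach facilitates efficient data collection and robust reasoning for complex operations. Unlike conventional methods that rely solely on visual data, the approach integrates thermal information, enabling the robot to perceive physical properties and proactively ensure environmental safety. Experimental results from real-world task scenarios validate the feasibility of the proposed framework and suggest that it may improve task success rates and safety compared with existing vision-based systems.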
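
The abstract describes a high-level planner that decomposes a command into sub-tasks, plus thermal sensing used for safety. The following is only a toy sketch of that two-stage idea, not the authors' implementation: the rule-based `plan_subtasks` stands in for a VLM call, and the names `SubTask`, `thermal_gate`, and the threshold `HOT_LIMIT_C` are hypothetical and do not come from the paper.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical safety threshold in degrees Celsius; the abstract gives no value.
HOT_LIMIT_C = 50.0


@dataclass
class SubTask:
    action: str  # first word of the clause, e.g. "grasp"
    target: str  # remainder of the clause, e.g. "the cup"


def plan_subtasks(command: str) -> List[SubTask]:
    """Toy stand-in for a VLM planner: split a command into simple clauses."""
    clauses = [c.strip() for c in re.split(r"\bthen\b|,", command) if c.strip()]
    subtasks = []
    for clause in clauses:
        action, _, target = clause.partition(" ")
        subtasks.append(SubTask(action=action.lower(), target=target.strip()))
    return subtasks


def thermal_gate(thermal_image_c: Sequence[Sequence[float]],
                 limit_c: float = HOT_LIMIT_C) -> str:
    """Return "hold" if any pixel temperature exceeds the limit, else "execute"."""
    peak = max(max(row) for row in thermal_image_c)
    return "hold" if peak > limit_c else "execute"


if __name__ == "__main__":
    plan = plan_subtasks("grasp the cup, then move it to the tray")
    frame = [[22.0, 23.5], [24.0, 71.2]]  # made-up thermal frame in Celsius
    for step in plan:
        print(step.action, "->", thermal_gate(frame))
```

In this sketch the thermal check gates every sub-task uniformly; a real system would presumably condition the policy on the thermal image rather than apply a single threshold.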
