Fugu-MT Paper Translation (Summary): RoboInspector: Unveiling the Unreliability of Policy Code for LLM-enabled Robotic Manipulation

Paper summary: RoboInspector: Unveiling the Unreliability of Policy Code for LLM-enabled Robotic Manipulation

arxiv url: http://arxiv.org/abs/2508.21378v1
Date: Fri, 29 Aug 2025 07:47:17 GMT
Status: Translation complete
System update date: 2025-09-01 19:45:10.949111
Title: RoboInspector: Unveiling the Unreliability of Policy Code for LLM-enabled Robotic Manipulation
Authors: Chenduo Ying, Linkang Du, Peng Cheng, Yuanchao Shu
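Abstract summary: Large language models (LLMs) demonstrate remarkable capabilities in reasoning and code generation. Despite these advances, achieving reliable policy code generation remains a significant challenge because of diverse requirements. We introduce RoboInspector, a pipeline that unveils the unreliability of policy code for LLM-enabled robotic manipulation.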
Reference score (independently computed attention metric): 7.650053106303868
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) demonstrate remarkable capabilities in reasoning and code generation, enabling robotic manipulation to be initiated with just a single instruction. The LLM carries out various tasks by generating policy code required to control the robot. Despite advances in LLMs, achieving reliable policy code generation remains a significant challenge due to the diverse requirements of real-world tasks and the inherent complexity of user instructions. In practice, different users may provide distinct instructions to drive the robot for the same task, which may cause the unreliability of policy code generation. To bridge this gap, we design RoboInspector, a pipeline to unveil and characterize the unreliability of the policy code for LLM-enabled robotic manipulation from two perspectives: the complexity of the manipulation task and the granularity of the instruction. We perform comprehensive experiments with 168 distinct combinations of tasks, instructions, and LLMs in two prominent frameworks. The RoboInspector identifies four main unreliable behaviors that lead to manipulation failure. We provide a detailed characterization of these behaviors and their underlying causes, giving insight for practical development to reduce unreliability. Furthermore, we introduce a refinement approach guided by failure policy code feedback that improves the reliability of policy code generation by up to 35% in LLM-enabled robotic manipulation, evaluated in both simulation and real-world environments.
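Illustrative sketch: the abstract mentions a refinement approach guided by failure policy code feedback but does not describe how it works. The Python below is one plausible reading of that idea, not the authors' implementation. It assumes a generate-execute-retry loop in which failed policy code and its observed error are appended to the next prompt. The names `refine_policy_code`, `fake_generate`, and `fake_execute`, the prompt wording, and the `max_rounds` parameter are all hypothetical.

```python
from typing import Callable, Optional, Tuple


def refine_policy_code(
    instruction: str,
    generate: Callable[[str], str],
    execute: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Regenerate policy code, feeding each failed attempt back into the prompt.

    Returns (working_code, rounds_used), or (None, max_rounds) if every round fails.
    """
    prompt = instruction
    for round_idx in range(1, max_rounds + 1):
        code = generate(prompt)       # hypothetical LLM call
        ok, error = execute(code)     # hypothetical simulator or robot run
        if ok:
            return code, round_idx
        # Assumption: the feedback is the failed code plus the observed error.
        prompt = (
            f"{instruction}\n\n"
            f"Previous policy code failed:\n{code}\n"
            f"Observed failure: {error}\n"
            "Please produce corrected policy code."
        )
    return None, max_rounds


# Stand-ins for an LLM and an execution environment (hypothetical).
def fake_generate(prompt: str) -> str:
    # Emits a buggy call first and a fixed call once failure feedback is present.
    return "gripper.close()" if "failed" in prompt else "gripper.clos()"


def fake_execute(code: str) -> Tuple[bool, str]:
    if code == "gripper.close()":
        return True, ""
    return False, "AttributeError: no method 'clos'"


if __name__ == "__main__":
    print(refine_policy_code("close the gripper", fake_generate, fake_execute))
```

With these stand-ins, the first round fails and the second succeeds once the failure feedback is in the prompt. A real setup would replace both callables with an actual LLM client and a simulator or robot interface.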

Related papers