Fugu-MT Paper Translation (Abstract): Trace-Focused Diffusion Policy for Multi-Modal Action Disambiguation in Long-Horizon Robotic Manipulation

Paper overview: Trace-Focused Diffusion Policy for Multi-Modal Action Disambiguation in Long-Horizon Robotic Manipulation

arxiv url: http://arxiv.org/abs/2602.07388v1
Date: Sat, 07 Feb 2026 06:06:43 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:41.251606
Title: Trace-Focused Diffusion Policy for Multi-Modal Action Disambiguation in Long-Horizon Robotic Manipulation
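Title (reference translation): Trace-Focused Diffusion Policy for Multi-Modal Action Disambiguation in Long-Horizon Robotic Manipulation
Authors: Yuxuan Hu, Xiangyu Chen, Chuhao Zhou, Yuxi Liu, Gen Li, Jindou Jia, Jianfei Yang
Abstract summary: Trace-Focused Diffusion Policy (TF-DP) is a diffusion-based framework that explicitly conditions action generation on the robot's execution history. TF-DP is evaluated on real-world robotic manipulation tasks.
Reference score (independently computed attention score): 27.077503086179863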
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Generative model-based policies have shown strong performance in imitation-based robotic manipulation by learning action distributions from demonstrations. However, in long-horizon tasks, visually similar observations often recur across execution stages while requiring distinct actions, which leads to ambiguous predictions when policies are conditioned only on instantaneous observations, termed multi-modal action ambiguity (MA2). To address this challenge, we propose the Trace-Focused Diffusion Policy (TF-DP), a simple yet effective diffusion-based framework that explicitly conditions action generation on the robot's execution history. TF-DP represents historical motion as an explicit execution trace and projects it into the visual observation space, providing stage-aware context when current observations alone are insufficient. In addition, the induced trace-focused field emphasizes task-relevant regions associated with historical motion, improving robustness to background visual disturbances. We evaluate TF-DP on real-world robotic manipulation tasks exhibiting pronounced multi-modal action ambiguity and visually cluttered conditions. Experimental results show that TF-DP improves temporal consistency and robustness, outperforming the vanilla diffusion policy by 80.56 percent on tasks with multi-modal action ambiguity and by 86.11 percent under visual disturbances, while maintaining inference efficiency with only a 6.4 percent runtime increase. These results demonstrate that execution-trace conditioning offers a scalable and principled approach for robust long-horizon robotic manipulation within a single policy.
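Abstract (reference translation): Generative model-based policies have shown strong performance in imitation-based robotic manipulation by learning action distributions from demonstrations. In long-horizon tasks, however, visually similar observations often recur across execution stages while requiring different actions; when a policy is conditioned only on instantaneous observations, this yields ambiguous predictions, a problem termed multi-modal action ambiguity (MA2). To address this challenge, we propose the Trace-Focused Diffusion Policy (TF-DP), a simple yet effective diffusion-based framework that explicitly conditions action generation on the robot's execution history. TF-DP represents historical motion as an explicit execution trace and projects it into the visual observation space, providing stage-aware context when current observations alone are insufficient. In addition, the trace-focused field emphasizes task-relevant regions associated with historical motion, improving robustness to background visual disturbances. We evaluate TF-DP on real-world robotic manipulation tasks. Experimental results show that TF-DP improves temporal consistency and robustness, outperforming the vanilla diffusion policy by 80.56 percent on tasks with multi-modal action ambiguity and by 86.11 percent under visual disturbances. These results indicate that execution-trace conditioning offers a scalable and principled approach to robust long-horizon robotic manipulation within a single policy.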
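
The abstract describes projecting the robot's execution trace into the visual observation space and deriving a trace-focused field from it, but gives no implementation details. The following is only a minimal sketch of one way such a projection could look, assuming a pinhole camera, trace points already expressed in the camera frame, and a Gaussian weighting around each projected point. The function names `project_trace` and `trace_focus_field`, the intrinsics, and `sigma` are all hypothetical and are not taken from the paper.

```python
import numpy as np


def project_trace(points_cam, fx, fy, cx, cy):
    """Project 3D trace points (N, 3), given in the camera frame, to pixels (N, 2).

    Assumption: ideal pinhole camera with no lens distortion.
    """
    pts = np.asarray(points_cam, dtype=float)
    z = pts[:, 2]
    u = fx * pts[:, 0] / z + cx  # horizontal pixel coordinate
    v = fy * pts[:, 1] / z + cy  # vertical pixel coordinate
    return np.stack([u, v], axis=1)


def trace_focus_field(pixels, height, width, sigma=8.0):
    """Build a soft mask in [0, 1] that peaks at each projected trace pixel.

    Assumption: the field is the pixel-wise maximum of isotropic Gaussians;
    the paper's actual field may be defined differently.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    field = np.zeros((height, width))
    for u, v in pixels:
        d2 = (xs - u) ** 2 + (ys - v) ** 2
        field = np.maximum(field, np.exp(-d2 / (2.0 * sigma ** 2)))
    return field


# Hypothetical three-point end-effector trace, one metre in front of the camera.
trace = np.array([[0.0, 0.0, 1.0], [0.1, 0.0, 1.0], [0.1, 0.1, 1.0]])
pix = project_trace(trace, fx=100.0, fy=100.0, cx=32.0, cy=32.0)
field = trace_focus_field(pix, height=64, width=64)
```

In a setup like this, `field` could be stacked with the RGB observation as an extra channel or used to reweight image features, so that frames that look alike at different task stages still differ in the policy input. Whether TF-DP does either is not stated in the abstract.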

Related papers