Fugu-MT Paper Translation (Abstract): Temporal Logic Guidance for Action-Only Diffusion Policies with World Models

Paper summary: Temporal Logic Guidance for Action-Only Diffusion Policies with World Models

arxiv url: http://arxiv.org/abs/2606.22729v1
Date: Mon, 22 Jun 2026 00:12:44 GMT
Status: Translation complete
Last updated in system: 2026-06-25 05:02:40.060902
Title: Temporal Logic Guidance for Action-Only Diffusion Policies with World Models
Title (reference translation): Temporal Logic Guidance for Action-Only Diffusion Policies Using World Models
Authors: Moritz Zoellner, Anastasios Manganaris, Rohan Paleja,
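Abstract summary: Diffusion policies enable multimodal robot behavior but offer limited ability to choose among behavior modes at inference time. This work proposes a novel guidance method for action-only diffusion policies that uses a separate learned world model to enable differentiable evaluation of STL robustness. This steers behavior toward constraint satisfaction without retraining, improving constraint adherence while preserving task performance.
Reference score (independently computed attention measure): 0.764671395172401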
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion policies enable multimodal robot behavior but offer limited ability to choose among behavior modes at inference time, even though such control is desirable in human-robot settings. Prior solutions to this lack of control have utilized Signal Temporal Logic (STL) to express human intentions and provide corresponding guidance for diffusion policy inference. However, these approaches can only guide diffusion policies that jointly generate future actions and states, increasing both complexity and runtime. We propose a novel guidance method for action-only diffusion policies that uses a separate learned world model to enable differentiable evaluation of STL robustness, with its gradient then injected into the diffusion process. This steers behavior toward constraint satisfaction without retraining, improving constraint adherence while preserving task performance. On the Can Transport task from Robomimic, our method maintains 100% task success while reducing constraint violations from over 80% for baseline methods to 4%. We also discuss extensions toward improved robustness and more complex constraints.
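The guidance idea in the abstract (roll actions through a world model, score the predicted states with STL robustness, and push the actions along the robustness gradient) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the 1-D additive dynamics stand in for a learned world model, the specification is the single formula G(x <= limit), the minimum is smoothed with a log-sum-exp soft-min, and the gradient is derived by hand instead of by automatic differentiation.

```python
import math

def rollout(x0, actions):
    """Toy stand-in for a learned world model: x_{t+1} = x_t + a_t."""
    states, x = [], x0
    for a in actions:
        x = x + a
        states.append(x)
    return states

def soft_robustness(states, limit, k=10.0):
    """Smooth approximation of min_t (limit - x_t), the robustness of G(x <= limit)."""
    margins = [limit - x for x in states]
    m = min(margins)  # shift by the minimum for numerical stability
    return m - math.log(sum(math.exp(-k * (r - m)) for r in margins)) / k

def robustness_grad(x0, actions, limit, k=10.0):
    """Hand-derived gradient of soft_robustness w.r.t. each action (toy model only)."""
    margins = [limit - x for x in rollout(x0, actions)]
    m = min(margins)
    weights = [math.exp(-k * (r - m)) for r in margins]
    total = sum(weights)
    weights = [w / total for w in weights]  # soft-min weights, sum to 1
    # states[t] depends on actions[0..t] with derivative 1, and each margin
    # decreases as its state increases, so action j collects -weights[j:].
    return [-sum(weights[j:]) for j in range(len(actions))]

def guide(x0, actions, limit, step_size=0.1, k=10.0):
    """One guidance nudge: move the action sequence up the robustness gradient."""
    grad = robustness_grad(x0, actions, limit, k)
    return [a + step_size * g for a, g in zip(actions, grad)]

if __name__ == "__main__":
    x0, limit = 0.0, 1.0
    actions = [0.5, 0.5, 0.5]  # would drive the state past the limit
    for _ in range(20):
        actions = guide(x0, actions, limit)
    print(rollout(x0, actions), soft_robustness(rollout(x0, actions), limit))
```

In a guided sampler, a nudge like `guide` would presumably be applied to the noisy action sequence inside each denoising step rather than to a finished plan; the denoiser itself is omitted from this sketch.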
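Abstract (reference translation): Diffusion policies enable multimodal robot behavior, but their ability to select a behavior mode at inference time is limited, even though such control is desirable in human-robot settings. Earlier solutions to this lack of control use Signal Temporal Logic (STL) to express human intentions and to provide corresponding guidance for diffusion policy inference. However, these approaches can only guide diffusion policies that jointly generate future actions and states, which increases both complexity and runtime. This work proposes a novel guidance method for action-only diffusion policies that uses a separate learned world model to make the evaluation of STL robustness differentiable and injects the resulting gradient into the diffusion process. This steers behavior toward constraint satisfaction without retraining, improving constraint adherence while preserving task performance. On the Can Transport task from Robomimic, the method maintains 100% task success while reducing constraint violations from over 80% for baseline methods to 4%. Extensions toward improved robustness and more complex constraints are also discussed.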

Related papers