Fugu-MT Paper Translation (Abstract): LLM-Guided Future Hypotheses for Horizon-Aware Exploration in Multi-Step Robot Manipulation

Paper overview: LLM-Guided Future Hypotheses for Horizon-Aware Exploration in Multi-Step Robot Manipulation

arxiv url: http://arxiv.org/abs/2605.29864v1
Date: Thu, 28 May 2026 12:49:37 GMT
Status: Translation complete
System update date: 2026-05-30 02:45:56.249905
Title: LLM-Guided Future Hypotheses for Horizon-Aware Exploration in Multi-Step Robot Manipulation
Authors: Mohammad Khoshnazar, Andrew Melnik, Michael Beetz,
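Abstract summary: Multi-step robot manipulation requires acting under uncertainty about how the scene will evolve. This work studies whether short-horizon, task-consistent future videos are useful for control and reinforcement-learning fine-tuning.
Reference score (independently computed attention score): 5.637033593506126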
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multi-step robot manipulation requires acting under uncertainty about how the scene will evolve, making exploration and policy adaptation challenging. We study whether short-horizon, task-consistent future videos can provide useful structured priors for control and reinforcement-learning fine-tuning. We formalize this idea through Future-Experience Conditioning (FEC), a simple interface that conditions closed-loop policies on a latent representation of a short future video. In our simulation setup, future clips are generated in three stages: an LLM reasoner operating over a task ontology initialized from the current scene state, a robot-free digital-twin rollout of the intended object motion, and a mask-free video diffusion model that synthesizes a robot-consistent future clip without requiring segmentation at inference. We instantiate this future-conditioning interface primarily with BC and BC+RL, and compare against a future-conditioned Streaming Flow Policy (SFP) baseline on RoboCasa and CALVIN under NoFuture, GTFuture, GenFuture, and WrongFuture. Generated futures improve performance over no-future conditioning, while mismatched futures degrade it, and our BC+RL instantiation achieves the strongest overall results. An average BC+RL learning-curve analysis across 8 CALVIN tasks further shows that GTFuture improves fastest, GenFuture improves earlier and to a higher level than NoFuture, and WrongFuture remains at zero throughout training. These results suggest that short-horizon future videos can serve as useful structured priors for exploration and policy adaptation under imperfect future predictions. https://enact2026.github.io/
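The conditioning interface described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the names `FutureMode`, `select_future_latent`, `policy_input`, and `linear_policy` are hypothetical, the latents are hand-written toy vectors rather than encoded videos, and modelling NoFuture as an all-zero latent is an assumption made only for this example. The sketch shows how one policy can be fed the same observation under the four evaluation conditions named in the abstract.

```python
from enum import Enum
from typing import Dict, List, Sequence


class FutureMode(Enum):
    """The four evaluation conditions named in the abstract."""
    NO_FUTURE = "NoFuture"
    GT_FUTURE = "GTFuture"
    GEN_FUTURE = "GenFuture"
    WRONG_FUTURE = "WrongFuture"


def select_future_latent(
    mode: FutureMode,
    latents: Dict[FutureMode, Sequence[float]],
    latent_dim: int,
) -> List[float]:
    """Return the future latent used for one condition.

    Assumption: NoFuture is represented by an all-zero latent so that the
    policy input keeps the same size in every condition.
    """
    if mode is FutureMode.NO_FUTURE:
        return [0.0] * latent_dim
    latent = list(latents[mode])
    if len(latent) != latent_dim:
        raise ValueError("future latent has the wrong dimension")
    return latent


def policy_input(obs: Sequence[float], future_latent: Sequence[float]) -> List[float]:
    """Condition the policy by concatenating observation and future latent."""
    return list(obs) + list(future_latent)


def linear_policy(weights: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Toy stand-in for a learned policy: one dot product per action dimension."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# Toy data: a 2-D observation and 2-D future latents (hypothetical values).
obs = [0.5, -0.2]
latents = {
    FutureMode.GT_FUTURE: [1.0, 0.0],
    FutureMode.GEN_FUTURE: [0.9, 0.1],     # close to the ground-truth latent
    FutureMode.WRONG_FUTURE: [-1.0, 0.0],  # latent of a mismatched future
}
weights = [[1.0, 0.0, 0.5, 0.0]]  # a single action dimension

actions = {
    mode.value: linear_policy(
        weights, policy_input(obs, select_future_latent(mode, latents, 2))
    )
    for mode in FutureMode
}
```

In this toy setting the action under GenFuture stays close to the one under GTFuture, while WrongFuture pushes it in the opposite direction. That mirrors, only qualitatively, the abstract's observation that mismatched futures degrade performance.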
