Fugu-MT paper translation (abstract): Persistent Robot World Models: Stabilizing Multi-Step Rollouts via Reinforcement Learning

Paper overview: Persistent Robot World Models: Stabilizing Multi-Step Rollouts via Reinforcement Learning

arxiv url: http://arxiv.org/abs/2603.25685v1
Date: Thu, 26 Mar 2026 17:36:08 GMT
Status: translation complete
Last updated in system: 2026-03-27 20:52:48.402774
Title: Persistent Robot World Models: Stabilizing Multi-Step Rollouts via Reinforcement Learning
Title (reference translation): Persistent Robot World Models: Stabilizing Multi-Step Rollouts via Reinforcement Learning
Authors: Jai Bardhan, Patrik Drozdik, Josef Sivic, Vladimir Petrik
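Abstract summary: Action-conditioned robot world models generate future video frames of the manipulated scene given a robot action sequence. These models are optimized for short-term prediction and break down when deployed autoregressively. We introduce a reinforcement learning scheme that trains the world model on its own autoregressive rollouts.
Reference score (independently computed attention score): 18.397872306430006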
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Action-conditioned robot world models generate future video frames of the manipulated scene given a robot action sequence, offering a promising alternative for simulating tasks that are difficult to model with traditional physics engines. However, these models are optimized for short-term prediction and break down when deployed autoregressively: each predicted clip feeds back as context for the next, causing errors to compound and visual quality to rapidly degrade. We address this through the following contributions. First, we introduce a reinforcement learning (RL) post-training scheme that trains the world model on its own autoregressive rollouts rather than on ground-truth histories. We achieve this by adapting a recent contrastive RL objective for diffusion models to our setting and show that its convergence guarantees carry over exactly. Second, we design a training protocol that generates and compares multiple candidate variable-length futures from the same rollout state, reinforcing higher-fidelity predictions over lower-fidelity ones. Third, we develop efficient, multi-view visual fidelity rewards that combine complementary perceptual metrics across camera views and are aggregated at the clip level for dense, low-variance training signal. Fourth, we show that our approach establishes a new state-of-the-art for rollout fidelity on the DROID dataset, outperforming the strongest baseline on all metrics (e.g., LPIPS reduced by 14% on external cameras, SSIM improved by 9.1% on the wrist camera), winning 98% of paired comparisons, and achieving an 80% preference rate in a blind human study.
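Abstract (reference translation): Action-conditioned robot world models generate future video frames of the manipulated scene given a robot action sequence, offering a promising alternative for simulating tasks that are difficult to model with traditional physics engines. However, these models are optimized for short-term prediction and break down when deployed autoregressively: each predicted clip is fed back as context for the next, so errors compound and visual quality degrades rapidly. We address this problem through the following contributions. First, we propose a reinforcement learning (RL) post-training method that trains the world model on its own autoregressive rollouts rather than on ground-truth histories. We adapt a recent contrastive RL objective for diffusion models to our setting and show that its convergence guarantees carry over exactly. Second, we design a training protocol that generates and compares multiple candidate variable-length futures from the same rollout state, reinforcing higher-fidelity predictions over lower-fidelity ones. Third, we develop efficient multi-view visual fidelity rewards that combine complementary perceptual metrics across camera views and are aggregated at the clip level to give a dense, low-variance training signal. Fourth, we show that the method sets a new state of the art for rollout fidelity on the DROID dataset, outperforming the strongest baseline on all metrics (e.g., LPIPS reduced by 14% on external cameras, SSIM improved by 9.1% on the wrist camera), winning 98% of paired comparisons, and achieving an 80% preference rate in a blind human study.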
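
The abstract describes a clip-level, multi-view fidelity reward and a protocol that compares several candidate futures generated from the same rollout state, but it does not give the formulation. The following is only a minimal sketch of that idea under stated assumptions: the two metrics are simple pixel-error stand-ins (not LPIPS or SSIM), the uniform averaging is a guess, and every name (`neg_mse`, `neg_mae`, `clip_reward`, `rank_candidates`) is hypothetical rather than taken from the paper.

```python
from statistics import mean


def neg_mse(pred, target):
    """Stand-in metric: negative mean squared error (higher is better)."""
    return -mean((p - t) ** 2 for p, t in zip(pred, target))


def neg_mae(pred, target):
    """Stand-in metric: negative mean absolute error (higher is better)."""
    return -mean(abs(p - t) for p, t in zip(pred, target))


# Assumption: placeholders for the complementary perceptual metrics.
METRICS = (neg_mse, neg_mae)


def clip_reward(pred_clip, target_clip):
    """Average every metric over all camera views and frames of one clip.

    A clip is a dict mapping a view name to a list of frames, and a frame
    is a flat list of floats. zip() stops at the shorter sequence, so a
    shorter candidate is scored only on the frames it actually contains.
    """
    scores = []
    for view, target_frames in target_clip.items():
        for pred, target in zip(pred_clip[view], target_frames):
            scores.extend(metric(pred, target) for metric in METRICS)
    return mean(scores)


def rank_candidates(candidates, target_clip):
    """Return candidate indices sorted from best to worst, plus the rewards."""
    rewards = [clip_reward(c, target_clip) for c in candidates]
    order = sorted(range(len(candidates)), key=lambda i: rewards[i], reverse=True)
    return order, rewards


# Toy data: two views, two frames per view, two pixels per frame.
target = {"external": [[0.0, 1.0], [1.0, 0.0]], "wrist": [[0.5, 0.5], [0.5, 0.5]]}
good = {"external": [[0.0, 1.0], [1.0, 0.0]], "wrist": [[0.5, 0.5], [0.5, 0.5]]}
bad = {"external": [[1.0, 0.0], [0.0, 1.0]], "wrist": [[0.0, 0.0], [1.0, 1.0]]}

order, rewards = rank_candidates([bad, good], target)
print(order, rewards)
```

In a contrastive setup of the kind the abstract mentions, the top-ranked candidate could serve as the preferred sample and the bottom-ranked one as the dispreferred sample; how the paper actually forms these comparisons is not stated here.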

List of related papers