Fugu-MT Paper Translation (Abstract): RISE: Self-Improving Robot Policy with Compositional World Model

Paper overview: RISE: Self-Improving Robot Policy with Compositional World Model

arxiv url: http://arxiv.org/abs/2602.11075v1
Date: Wed, 11 Feb 2026 17:43:36 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:41.374138
Title: RISE: Self-Improving Robot Policy with Compositional World Model
Title (reference translation): RISE: Self-Improving Robot Policy via a Compositional World Model
Authors: Jiazhi Yang, Kunyang Lin, Jinwei Li, Wencong Zhang, Tianwei Lin, Longyan Wu, Zhizhong Su, Hao Zhao, Ya-Qin Zhang, Li Chen, Ping Luo, Xiangyu Yue, Hongyang Li
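Abstract summary: We present RISE, a scalable framework for robotic reinforcement learning via imagination. At its core is a Compositional World Model that (i) predicts multi-view futures via a controllable dynamics model and (ii) evaluates imagined outcomes with a progress value model. These components are integrated into a closed-loop self-improving pipeline that continuously generates imaginary rollouts, estimates advantages, and updates the policy in imaginary space without costly physical interaction.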
Reference score (independently computed attention score): 52.227523057681786
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite the sustained scaling on model capacity and data acquisition, Vision-Language-Action (VLA) models remain brittle in contact-rich and dynamic manipulation tasks, where minor execution deviations can compound into failures. While reinforcement learning (RL) offers a principled path to robustness, on-policy RL in the physical world is constrained by safety risk, hardware cost, and environment reset. To bridge this gap, we present RISE, a scalable framework of robotic reinforcement learning via imagination. At its core is a Compositional World Model that (i) predicts multi-view future via a controllable dynamics model, and (ii) evaluates imagined outcomes with a progress value model, producing informative advantages for the policy improvement. Such compositional design allows state and value to be tailored by best-suited yet distinct architectures and objectives. These components are integrated into a closed-loop self-improving pipeline that continuously generates imaginary rollouts, estimates advantages, and updates the policy in imaginary space without costly physical interaction. Across three challenging real-world tasks, RISE yields significant improvement over prior art, with more than +35% absolute performance increase in dynamic brick sorting, +45% for backpack packing, and +35% for box closing, respectively.
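Abstract (reference translation): Despite sustained scaling of model capacity and data acquisition, Vision-Language-Action (VLA) models remain brittle in contact-rich and dynamic manipulation tasks, where small execution deviations can compound into failures. Reinforcement learning (RL) offers a principled path to robustness, but on-policy RL in the physical world is constrained by safety risk, hardware cost, and environment resets. To bridge this gap, we present RISE, a scalable framework for robotic reinforcement learning via imagination. At its core is a Compositional World Model that (i) predicts multi-view futures with a controllable dynamics model and (ii) evaluates the imagined outcomes with a progress value model, producing informative advantages for policy improvement. This compositional design lets state and value each be handled by the best-suited, yet distinct, architecture and objective. These components are integrated into a closed-loop self-improving pipeline that continuously generates imaginary rollouts, estimates advantages, and updates the policy in imaginary space without costly physical interaction. Across three challenging real-world tasks, RISE improves substantially over prior art, with absolute performance gains of more than +35% in dynamic brick sorting, +45% in backpack packing, and +35% in box closing.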
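
The closed loop named in the abstract (generate imagined rollouts, estimate advantages, update the policy) can be illustrated with a toy sketch. Nothing below is taken from the RISE paper: the 1-D state, the hand-written `dynamics_model` and `progress_value` functions, the linear policy with a single `gain` parameter, and the advantage-weighted regression update are all assumptions chosen to keep the example small and runnable. In the paper both models are learned and the policy is a VLA model; the update rule used here is a generic stand-in, not the authors' method.

```python
import math
import random

GOAL = 1.0  # hypothetical 1-D target state (assumption for this toy example)


def dynamics_model(state: float, action: float) -> float:
    """Stand-in for a learned, action-conditioned dynamics model."""
    return state + 0.5 * action


def progress_value(state: float) -> float:
    """Stand-in for a progress value model: 1.0 at the goal, lower farther away."""
    return math.exp(-abs(GOAL - state))


def imagine_rollout(gain, start, horizon, noise, rng):
    """Roll the policy forward inside the dynamics model only (no real robot)."""
    state, steps = start, []
    for _ in range(horizon):
        # Linear policy plus Gaussian exploration noise.
        action = gain * (GOAL - state) + rng.gauss(0.0, noise)
        next_state = dynamics_model(state, action)
        # One-step advantage proxy: imagined gain in progress.
        advantage = progress_value(next_state) - progress_value(state)
        steps.append((state, action, advantage))
        state = next_state
    return steps


def improve_policy(gain, rng, n_rollouts=64, horizon=5, noise=0.5, temperature=0.1):
    """Refit the gain by advantage-weighted least squares on imagined data."""
    num = den = 0.0
    for _ in range(n_rollouts):
        for state, action, advantage in imagine_rollout(gain, 0.0, horizon, noise, rng):
            weight = math.exp(advantage / temperature)  # favor high-advantage actions
            error = GOAL - state
            num += weight * action * error
            den += weight * error * error
    return num / den  # den > 0 because every rollout starts with error == 1


def evaluate(gain, horizon=3):
    """Noise-free imagined rollout; returns the final progress value."""
    state = 0.0
    for _ in range(horizon):
        state = dynamics_model(state, gain * (GOAL - state))
    return progress_value(state)


def self_improve(iterations=20, seed=0):
    """Closed loop: imagine, estimate advantages, update, repeat."""
    rng = random.Random(seed)
    gain = 0.2  # deliberately weak initial policy
    history = [evaluate(gain)]
    for _ in range(iterations):
        gain = improve_policy(gain, rng)
        history.append(evaluate(gain))
    return gain, history
```

Calling `self_improve()` returns the final gain and the per-iteration progress values, all computed without any interaction outside the two stand-in models. Keeping `dynamics_model` and `progress_value` as separate functions loosely mirrors the compositional split the abstract describes, in which state prediction and value estimation are handled by distinct components.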

Related papers