Fugu-MT Paper Translation (Abstract): Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?

Paper abstract: Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?

arxiv url: http://arxiv.org/abs/2606.04811v2
Date: Thu, 04 Jun 2026 10:52:20 GMT
Status: Translation complete
Last updated in system: 2026-06-05 19:21:33.311208
Title: Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?
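Title (reference translation): Dream.exe: Can Video Generation Models Achieve Executable Robot Manipulation?
Authors: Rui Zhao, Kaiming Yang, Jifeng Zhu, Siyang Chen, Ziqi Wang, Weijia Wu, Kevin Qinghong Lin, Heng Wang, Mike Zheng Shou
Abstract summary: This paper proposes robotic manipulation as a concrete, measurable window onto how well video generation models reflect the physical world. If a model has truly internalized physical laws, the motion it depicts should translate into executable robot behavior. Dream.exe is an evaluation framework that operationalizes this criterion through a video-to-execution pipeline.
Reference score (independently computed attention score): 57.234658753381915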
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Video generation models have made impressive strides in synthesizing visually compelling content, yet their outputs remain confined to the virtual domain. A natural question follows: how well do these models reflect the physical world when their generated videos leave the screen and enter reality? We propose robotic manipulation as a concrete, measurable window onto this question: if a model has truly internalized physical laws, the motion it depicts should translate into executable robot behavior. We introduce Dream.exe, an evaluation framework that operationalizes this criterion through a video-to-execution pipeline. Given a scene image and a task description, Dream.exe synthesizes a manipulation video, converts the generated motion into robot trajectories, and executes them in a physics simulator, yielding a grounding signal that purely visual metrics cannot offer. Using this pipeline, we evaluate 8 models spanning frontier closed-source generators, open-source generators, and robot-specific models. Our benchmark covers 101 manually curated manipulation tasks at three levels of physical complexity, measured across visual quality, trajectory fidelity, and execution success. Encouragingly, several models achieve measurable execution success, suggesting that generative priors learned from internet-scale data already encode meaningful physical knowledge. Yet visual quality proves a poor predictor of executability, exposing a dimension of model capability that standard visual evaluations do not capture. Dream.exe will be open-sourced at https://github.com/showlab/Dream.exe.
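Abstract (reference translation): Video generation models have made remarkable progress in synthesizing visually appealing content, but their outputs are still confined to the virtual domain. How well do these models reflect the physical world when their generated videos leave the screen and enter reality? We propose robotic manipulation as a concrete, measurable window onto this question: if a model has truly internalized physical laws, the motion it depicts should translate into executable robot behavior. We introduce Dream.exe, an evaluation framework that operationalizes this criterion through a video-to-execution pipeline. Given a scene image and a task description, Dream.exe synthesizes a manipulation video, converts the generated motion into robot trajectories, and executes them in a physics simulator. Using this pipeline, we evaluate 8 models spanning frontier closed-source generators, open-source generators, and robot-specific models. Our benchmark covers 101 manually curated manipulation tasks at three levels of physical complexity, measured across visual quality, trajectory fidelity, and execution success. Several models achieve measurable execution success, suggesting that generative priors learned from internet-scale data already encode meaningful physical knowledge. However, visual quality proves a poor predictor of executability, exposing a dimension of model capability that standard visual evaluations do not capture. Dream.exe will be open-sourced at https://github.com/showlab/Dream.exe.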
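
The video-to-execution pipeline described in the abstract can be pictured as three stages chained per task, with execution success aggregated by complexity level. The following is only a minimal sketch under stated assumptions: `Task`, `evaluate`, and the three stage callables (`generate_video`, `video_to_trajectory`, `execute_in_sim`) are hypothetical names chosen for illustration and are not taken from the Dream.exe code base, whose actual interfaces are not given in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in types; the real framework's data formats are unknown.
Video = List[str]                               # placeholder for video frames
Trajectory = List[Tuple[float, float, float]]   # placeholder for waypoints


@dataclass
class Task:
    scene_image: str   # identifier or path of the scene image
    description: str   # natural-language task description
    level: int         # physical-complexity level (1, 2, or 3)


def evaluate(
    tasks: Sequence[Task],
    generate_video: Callable[[str, str], Video],
    video_to_trajectory: Callable[[Video], Trajectory],
    execute_in_sim: Callable[[Task, Trajectory], bool],
) -> Dict[int, float]:
    """Run each task through the three stages and return the
    execution success rate per complexity level."""
    successes: Dict[int, int] = {}
    totals: Dict[int, int] = {}
    for task in tasks:
        # Stage 1: synthesize a manipulation video from image + description.
        video = generate_video(task.scene_image, task.description)
        # Stage 2: convert the generated motion into a robot trajectory.
        trajectory = video_to_trajectory(video)
        # Stage 3: execute the trajectory in a physics simulator.
        succeeded = execute_in_sim(task, trajectory)
        totals[task.level] = totals.get(task.level, 0) + 1
        successes[task.level] = successes.get(task.level, 0) + int(succeeded)
    return {level: successes[level] / totals[level] for level in totals}
```

Passing the stages in as callables keeps the sketch independent of any particular video generator, motion-extraction method, or simulator, which the abstract does not specify.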
