Fugu-MT Paper Translation (Abstract): dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model

Paper summary: dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model

arxiv url: http://arxiv.org/abs/2604.22152v1
Date: Fri, 24 Apr 2026 01:50:53 GMT
Status: Translation complete
Last updated in system: 2026-04-27 15:36:26.303579
Title: dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model
Title (reference translation): dWorldEval: Scalable Robot Policy Evaluation via a Discrete Diffusion World Model
Authors: Yaxuan Li, Zhongyi Zhou, Yefei Chen, Yaokai Xue, Yichen Zhu,
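Abstract summary: This paper proposes dWorldEval, which uses a discrete diffusion world model as a scalable evaluation proxy for robot policies. dWorldEval maps all modalities, including vision, language, and robot actions, into a unified token space and models them with a single transformer-based denoising network.
Reference score (independently computed attention score): 14.221014931347327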
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Evaluating robotics policies across thousands of environments and thousands of tasks is infeasible with existing approaches. This motivates the need for a new methodology for scalable robotics policy evaluation. In this paper, we propose dWorldEval, which uses a discrete diffusion world model as a scalable evaluation proxy for robotics policies. Specifically, dWorldEval maps all modalities, including vision, language, and robotic actions, into a unified token space and models them via a single transformer-based denoising network. Building on this architecture, we employ a sparse keyframe memory to maintain spatiotemporal consistency. We also introduce a progress token that indicates the degree of task completion. At inference, the model jointly predicts future observations and the progress token, allowing success to be determined automatically when the progress reaches 1. Extensive experiments demonstrate that dWorldEval significantly outperforms previous approaches, namely WorldEval, Ctrl-World, and WorldGym, on LIBERO, RoboTwin, and multiple real-robot tasks. It paves the way for a new architectural paradigm in building world simulators for robotics evaluation at scale.
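Abstract (reference translation): Evaluating robot policies across thousands of environments and thousands of tasks is not feasible with existing approaches, which motivates a new methodology for scalable robotics policy evaluation. This paper proposes dWorldEval, which uses a discrete diffusion world model as a scalable evaluation proxy for robot policies. Specifically, dWorldEval maps all modalities, including vision, language, and robot actions, into a unified token space and models them through a single transformer-based denoising network. Building on this architecture, a sparse keyframe memory is used to maintain spatiotemporal consistency, and a progress token is introduced to indicate the degree of task completion. At inference time, the model jointly predicts future observations and the progress token, and success is determined automatically when the progress reaches 1. Experiments show that dWorldEval significantly outperforms the previous approaches WorldEval, Ctrl-World, and WorldGym on LIBERO, RoboTwin, and multiple real-robot tasks. It opens the way to a new architectural paradigm for building world simulators for robotics evaluation.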
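
The evaluation loop described in the abstract (roll a policy forward inside the world model, read the predicted progress value, and declare success once it reaches 1) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function `evaluate_in_world_model`, the `RolloutResult` type, the token-list representation of observations and actions, and the toy policy and toy world model are all hypothetical stand-ins. The real model, per the abstract, is a transformer-based discrete diffusion network with a sparse keyframe memory, neither of which is modeled here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical representation: observations and actions are lists of
# discrete token ids, loosely mirroring the "unified token space" idea.
Tokens = List[int]
Policy = Callable[[Tokens, str], Tokens]
WorldModel = Callable[[Tokens, Tokens, str], Tuple[Tokens, float]]


@dataclass
class RolloutResult:
    success: bool
    steps: int
    final_progress: float


def evaluate_in_world_model(
    policy: Policy,
    world_model: WorldModel,
    initial_obs: Tokens,
    instruction: str,
    max_steps: int = 50,
    success_threshold: float = 1.0,
) -> RolloutResult:
    """Roll the policy out inside the world model until progress reaches the threshold."""
    obs = initial_obs
    progress = 0.0
    for step in range(1, max_steps + 1):
        action = policy(obs, instruction)
        # The world model is assumed to return the next observation
        # together with a progress value in [0, 1].
        obs, progress = world_model(obs, action, instruction)
        if progress >= success_threshold:
            return RolloutResult(True, step, progress)
    return RolloutResult(False, max_steps, progress)


# Toy stand-ins so the sketch runs end to end (not a learned model).
def toy_policy(obs: Tokens, instruction: str) -> Tokens:
    return [1]


def toy_world_model(obs: Tokens, action: Tokens, instruction: str) -> Tuple[Tokens, float]:
    next_obs = [obs[0] + action[0]]
    progress = min(1.0, next_obs[0] / 4)  # task "completes" after 4 steps
    return next_obs, progress


result = evaluate_in_world_model(toy_policy, toy_world_model, [0], "pick up the cup")
short = evaluate_in_world_model(toy_policy, toy_world_model, [0], "pick up the cup", max_steps=2)
```

With the toy stand-ins, the first rollout succeeds at step 4, while the rollout capped at two steps ends unsuccessfully with a progress of 0.5. Averaging `success` over many such rollouts would give a success-rate estimate, under the assumption that the world model's progress prediction tracks real task completion.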
