Fugu-MT Paper Translation (Abstract): R2RDreamer: 3D-aware Data Augmentation for Spatially-generalized 2D Manipulation Policies

Paper abstract: R2RDreamer: 3D-aware Data Augmentation for Spatially-generalized 2D Manipulation Policies

arxiv url: http://arxiv.org/abs/2606.17040v1
Date: Mon, 15 Jun 2026 17:56:19 GMT
Status: Translation complete
Last updated in system: 2026-06-16 18:36:05.185184
Title: R2RDreamer: 3D-aware Data Augmentation for Spatially-generalized 2D Manipulation Policies
Authors: Xiuwei Xu, Haowen Sun, Angyuan Ma, Yiwei Zhang, Zhenyu Wu, Xiaofeng Wang, Bingyao Yu, Zheng Zhu, Jie Zhou, Jiwen Lu
Abstract summary: R2RDreamer is a real-to-real demonstration augmentation framework that preserves the geometric consistency of 3D action-observation editing while moving visual completion to 2D video space.
Reference score (independently computed attention metric): 86.2249156068836
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Spatial generalization is critical for imitation-learned manipulation policies, but achieving it typically requires scaling demonstrations across diverse object poses, robot configurations, and camera viewpoints. Data augmentation from a few source demonstrations offers a practical alternative to costly real-world collection. Simulation-based augmentation can create controllable variation, but requires complex environment and object setup and may introduce a sim-to-real gap. Recent real-to-real methods avoid these issues by jointly editing 3D observations and action trajectories from real demonstrations, yet they still rely on strong 3D scene parsing and geometry completion, and often produce observations tailored to 3D pointcloud policies rather than RGB-based 2D policies. We propose R2RDreamer, a real-to-real demonstration augmentation framework that preserves the geometric consistency of 3D action-observation editing while moving visual completion to 2D video space. Specifically, R2RDreamer first performs lightweight 3D augmentation by editing incomplete object pointclouds and end-effector trajectories in a shared 3D frame; it then projects the edited scene into masked image-space control videos with occlusion-aware reasoning and uses a dense-control image-to-video model to complete temporally coherent RGB observations. Experiments on spatially shifted manipulation tasks with both 2D diffusion-style policies and vision-language-action policies show that R2RDreamer improves spatial generalization from limited source demonstrations, with analyses validating the contributions of 3D editing, occlusion-aware projection, and video completion.
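The first two stages described in the abstract above (editing object points and end-effector waypoints in a shared 3D frame, then projecting the edited scene with occlusion handling) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the edit or the projection method, so the rigid transform, the pinhole camera model, the per-pixel nearest-depth test, and the names `rigid_transform` and `project_zbuffer` are all assumptions made for illustration. The shared frame is assumed to be the camera frame, with z pointing forward.

```python
import math

def rigid_transform(points, yaw, translation):
    """Rotate 3D points about the frame's z-axis by `yaw` radians, then translate.

    Hypothetical stand-in for a 3D edit: applying the same transform to the
    object points and the end-effector waypoints keeps them consistent.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz) for x, y, z in points]

def project_zbuffer(points, fx, fy, cx, cy, width, height):
    """Pinhole-project camera-frame points and keep the nearest depth per pixel.

    Returns a dict mapping (u, v) pixels to the depth of the closest point,
    which is one simple way to decide what is visible and what is occluded.
    """
    depth = {}
    for x, y, z in points:
        if z <= 0:
            continue  # point is behind the camera
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            if (u, v) not in depth or z < depth[(u, v)]:
                depth[(u, v)] = z
    return depth

# Illustrative values only: a partial object point cloud and a short trajectory.
object_points = [(0.0, 0.0, 1.0), (0.05, 0.0, 1.0)]
ee_trajectory = [(0.0, -0.1, 0.9), (0.0, 0.0, 1.0)]
shift = (0.2, 0.0, 0.0)

# The same edit is applied to both, so the grasp waypoint still meets the object.
new_object = rigid_transform(object_points, 0.0, shift)
new_trajectory = rigid_transform(ee_trajectory, 0.0, shift)

# Project the edited scene into a 64x64 image with assumed intrinsics.
visible = project_zbuffer(new_object + new_trajectory, 100.0, 100.0, 32.0, 32.0, 64, 64)
print(sorted(visible.items()))
```

In a pipeline like the one the abstract describes, a visibility map of this kind would presumably feed the masked control video that the image-to-video model completes; that final stage is outside the scope of this sketch.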

Related papers