Fugu-MT Paper Translation (Abstract): RoboAug: One Annotation to Hundreds of Scenes via Region-Contrastive Data Augmentation for Robotic Manipulation

Paper overview: RoboAug: One Annotation to Hundreds of Scenes via Region-Contrastive Data Augmentation for Robotic Manipulation

arxiv url: http://arxiv.org/abs/2602.14032v1
Date: Sun, 15 Feb 2026 07:40:00 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:41.471936
Title: RoboAug: One Annotation to Hundreds of Scenes via Region-Contrastive Data Augmentation for Robotic Manipulation
Authors: Xinhua Wang, Kun Wu, Zhen Zhao, Hu Cao, Yinuo Zhao, Zhiyuan Xu, Meng Li, Shichao Fan, Di Wu, Yixue Zhang, Ning Liu, Zhengping Che, Jian Tang
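Abstract summary: RoboAug is a novel generative data augmentation framework that minimizes reliance on large-scale pretraining. The authors conduct extensive real-world experiments on three robots (UR-5e, AgileX, and Tien Kung 2.0), spanning over 35k rollouts. Empirical results show that RoboAug significantly outperforms state-of-the-art data augmentation baselines.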
Reference score (independently computed attention score): 34.2367474351408
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Enhancing the generalization capability of robotic learning to enable robots to operate effectively in diverse, unseen scenes is a fundamental and challenging problem. Existing approaches often depend on pretraining with large-scale data collection, which is labor-intensive and time-consuming, or on semantic data augmentation techniques that necessitate an impractical assumption of flawless upstream object detection in real-world scenarios. In this work, we propose RoboAug, a novel generative data augmentation framework that significantly minimizes the reliance on large-scale pretraining and the perfect visual recognition assumption by requiring only the bounding box annotation of a single image during training. Leveraging this minimal information, RoboAug employs pre-trained generative models for precise semantic data augmentation and integrates a plug-and-play region-contrastive loss to help models focus on task-relevant regions, thereby improving generalization and boosting task success rates. We conduct extensive real-world experiments on three robots, namely UR-5e, AgileX, and Tien Kung 2.0, spanning over 35k rollouts. Empirical results demonstrate that RoboAug significantly outperforms state-of-the-art data augmentation baselines. Specifically, when evaluating generalization capabilities in unseen scenes featuring diverse combinations of backgrounds, distractors, and lighting conditions, our method achieves substantial gains over the baseline without augmentation. The success rates increase from 0.09 to 0.47 on UR-5e, from 0.16 to 0.60 on AgileX, and from 0.19 to 0.67 on Tien Kung 2.0. These results highlight the superior generalization and effectiveness of RoboAug in real-world manipulation tasks. Our project is available at https://x-roboaug.github.io/.
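To make the reported unseen-scene gains easier to compare across robots, the arithmetic can be sketched in a few lines of Python. The success rates below are copied from the abstract; the names `reported` and `gains` are illustrative and do not come from the paper, and the derived figures are simple differences and ratios rather than results stated by the authors.

```python
# Success rates in unseen scenes as reported in the abstract:
# (baseline without augmentation, RoboAug).
reported = {
    "UR-5e": (0.09, 0.47),
    "AgileX": (0.16, 0.60),
    "Tien Kung 2.0": (0.19, 0.67),
}


def gains(rates):
    """Return the absolute gain and the ratio to the baseline for each robot."""
    out = {}
    for robot, (base, aug) in rates.items():
        out[robot] = {
            "absolute": round(aug - base, 2),  # difference in success rate
            "factor": round(aug / base, 2),    # RoboAug rate divided by baseline rate
        }
    return out


summary = gains(reported)
for robot, g in summary.items():
    print(f"{robot}: +{g['absolute']:.2f} absolute, {g['factor']:.2f}x the baseline")
```

Under these numbers the absolute gain is largest on Tien Kung 2.0, while the ratio to the baseline is largest on UR-5e, whose baseline success rate is the lowest.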
