Fugu-MT Paper Translation (Summary): Learning to Manipulate Anything: Revealing Data Scaling Laws in Bounding-Box Guided Policies

Paper summary: Learning to Manipulate Anything: Revealing Data Scaling Laws in Bounding-Box Guided Policies

arxiv url: http://arxiv.org/abs/2602.11885v1
Date: Thu, 12 Feb 2026 12:34:56 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:41.393207
Title: Learning to Manipulate Anything: Revealing Data Scaling Laws in Bounding-Box Guided Policies
Title (reference translation): Learning to Manipulate Anything: Exploring Data Scaling Laws with Bounding-Box Guidance
Authors: Yihao Wu, Jinming Ma, Junbo Tan, Yanzhao Yu, Shoujie Li, Mingliang Zhou, Diyun Xiang, Xueqian Wang
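Abstract summary: Diffusion-based policies show limited generalization in semantic manipulation. This paper proposes using bounding-box instructions to specify the target object directly. It also proposes a semantic-motion-decoupled framework that integrates object detection with a bounding-box guided diffusion policy.
Reference score (independently computed attention score): 17.654568478379307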
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion-based policies show limited generalization in semantic manipulation, posing a key obstacle to the deployment of real-world robots. This limitation arises because relying solely on text instructions is inadequate to direct the policy's attention toward the target object in complex and dynamic environments. To solve this problem, we propose leveraging bounding-box instructions to directly specify the target object, and further investigate whether data scaling laws exist in semantic manipulation tasks. Specifically, we design a handheld segmentation device with an automated annotation pipeline, Label-UMI, which enables the efficient collection of demonstration data with semantic labels. We further propose a semantic-motion-decoupled framework that integrates object detection and a bounding-box guided diffusion policy to improve generalization and adaptability in semantic manipulation. Through extensive real-world experiments on large-scale datasets, we validate the effectiveness of the approach and reveal a power-law relationship between generalization performance and the number of bounding-box objects. Finally, we summarize an effective data collection strategy for semantic manipulation, which can achieve 85% success rates across four tasks on both seen and unseen objects. All datasets and code will be released to the community.
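Abstract (reference translation): Diffusion-based policies show limited generalization in semantic manipulation, which is a major obstacle to deploying robots in the real world. This limitation arises because relying on text instructions alone is not enough to direct the policy's attention to the target object in complex, dynamic environments. This work therefore proposes using bounding-box instructions to specify the target object directly, and further examines whether data scaling laws exist in semantic manipulation tasks. Specifically, it designs Label-UMI, a handheld segmentation device with an automated annotation pipeline, which enables efficient collection of demonstration data with semantic labels. It also proposes a semantic-motion-decoupled framework that integrates object detection with a bounding-box guided diffusion policy to improve generalization and adaptability in semantic manipulation. Through extensive real-world experiments on large-scale datasets, it validates the effectiveness of the method and reveals the relationship between generalization performance and the number of bounding-box objects. Finally, it summarizes an effective data collection strategy for semantic manipulation that achieves an 85% success rate across four tasks on both seen and unseen objects. All datasets and code will be released to the community.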
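
Note on the power-law claim: a power-law relationship is commonly written as y = a * N^b, which becomes a straight line in log-log space. The sketch below shows one generic way such a relationship could be fitted, assuming y is some error-like metric and N is the number of objects. The function name `fit_power_law` and all numbers are hypothetical placeholders, not the authors' code or results.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b via linear regression in log-log space."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the regression line in log-log space is the exponent b.
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx
    # Intercept of the regression line is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic placeholder data generated from a known power law.
# These are NOT measurements from the paper.
num_objects = [1, 2, 4, 8, 16, 32]
error_metric = [0.8 * n ** -0.5 for n in num_objects]

a, b = fit_power_law(num_objects, error_metric)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Because the synthetic data are generated from an exact power law, the fit recovers the generating coefficient and exponent; real measurements would scatter around the fitted line.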

Related papers