Fugu-MT Paper Translation (Summary): Grasp as You Dream: Imitating Functional Grasping from Generated Human Demonstrations

Paper summary: Grasp as You Dream: Imitating Functional Grasping from Generated Human Demonstrations

arxiv url: http://arxiv.org/abs/2604.07517v1
Date: Wed, 08 Apr 2026 18:52:16 GMT
Status: Translation complete
System update date: 2026-04-10 18:34:05.516654
Title: Grasp as You Dream: Imitating Functional Grasping from Generated Human Demonstrations
Title (reference translation): Grasp as You Dream: Imitating Functional Grasping from Generated Human Demonstrations
Authors: Chao Tang, Jiacheng Xu, Haofei Lu, Bolin Zou, Wenlong Dong, Hong Zhang, Danica Kragic
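Abstract summary: This paper proposes GraspDreamer, a method that enables zero-shot functional grasping without labor-intensive data collection. The key idea is that visual generative models (VGMs) pre-trained on internet-scale human data implicitly encode generalized priors about how humans interact with the physical world. Experiments demonstrate the superior data efficiency and generalization performance of GraspDreamer.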
Reference score (independently computed attention score): 23.294838162593184
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Building generalist robots capable of performing functional grasping in everyday, open-world environments remains a significant challenge due to the vast diversity of objects and tasks. Existing methods are either constrained to narrow object/task sets or rely on prohibitively large-scale data collection to capture real-world variability. In this work, we present an alternative approach, GraspDreamer, a method that leverages human demonstrations synthesized by visual generative models (VGMs) (e.g., video generation models) to enable zero-shot functional grasping without labor-intensive data collection. The key idea is that VGMs pre-trained on internet-scale human data implicitly encode generalized priors about how humans interact with the physical world, which can be combined with embodiment-specific action optimization to enable functional grasping with minimal effort. Extensive experiments on public benchmarks with different robot hands demonstrate the superior data efficiency and generalization performance of GraspDreamer compared to previous methods. Real-world evaluations further validate its effectiveness on real robots. Additionally, we showcase that GraspDreamer can (1) be naturally extended to downstream manipulation tasks and (2) generate data to support visuomotor policy learning.
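Abstract (reference translation): Building general-purpose robots that can perform functional grasping in everyday, open-world environments remains a major challenge because of the great diversity of objects and tasks. Existing methods are either restricted to narrow object/task sets or depend on prohibitively large-scale data collection to capture real-world variability. This work proposes GraspDreamer, a method that uses human demonstrations synthesized by visual generative models (VGMs), such as video generation models, to enable zero-shot functional grasping without labor-intensive data collection. The key idea is that VGMs pre-trained on internet-scale human data implicitly encode generalized priors about how humans interact with the physical world, which can be combined with embodiment-specific action optimization to enable functional grasping with minimal effort. Extensive experiments on public benchmarks with different robot hands show that GraspDreamer achieves better data efficiency and generalization performance than previous methods. Real-world evaluations further validate its effectiveness on real robots. In addition, GraspDreamer is shown to (1) extend naturally to downstream manipulation tasks and (2) generate data that supports visuomotor policy learning.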

Related papers