Fugu-MT Paper Translation (Abstract): RoboPCA: Pose-centered Affordance Learning from Human Demonstrations for Robot Manipulation

Paper overview: RoboPCA: Pose-centered Affordance Learning from Human Demonstrations for Robot Manipulation

arxiv url: http://arxiv.org/abs/2603.07691v1
Date: Sun, 08 Mar 2026 15:46:04 GMT
Status: Translation complete
Last updated in system: 2026-03-10 15:13:15.026352
Title: RoboPCA: Pose-centered Affordance Learning from Human Demonstrations for Robot Manipulation
Title (reference translation): RoboPCA: Pose-centered Affordance Learning from Human Demonstrations for Robot Manipulation
Authors: Zhanqi Xiao, Ruiping Wang, Xilin Chen
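Abstract summary: RoboPCA is a pose-centered affordance prediction framework that jointly predicts task-appropriate contact regions and poses conditioned on instructions. Human2Afford is a data curation pipeline that automatically recovers scene-level 3D information and infers pose-centered affordance annotations from human demonstrations. RoboPCA outperforms baseline methods on image datasets, in simulation, and on real robots, and shows strong generalization across tasks and categories.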
Reference score (independently computed attention score): 35.68205801897266
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding spatial affordances -- comprising the contact regions of object interaction and the corresponding contact poses -- is essential for robots to effectively manipulate objects and accomplish diverse tasks. However, existing spatial affordance prediction methods mainly focus on locating the contact regions while delegating the pose to independent pose estimation approaches, which can lead to task failures due to inconsistencies between predicted contact regions and candidate poses. In this work, we propose RoboPCA, a pose-centered affordance prediction framework that jointly predicts task-appropriate contact regions and poses conditioned on instructions. To enable scalable data collection for pose-centered affordance learning, we devise Human2Afford, a data curation pipeline that automatically recovers scene-level 3D information and infers pose-centered affordance annotations from human demonstrations. With Human2Afford, scene depth and the interaction object's mask are extracted to provide 3D context and object localization, while pose-centered affordance annotations are obtained by tracking object points within the contact region and analyzing hand-object interaction patterns to establish a mapping from the 3D hand mesh to the robot end-effector orientation. By integrating geometry-appearance cues through an RGB-D encoder and incorporating mask-enhanced features to emphasize task-relevant object regions into the diffusion-based framework, RoboPCA outperforms baseline methods on image datasets, simulation, and real robots, and exhibits strong generalization across tasks and categories.
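Abstract (reference translation): Understanding spatial affordances (the contact regions of object interaction and the corresponding contact poses) is essential for robots to manipulate objects effectively and accomplish diverse tasks. However, existing spatial affordance prediction methods focus on locating contact regions while delegating the pose to independent pose estimation approaches, which can lead to task failures caused by inconsistencies between the predicted contact regions and the candidate poses. In this work, we propose RoboPCA, a pose-centered affordance prediction framework that jointly predicts task-appropriate contact regions and poses conditioned on instructions. To enable scalable data collection for pose-centered affordance learning, we devise Human2Afford, a data curation pipeline that automatically recovers scene-level 3D information and infers pose-centered affordance annotations from human demonstrations. With Human2Afford, the scene depth and the mask of the interaction object are extracted to provide 3D context and object localization, and pose-centered affordance annotations are obtained by tracking object points within the contact region and analyzing hand-object interaction patterns to establish a mapping from the 3D hand mesh to the robot end-effector orientation. By integrating geometry-appearance cues through an RGB-D encoder and incorporating mask-enhanced features that emphasize task-relevant object regions into a diffusion-based framework, RoboPCA outperforms baseline methods on image datasets, in simulation, and on real robots, and shows strong generalization across tasks and categories.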

Related papers