Fugu-MT Paper Translation (Abstract): ComPose: When to Trust Hands for Object Pose Tracking

Paper overview: ComPose: When to Trust Hands for Object Pose Tracking

arxiv url: http://arxiv.org/abs/2605.23523v1
Date: Fri, 22 May 2026 11:39:49 GMT
Status: Translation complete
System update date: 2026-05-25 17:29:20.331394
Title: ComPose: When to Trust Hands for Object Pose Tracking
Authors: Jisu Shin, Junoh Lee, JunGyu Lee, Inhwan Bae, Dohyeon Lee, Hokyun Im, Youngwoon Lee, Hae-Gon Jeon
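Abstract summary: ComPose is a 6DoF object tracking framework designed for hand-aware object pose estimation from RGB video. The method harmonizes hand motions as a complementary cue for object tracking. The resulting trajectories transfer effectively to downstream robot manipulation by enabling robots to reconstruct human actions from online videos.
Reference score (independently computed attention score): 44.085148707189035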
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Reconstructing the motion of objects from videos is a key component of embodied AI and robot manipulation. Although diverse approaches to object pose tracking have been studied, they rely heavily on strong external priors, such as depth data or 3D templates, and remain highly vulnerable to severe occlusion by hand grasps despite the use of explicit masks. In this work, we present ComPose, a 6DoF object tracking framework designed for hand-aware object pose estimation from RGB video. Rather than treating the hand purely as an occluder, our method harmonizes hand motions as a complementary cue for object tracking. Specifically, we recover a variety of object motions over time by combining object and hand cues from foundation models within a unified tracking pipeline. ComPose adaptively selects informative hand joints, combines object- and hand-derived cues for motion estimation, and refines the resulting object motion using visible geometric evidence and a learned correction. We further enforce temporal consistency over both rotation and translation, yielding stable 3D object trajectories without any external smoothing. Extensive experiments show that our method is accurate, efficient, and robust under severe hand occlusion and geometric ambiguity. In addition, the resulting trajectories transfer effectively to downstream robot manipulation by enabling robots to reconstruct human actions from online videos.

Related papers