Fugu-MT Paper Translation (Abstract): MonoSE(3)-Diffusion: A Monocular SE(3) Diffusion Framework for Robust Camera-to-Robot Pose Estimation

Paper overview: MonoSE(3)-Diffusion: A Monocular SE(3) Diffusion Framework for Robust Camera-to-Robot Pose Estimation

arxiv url: http://arxiv.org/abs/2510.10434v1
Date: Sun, 12 Oct 2025 03:57:30 GMT
Status: Translation complete
Last updated in system: 2025-10-14 18:06:29.9374
Title: MonoSE(3)-Diffusion: A Monocular SE(3) Diffusion Framework for Robust Camera-to-Robot Pose Estimation
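Title (reference translation): MonoSE(3)-Diffusion: A Monocular SE(3) Diffusion Framework for Robust Camera-to-Robot Pose Estimation
Authors: Kangjian Zhu, Haobo Jiang, Yigong Zhang, Jianjun Qian, Jian Yang, Jin Xie
Abstract summary: We propose MonoSE(3)-Diffusion, a conditional denoising diffusion framework for robot pose estimation. The framework consists of two processes: a visibility-constrained diffusion process for diverse pose augmentation and a timestep-aware reverse process for pose refinement. Our approach improves results on two benchmarks (DREAM and RoboKeyGen), with a 32.3% gain over the state of the art on the most challenging dataset.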
Reference score (independently calculated attention score): 39.15285671441867
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We propose MonoSE(3)-Diffusion, a monocular SE(3) diffusion framework that formulates markerless, image-based robot pose estimation as a conditional denoising diffusion process. The framework consists of two processes: a visibility-constrained diffusion process for diverse pose augmentation and a timestep-aware reverse process for progressive pose refinement. The diffusion process progressively perturbs ground-truth poses to noisy transformations for training a pose denoising network. Importantly, we integrate visibility constraints into the process, ensuring the transformations remain within the camera field of view. Compared to the fixed-scale perturbations used in current methods, the diffusion process generates in-view and diverse training poses, thereby improving the network's generalization capability. Furthermore, the reverse process iteratively predicts poses with the denoising network and refines pose estimates by sampling from the diffusion posterior of the current timestep, following a scheduled coarse-to-fine procedure. Moreover, the timestep indicates the transformation scales, which guide the denoising network to achieve more accurate pose predictions. The reverse process demonstrates higher robustness than direct prediction, benefiting from its timestep-aware refinement scheme. Our approach demonstrates improvements across two benchmarks (DREAM and RoboKeyGen), achieving a notable AUC of 66.75 on the most challenging dataset, representing a 32.3% gain over the state-of-the-art.
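Abstract (reference translation): In this work, we propose MonoSE(3)-Diffusion, a monocular SE(3) diffusion framework that formulates markerless, image-based robot pose estimation as a conditional denoising diffusion process. The framework consists of two processes: a visibility-constrained diffusion process for diverse pose augmentation and a timestep-aware reverse process for progressive pose refinement. The diffusion process gradually perturbs ground-truth poses into noisy transformations used to train a pose denoising network. Importantly, visibility constraints are integrated into the process, which guarantees that the transformations stay within the camera's field of view. Compared with the fixed-scale perturbations used in current methods, the diffusion process generates in-view and diverse training poses and improves the network's generalization capability. In addition, the reverse process iteratively predicts poses with the denoising network and refines the pose estimates by sampling from the diffusion posterior of the current timestep, following a scheduled coarse-to-fine procedure. The timestep also indicates the transformation scale, which guides the denoising network toward more accurate pose predictions. The reverse process is more robust than direct prediction, benefiting from its timestep-aware refinement scheme. Our approach shows improvements on two benchmarks (DREAM and RoboKeyGen) and achieves an AUC of 66.75 on the most challenging dataset.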
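
The visibility-constrained perturbation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear noise schedule, the rejection-sampling loop, the pinhole camera model, and every name below (`perturb_pose`, `in_view`, `max_rot`, `max_trans`, `max_tries`) are assumptions made for illustration only. A simple axis-angle plus translation noise model stands in for a proper SE(3) diffusion.

```python
import math
import random


def rodrigues(rx, ry, rz):
    """Rotation matrix for the axis-angle vector (rx, ry, rz)."""
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = rx / theta, ry / theta, rz / theta
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + x * x * v, x * y * v - z * s, x * z * v + y * s],
        [y * x * v + z * s, c + y * y * v, y * z * v - x * s],
        [z * x * v - y * s, z * y * v + x * s, c + z * z * v],
    ]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transform(R, t, p):
    """Map a robot-frame point p into the camera frame: R @ p + t."""
    return [sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3)]


def in_view(R, t, keypoints, fx, fy, cx, cy, width, height):
    """True if every keypoint projects inside the image (pinhole model)."""
    for p in keypoints:
        X, Y, Z = transform(R, t, p)
        if Z <= 1e-6:  # behind or on the camera plane
            return False
        u = fx * X / Z + cx
        v = fy * Y / Z + cy
        if not (0.0 <= u < width and 0.0 <= v < height):
            return False
    return True


def perturb_pose(R, t, step, num_steps, keypoints, intrinsics, rng,
                 max_rot=0.5, max_trans=0.3, max_tries=100):
    """Perturb a pose with noise that grows with the timestep.

    Assumption: the noise scale grows linearly in step / num_steps, and
    out-of-view samples are rejected and redrawn. The paper may enforce
    the visibility constraint differently.
    """
    scale = step / num_steps
    for _ in range(max_tries):
        dR = rodrigues(*(rng.gauss(0.0, scale * max_rot) for _ in range(3)))
        dt = [rng.gauss(0.0, scale * max_trans) for _ in range(3)]
        R_noisy = matmul(dR, R)
        t_noisy = [t[i] + dt[i] for i in range(3)]
        if in_view(R_noisy, t_noisy, keypoints, *intrinsics):
            return R_noisy, t_noisy
    return R, t  # fall back to the unperturbed pose
```

Calling `perturb_pose` with a small `step` gives poses close to the ground truth, and a large `step` gives coarse perturbations. In both cases a perturbed sample is accepted only if the robot keypoints still project inside the image; if no sample is accepted within `max_tries`, the sketch returns the input pose unchanged.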


Related papers