Fugu-MT Paper Translation (Abstract): SnapPose3D: Diffusion-Based Single-Frame 2D-to-3D Lifting of Human Poses

Paper overview: SnapPose3D: Diffusion-Based Single-Frame 2D-to-3D Lifting of Human Poses

arxiv url: http://arxiv.org/abs/2604.26620v1
Date: Wed, 29 Apr 2026 12:45:40 GMT
Status: Translation complete
Last updated in system: 2026-04-30 15:59:36.399781
Title: SnapPose3D: Diffusion-Based Single-Frame 2D-to-3D Lifting of Human Poses
Title (reference translation): SnapPose3D: Diffusion-Based Single-Frame 2D-to-3D Lifting
Authors: Alessandro Simoni, Riccardo Catalini, Davide Di Nucci, Guido Borghi, Davide Davoli, Lorenzo Garattoni, Gianpiero Francesca, Yuki Kawana, Roberto Vezzani
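Abstract summary: SnapPose3D is a pose-lifting framework trained deterministically to denoise 3D poses conditioned on both visual context and 2D pose features. It is evaluated extensively on well-known benchmarks for the 3D human pose estimation task.
Reference score (independently computed attention metric): 46.076819133044076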
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Depth ambiguity and joint uncertainty are the two main obstacles to obtaining accurate human pose predictions with the 2D-to-3D lifting methods proposed in the literature. In particular, these issues arise because a 2D joint location can be mapped to multiple 3D positions, inducing multiple possible final poses. Following these considerations, we propose leveraging the generative capability of diffusion-based models to predict multiple hypotheses and aggregate them into a final, accurate pose. Therefore, we introduce SnapPose3D, a pose-lifting framework trained deterministically to denoise 3D poses conditioned on both visual context and 2D pose features. SnapPose3D adopts a probabilistic approach during inference, generating multiple hypotheses through random sampling from a unit Gaussian distribution. Unlike most previous methods, which address pose ambiguity by processing temporal sequences, SnapPose3D uses single frames as input, avoiding tracking and limiting computational cost and data acquisition complexity, in line with the needs of online, real-time applications. We extensively evaluate SnapPose3D on well-known benchmarks for the 3D human pose estimation task, showing its ability to generate and aggregate accurate hypotheses that lead to state-of-the-art results.
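Abstract (reference translation): Depth ambiguity and joint uncertainty are the two main obstacles to accurate human pose prediction with the 2D-to-3D lifting methods proposed in the literature. In particular, these problems arise because a 2D joint location can map to multiple 3D positions, which leads to multiple possible final poses. Based on these considerations, we propose a method that uses the generative capability of diffusion models to predict multiple hypotheses and aggregate them into a final accurate pose. We therefore introduce SnapPose3D, a pose-lifting framework trained deterministically to denoise 3D poses conditioned on both visual context and 2D pose features. During inference, SnapPose3D adopts a probabilistic approach, generating multiple hypotheses by random sampling from a unit Gaussian distribution. Unlike previous methods that address ambiguity by processing temporal sequences, SnapPose3D takes a single frame as input, which avoids tracking and limits computational cost and data acquisition complexity, in line with the needs of online, real-time applications. We evaluate SnapPose3D on well-known benchmarks for the 3D human pose estimation task, showing that it generates and aggregates accurate hypotheses, leading to state-of-the-art results.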
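
The inference procedure described in the abstract (draw unit-Gaussian noise, denoise it conditioned on the 2D pose, repeat for several hypotheses, then aggregate) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: `toy_denoise_step` is a hypothetical stand-in for the trained network (it ignores visual context entirely), the per-coordinate median in `aggregate` is one possible aggregation rule that the abstract does not specify, and the joint coordinates are made up.

```python
import random
import statistics


def toy_denoise_step(pose_3d, pose_2d):
    """Hypothetical stand-in for one step of a trained denoiser.

    Moves every joint halfway toward a fake target (u, v, 0.0) built from the
    2D pose. A real model would be a network conditioned on image features
    and 2D pose features.
    """
    return [
        (x + 0.5 * (u - x), y + 0.5 * (v - y), z + 0.5 * (0.0 - z))
        for (x, y, z), (u, v) in zip(pose_3d, pose_2d)
    ]


def sample_hypothesis(pose_2d, num_steps, rng):
    """Start from unit-Gaussian noise and apply num_steps denoising steps."""
    pose = [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in pose_2d]
    for _ in range(num_steps):
        pose = toy_denoise_step(pose, pose_2d)
    return pose


def aggregate(hypotheses):
    """Per-joint, per-coordinate median (an assumed aggregation rule)."""
    num_joints = len(hypotheses[0])
    return [
        tuple(statistics.median(h[j][c] for h in hypotheses) for c in range(3))
        for j in range(num_joints)
    ]


def lift_pose(pose_2d, num_hypotheses=20, num_steps=5, seed=0):
    """Generate several 3D hypotheses for one frame and aggregate them."""
    rng = random.Random(seed)
    hypotheses = [
        sample_hypothesis(pose_2d, num_steps, rng) for _ in range(num_hypotheses)
    ]
    return aggregate(hypotheses), hypotheses


if __name__ == "__main__":
    pose_2d = [(0.0, 0.0), (0.1, -0.4), (-0.2, 0.5)]  # three made-up joints
    final_pose, hypotheses = lift_pose(pose_2d)
    print(len(hypotheses), "hypotheses ->", final_pose)
```

Because each hypothesis starts from a different noise sample, the hypotheses differ slightly from one another. The aggregation step is what turns that spread into a single pose estimate for the frame.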


Related papers