Fugu-MT Paper Translation (Summary): DisPOSE: Projected Polystochastic Diffusion for Self-Supervised Multi-View 3D Human Pose Estimation

Paper summary: DisPOSE: Projected Polystochastic Diffusion for Self-Supervised Multi-View 3D Human Pose Estimation

arxiv url: http://arxiv.org/abs/2606.07419v2
Date: Mon, 08 Jun 2026 08:28:29 GMT
Status: Translation complete
System update date: 2026-06-09 14:42:05.077302
Title: DisPOSE: Projected Polystochastic Diffusion for Self-Supervised Multi-View 3D Human Pose Estimation
Authors: Tony Danjun Wang, Tolga Birdal, Nassir Navab, Lennart Bastian
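Abstract summary: DisPOSE is a self-supervised framework that approximates the inherently discrete multi-view person-assignment problem. By employing differentiable Sinkhorn projections, the model learns to guide solutions toward valid and feasible assignments. The proposed approach outperforms current state-of-the-art self-supervised methods on standard datasets.
Reference score (independently computed attention score): 58.47973015036709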
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recovering 3D human poses for multiple individuals from different camera views is a fundamental bottleneck for analyzing interacting behaviors. Existing self-supervised approaches leverage synthetic catalogues of 3D poses; however, this leads to poor generalization in real-world scenarios due to distribution shifts. We therefore introduce DisPOSE, a self-supervised framework that approximates the inherently discrete multi-view person-assignment problem as a generative diffusion process over the space of polystochastic tensors. By employing differentiable Sinkhorn projections during denoising, our model learns to guide solutions toward valid and feasible assignments based on 2D image priors. The complete 3D skeletons of localized individuals are then regressed using a Hypergraph-Convolutional Decoder that explicitly models relational structures and articulated joints across multiple views. The proposed approach outperforms current state-of-the-art self-supervised methods on standard datasets and demonstrates strong performance on a newly proposed benchmark featuring highly occluded scenes from surgical operating rooms. Our diffusion-based localization demonstrates high label efficiency, retaining 99% of its performance with only 10% of the pseudo-labels. Notably, disentangling the assignment and root regression components while maintaining differentiability makes DisPOSE nearly agnostic to different camera arrangements.
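The abstract mentions Sinkhorn projections that steer solutions toward valid assignments. As a rough illustration only, the sketch below applies classic Sinkhorn normalization to a square score matrix, yielding an approximately doubly stochastic matrix (the two-dimensional special case of a polystochastic tensor). The function name `sinkhorn`, the `temperature` parameter, and the example scores are assumptions made for this sketch; the paper's actual projection, tensor order, and denoising loop are not reproduced here. Each step is an elementwise exponential or a division by a sum, so the same operations would be differentiable if written in an automatic-differentiation framework.

```python
import math


def sinkhorn(scores, n_iters=50, temperature=1.0):
    """Alternately normalize rows and columns of exp(scores / temperature).

    `scores` is a square list of lists (hypothetical person-to-person
    affinities between two views). The result has rows and columns that
    each sum to approximately 1.
    """
    n = len(scores)
    # Subtract the global maximum before exponentiating for numerical stability.
    top = max(max(row) for row in scores)
    p = [[math.exp((s - top) / temperature) for s in row] for row in scores]

    for _ in range(n_iters):
        # Row step: rescale every row so that it sums to 1.
        row_sums = [sum(row) for row in p]
        p = [[p[i][j] / row_sums[i] for j in range(n)] for i in range(n)]
        # Column step: rescale every column so that it sums to 1.
        col_sums = [sum(p[i][j] for i in range(n)) for j in range(n)]
        p = [[p[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return p


# Made-up affinities between three detections in view A and three in view B.
example_scores = [
    [2.0, 0.1, 0.3],
    [0.2, 1.5, 0.1],
    [0.1, 0.4, 1.8],
]
soft_assignment = sinkhorn(example_scores)
```

Lowering `temperature` pushes the soft assignment closer to a hard permutation, which is one common way to relax a discrete matching problem; whether DisPOSE uses such a temperature is not stated in the abstract.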


Related papers