Fugu-MT Paper Translation (Abstract): AugLift: Boosting Generalization in Lifting-based 3D Human Pose Estimation

Paper overview: AugLift: Boosting Generalization in Lifting-based 3D Human Pose Estimation

arxiv url: http://arxiv.org/abs/2508.07112v1
Date: Sat, 09 Aug 2025 22:36:31 GMT
Status: Translation complete
Last updated in system: 2025-08-12 21:23:28.706324
Title: AugLift: Boosting Generalization in Lifting-based 3D Human Pose Estimation
Title (reference translation): AugLift: Boosting Generalization in Lifting-based 3D Human Pose Estimation
Authors: Nikolai Warner, Wenjin Zhang, Irfan Essa, Apaar Sadhwani
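Abstract summary: Methods that predict 3D poses from detected 2D keypoints often generalize poorly to new datasets and real-world settings. We propose AugLift, a reformulation of the standard lifting pipeline that significantly improves generalization performance without requiring additional data collection or sensors. AugLift serves as a modular add-on and can be readily integrated into existing lifting architectures.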
Reference score (independently computed attention score): 12.127052057927182
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Lifting-based methods for 3D Human Pose Estimation (HPE), which predict 3D poses from detected 2D keypoints, often generalize poorly to new datasets and real-world settings. To address this, we propose \emph{AugLift}, a simple yet effective reformulation of the standard lifting pipeline that significantly improves generalization performance without requiring additional data collection or sensors. AugLift sparsely enriches the standard input -- the 2D keypoint coordinates $(x, y)$ -- by augmenting it with a keypoint detection confidence score $c$ and a corresponding depth estimate $d$. These additional signals are computed from the image using off-the-shelf, pre-trained models (e.g., for monocular depth estimation), thereby inheriting their strong generalization capabilities. Importantly, AugLift serves as a modular add-on and can be readily integrated into existing lifting architectures. Our extensive experiments across four datasets demonstrate that AugLift boosts cross-dataset performance on unseen datasets by an average of $10.1\%$, while also improving in-distribution performance by $4.0\%$. These gains are consistent across various lifting architectures, highlighting the robustness of our method. Our analysis suggests that these sparse, keypoint-aligned cues provide robust frame-level context, offering a practical way to significantly improve the generalization of any lifting-based pose estimation model. Code will be made publicly available.
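Abstract (reference translation): Lifting-based methods for 3D Human Pose Estimation (HPE), which predict 3D poses from detected 2D keypoints, often generalize poorly to new datasets and real-world settings. We therefore propose \emph{AugLift}, a simple yet effective reformulation of the standard lifting pipeline that significantly improves generalization performance without requiring additional data collection or sensors. AugLift sparsely enriches the standard input -- the 2D keypoint coordinates $(x, y)$ -- by augmenting it with a keypoint detection confidence score $c$ and a corresponding depth estimate $d$. These additional signals are computed from the image using off-the-shelf, pre-trained models (e.g., for monocular depth estimation) and so inherit their strong generalization capabilities. Importantly, AugLift serves as a modular add-on and can be readily integrated into existing lifting architectures. Extensive experiments across four datasets show that AugLift improves cross-dataset performance on unseen datasets by an average of $10.1\%$, while also improving in-distribution performance by $4.0\%$. These gains are consistent across various lifting architectures, highlighting the robustness of the method. The analysis suggests that these sparse, keypoint-aligned cues provide robust frame-level context, offering a practical way to significantly improve the generalization of any lifting-based pose estimation model. Code will be made publicly available.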
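
The input augmentation described in the abstract, extending each keypoint from $(x, y)$ to $(x, y, c, d)$, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `augment_keypoints`, the nested-list depth map layout, and the nearest-pixel depth lookup with clamping to the image bounds are all assumptions made for illustration. The abstract does not say how the depth estimate is sampled at each keypoint.

```python
from typing import List, Sequence, Tuple

Feature = Tuple[float, float, float, float]


def augment_keypoints(
    keypoints_xy: Sequence[Tuple[float, float]],
    confidences: Sequence[float],
    depth_map: Sequence[Sequence[float]],
) -> List[Feature]:
    """Build one (x, y, c, d) tuple per keypoint.

    Hypothetical helper: depth_map[row][col] is assumed to hold a per-pixel
    depth estimate from an off-the-shelf monocular depth model, and
    confidences are assumed to come from the 2D keypoint detector.
    """
    height = len(depth_map)
    width = len(depth_map[0])
    features: List[Feature] = []
    for (x, y), c in zip(keypoints_xy, confidences):
        # Assumption: nearest-pixel lookup, clamped so that keypoints
        # detected slightly outside the frame still receive a depth value.
        col = min(max(int(round(x)), 0), width - 1)
        row = min(max(int(round(y)), 0), height - 1)
        features.append((x, y, c, depth_map[row][col]))
    return features


if __name__ == "__main__":
    # Toy 3x4 depth map and three keypoints; the last lies outside the image.
    toy_depth = [[0.0, 1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0, 7.0],
                 [8.0, 9.0, 10.0, 11.0]]
    toy_keypoints = [(0.0, 0.0), (2.0, 1.0), (10.0, 10.0)]
    toy_confidences = [0.9, 0.8, 0.1]
    for feature in augment_keypoints(toy_keypoints, toy_confidences, toy_depth):
        print(feature)
```

A lifting network that previously consumed two values per joint would then take four. Since only the input width changes, this is consistent with the abstract's description of AugLift as a modular add-on.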
