Fugu-MT Paper Translation (Summary): PFAvatar: Pose-Fusion 3D Personalized Avatar Reconstruction from Real-World Outfit-of-the-Day Photos

Paper summary: PFAvatar: Pose-Fusion 3D Personalized Avatar Reconstruction from Real-World Outfit-of-the-Day Photos

arxiv url: http://arxiv.org/abs/2511.12935v2
Date: Tue, 18 Nov 2025 05:47:59 GMT
Status: Translation complete
Last updated in system: 2025-11-19 13:59:16.795475
Title: PFAvatar: Pose-Fusion 3D Personalized Avatar Reconstruction from Real-World Outfit-of-the-Day Photos
Title (reference translation): PFAvatar: Personalized 3D Avatars from Real-World Outfit-of-the-Day Photos
Authors: Dianbing Xi, Guoyuan An, Jingsen Zhu, Zhijian Liu, Yuan Liu, Ruiyuan Zhang, Jiayuan Lu, Yuchi Huo, Rui Wang
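Abstract summary: PFAvatar is a new method that reconstructs high-quality 3D avatars from Outfit of the Day (OOTD) photos. It completes personalization in just 5 minutes, a 48x speed-up over previous approaches.
Reference score (independently computed attention score): 24.8968050268664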
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: We propose PFAvatar (Pose-Fusion Avatar), a new method that reconstructs high-quality 3D avatars from Outfit of the Day (OOTD) photos, which exhibit diverse poses, occlusions, and complex backgrounds. Our method consists of two stages: (1) fine-tuning a pose-aware diffusion model from few-shot OOTD examples and (2) distilling a 3D avatar represented by a neural radiance field (NeRF). In the first stage, unlike previous methods that segment images into assets (e.g., garments, accessories) for 3D assembly, which is prone to inconsistency, we avoid decomposition and directly model the full-body appearance. By integrating a pre-trained ControlNet for pose estimation and a novel Condition Prior Preservation Loss (CPPL), our method enables end-to-end learning of fine details while mitigating language drift in few-shot training. Our method completes personalization in just 5 minutes, achieving a 48x speed-up compared to previous approaches. In the second stage, we introduce a NeRF-based avatar representation optimized by canonical SMPL-X space sampling and Multi-Resolution 3D-SDS. Compared to mesh-based representations that suffer from resolution-dependent discretization and erroneous occluded geometry, our continuous radiance field can preserve high-frequency textures (e.g., hair) and handle occlusions correctly through transmittance. Experiments demonstrate that PFAvatar outperforms state-of-the-art methods in terms of reconstruction fidelity, detail preservation, and robustness to occlusions/truncations, advancing practical 3D avatar generation from real-world OOTD albums. In addition, the reconstructed 3D avatar supports downstream applications such as virtual try-on, animation, and human video reenactment, further demonstrating the versatility and practical value of our approach.
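Abstract (reference translation): PFAvatar (Pose-Fusion Avatar) is a method that reconstructs high-quality 3D avatars from Outfit of the Day (OOTD) photos, which exhibit diverse poses, occlusions, and complex backgrounds. The proposed method consists of two stages: (1) fine-tuning a pose-aware diffusion model from few-shot OOTD examples, and (2) distilling a 3D avatar represented by a neural radiance field (NeRF). In the first stage, unlike previous methods that segment images into assets (e.g., garments and accessories) for 3D assembly, which tends to produce inconsistencies, the method avoids decomposition and models the full-body appearance directly. Integrating a pre-trained ControlNet for pose estimation with a new Condition Prior Preservation Loss (CPPL) allows fine details to be learned end to end while mitigating language drift. Personalization completes in just 5 minutes, a 48x speed-up over previous approaches. The second stage introduces a NeRF-based avatar representation optimized with canonical SMPL-X space sampling and Multi-Resolution 3D-SDS. Compared with mesh-based representations, which suffer from resolution-dependent discretization and erroneous occluded geometry, the continuous radiance field preserves high-frequency textures (e.g., hair) and handles occlusions correctly through transmittance. Experiments show that PFAvatar outperforms state-of-the-art methods in reconstruction fidelity, detail preservation, and robustness to occlusion and truncation, advancing practical 3D avatar generation from real-world OOTD albums. The reconstructed 3D avatar also supports downstream applications such as virtual try-on, animation, and human video reenactment, further demonstrating the versatility and practical value of the approach.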
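
Note: The abstract states that a continuous radiance field handles occlusions through transmittance. The sketch below illustrates that idea with the standard NeRF volume-rendering quadrature along a single ray. It is not the authors' implementation; the function name `composite`, the single-channel colors, and the sample values are assumptions chosen for illustration only.

```python
import math

def composite(densities, colors, deltas):
    """Standard NeRF quadrature along one ray (illustrative sketch).

    densities: per-sample volume density sigma_i (>= 0)
    colors:    per-sample scalar radiance c_i (one channel for brevity)
    deltas:    per-sample segment length delta_i
    Returns (rendered color, list of weights w_i = T_i * alpha_i).
    """
    transmittance = 1.0  # T_1 = 1: nothing blocks the first sample
    rendered = 0.0
    weights = []
    for sigma, c, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        w = transmittance * alpha               # contribution of this sample
        weights.append(w)
        rendered += w * c
        transmittance *= 1.0 - alpha            # light surviving past this segment
    return rendered, weights

# Hypothetical ray: a dense occluder (e.g. a hand) in front of a garment.
densities = [0.0, 50.0, 0.0, 50.0]
colors = [0.0, 0.2, 0.0, 0.9]
deltas = [0.1, 0.1, 0.1, 0.1]
rendered, weights = composite(densities, colors, deltas)
```

In this toy ray the front surface absorbs almost all of the light, so the occluded sample behind it receives a weight close to zero and barely affects the rendered value. This is the mechanism by which a radiance field can avoid blending hidden geometry into the visible appearance.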

Related papers