UMA: Ultra-detailed Human Avatars via Multi-level Surface Alignment
- URL: http://arxiv.org/abs/2506.01802v1
- Date: Mon, 02 Jun 2025 15:42:33 GMT
- Title: UMA: Ultra-detailed Human Avatars via Multi-level Surface Alignment
- Authors: Heming Zhu, Guoxing Sun, Christian Theobalt, Marc Habermann
- Abstract summary: Learning an animatable and clothed human avatar model with vivid dynamics and photorealistic appearance from multi-view videos is an important foundational research problem in computer graphics and vision. We propose a latent deformation model and supervise the 3D deformation of the animatable character using guidance from foundational 2D video point trackers. Our approach demonstrates significantly improved performance in rendering quality and geometric accuracy over the prior state of the art.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning an animatable and clothed human avatar model with vivid dynamics and photorealistic appearance from multi-view videos is an important foundational research problem in computer graphics and vision. Fueled by recent advances in implicit representations, the quality of animatable avatars has reached an unprecedented level by attaching the implicit representation to drivable human template meshes. However, such methods usually fail to preserve the highest level of detail, which becomes particularly apparent when the virtual camera is zoomed in and when rendering at 4K resolution and higher. We argue that this limitation stems from inaccurate surface tracking, specifically, depth misalignment and surface drift between the character geometry and the ground-truth surface, which forces the detailed appearance model to compensate for geometric errors. To address this, we propose a latent deformation model and supervise the 3D deformation of the animatable character using guidance from foundational 2D video point trackers, which offer improved robustness to shading and surface variations and are less prone to local minima than differentiable rendering. To mitigate the drift over time and the lack of 3D awareness of 2D point trackers, we introduce a cascaded training strategy that generates consistent 3D point tracks by anchoring point tracks to the rendered avatar, which ultimately supervises our avatar at the vertex and texel level. To validate the effectiveness of our approach, we introduce a novel dataset comprising five multi-view video sequences, each over 10 minutes in duration, captured with 40 calibrated 6K-resolution cameras and featuring subjects dressed in clothing with challenging texture patterns and wrinkle deformations. Our approach demonstrates significantly improved performance in rendering quality and geometric accuracy over the prior state of the art.
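As a rough illustration of the anchoring step described in the abstract, the sketch below lifts 2D point tracks to 3D by back-projecting them through the avatar's rendered depth, then uses the lifted tracks as vertex-level supervision. The helper names (`lift_tracks_to_3d`, `track_loss`) and the exact anchoring mechanism are illustrative assumptions, not the authors' released implementation.

```python
import torch

def lift_tracks_to_3d(tracks_2d, depth, K, cam_to_world):
    """Anchor 2D point tracks to the rendered avatar by back-projecting
    them through the rendered depth map (hypothetical helper).

    tracks_2d:    (N, 2) pixel coordinates from a 2D video point tracker
    depth:        (H, W) depth rendered from the current avatar
    K:            (3, 3) camera intrinsics
    cam_to_world: (4, 4) camera-to-world transform
    """
    # Sample the avatar's rendered depth at each tracked pixel.
    u = tracks_2d[:, 0].long().clamp(0, depth.shape[1] - 1)
    v = tracks_2d[:, 1].long().clamp(0, depth.shape[0] - 1)
    z = depth[v, u]
    # Back-project pixels to camera space: x_cam = z * K^{-1} [u, v, 1]^T
    ones = torch.ones_like(z)
    pix = torch.stack([tracks_2d[:, 0], tracks_2d[:, 1], ones], dim=-1)
    cam = (torch.linalg.inv(K) @ pix.T).T * z[:, None]
    # Transform to world space in homogeneous coordinates.
    cam_h = torch.cat([cam, ones[:, None]], dim=-1)
    return (cam_to_world @ cam_h.T).T[:, :3]

def track_loss(deformed_verts, anchor_idx, tracks_3d):
    """Vertex-level supervision: pull each deformed vertex that a track
    was anchored to toward the lifted 3D track position."""
    return (deformed_verts[anchor_idx] - tracks_3d).norm(dim=-1).mean()
```

In the paper's cascaded scheme the 3D tracks would be regenerated as the avatar improves; here a single loss term stands in for that loop.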
Related papers
- TeGA: Texture Space Gaussian Avatars for High-Resolution Dynamic Head Modeling [52.87836237427514]
Photoreal avatars are seen as a key component in emerging applications in telepresence, extended reality, and entertainment. We present a new high-detail 3D head avatar model that improves upon the state of the art.
arXiv Detail & Related papers (2025-05-08T22:10:27Z) - DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses [57.17501809717155]
We present DreamDance, a novel method for animating human images using only skeleton pose sequences as conditional inputs. Our key insight is that human images naturally exhibit multiple levels of correlation. We construct the TikTok-Dance5K dataset, comprising 5K high-quality dance videos with detailed frame annotations.
arXiv Detail & Related papers (2024-11-30T08:42:13Z) - AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos [31.904839609743448]
Existing multi-view methods often face challenges in estimating the 3D pose and shape of multiple closely interacting people.
We propose a novel method leveraging the personalized implicit neural avatar of each individual as a prior.
Our experimental results demonstrate state-of-the-art performance on several public datasets.
arXiv Detail & Related papers (2024-08-04T18:41:35Z) - TriHuman: A Real-time and Controllable Tri-plane Representation for Detailed Human Geometry and Appearance Synthesis [76.73338151115253]
TriHuman is a novel human-tailored, deformable, and efficient tri-plane representation.
We non-rigidly warp global ray samples into our undeformed tri-plane texture space.
We show how such a tri-plane feature representation can be conditioned on the skeletal motion to account for dynamic appearance and geometry changes (a generic sketch of tri-plane sampling appears after this list).
arXiv Detail & Related papers (2023-12-08T16:40:38Z) - GETAvatar: Generative Textured Meshes for Animatable Human Avatars [69.56959932421057]
We study the problem of 3D-aware full-body human generation, aiming at creating animatable human avatars with high-quality geometries and textures.
We propose GETAvatar, a Generative model that directly generates Explicit Textured 3D rendering for animatable human Avatar.
arXiv Detail & Related papers (2023-10-04T10:30:24Z) - DRaCoN -- Differentiable Rasterization Conditioned Neural Radiance Fields for Articulated Avatars [92.37436369781692]
We present DRaCoN, a framework for learning full-body volumetric avatars.
It exploits the advantages of both the 2D and 3D neural rendering techniques.
Experiments on the challenging ZJU-MoCap and Human3.6M datasets indicate that DRaCoN outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-03-29T17:59:15Z)
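For context on the tri-plane lookup mentioned in the TriHuman entry above, here is a generic sketch of sampling a tri-plane feature field. The channel count, resolution, and sum aggregation are illustrative choices, not TriHuman's actual architecture.

```python
import torch
import torch.nn.functional as F

class TriPlane(torch.nn.Module):
    """Generic tri-plane feature field: three orthogonal 2D feature
    planes, queried by projecting a 3D point onto each plane."""
    def __init__(self, channels=32, resolution=256):
        super().__init__()
        self.planes = torch.nn.Parameter(
            torch.randn(3, channels, resolution, resolution) * 0.01)

    def forward(self, xyz):
        # xyz: (N, 3) query points in [-1, 1]^3
        # (e.g., samples warped into an undeformed texture space)
        feats = []
        for i, (a, b) in enumerate([(0, 1), (0, 2), (1, 2)]):
            grid = xyz[:, [a, b]].view(1, -1, 1, 2)    # (1, N, 1, 2)
            plane = self.planes[i].unsqueeze(0)         # (1, C, R, R)
            f = F.grid_sample(plane, grid, align_corners=True)
            feats.append(f.view(plane.shape[1], -1).T)  # (N, C)
        return sum(feats)  # aggregate per-plane features (sum is one option)
```

In TriHuman, such features are queried after non-rigidly warping global ray samples into the undeformed tri-plane texture space and are further conditioned on skeletal motion.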