PERSONA: Personalized Whole-Body 3D Avatar with Pose-Driven Deformations from a Single Image
- URL: http://arxiv.org/abs/2508.09973v1
- Date: Wed, 13 Aug 2025 17:40:48 GMT
- Title: PERSONA: Personalized Whole-Body 3D Avatar with Pose-Driven Deformations from a Single Image
- Authors: Geonhee Sim, Gyeongsik Moon
- Abstract summary: Two major approaches exist for creating animatable human avatars. A 3D-based approach achieves personalization through a disentangled identity representation. A diffusion-based approach learns pose-driven deformations from large-scale in-the-wild videos but struggles with identity preservation. We present PERSONA, a framework that combines the strengths of both approaches to obtain a personalized 3D human avatar.
- Score: 17.76649311703262
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Two major approaches exist for creating animatable human avatars. The first, a 3D-based approach, optimizes a NeRF- or 3DGS-based avatar from videos of a single person, achieving personalization through a disentangled identity representation. However, modeling pose-driven deformations, such as non-rigid cloth deformations, requires numerous pose-rich videos, which are costly and impractical to capture in daily life. The second, a diffusion-based approach, learns pose-driven deformations from large-scale in-the-wild videos but struggles with identity preservation and pose-dependent identity entanglement. We present PERSONA, a framework that combines the strengths of both approaches to obtain a personalized 3D human avatar with pose-driven deformations from a single image. PERSONA leverages a diffusion-based approach to generate pose-rich videos from the input image and optimizes a 3D avatar based on them. To ensure high authenticity and sharp renderings across diverse poses, we introduce balanced sampling and geometry-weighted optimization. Balanced sampling oversamples the input image to mitigate identity shifts in diffusion-generated training videos. Geometry-weighted optimization prioritizes geometry constraints over image loss, preserving rendering quality in diverse poses.
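The two training ideas named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract gives no sampling ratio, loss terms, or weights, so `balanced_sample`, `weighted_loss`, `input_prob`, and `geometry_weight` are hypothetical names and values, not the authors' implementation.

```python
import random


def balanced_sample(generated_frames, input_image, input_prob=0.5, rng=None):
    """Pick one training sample.

    With probability `input_prob` (a hypothetical value) return the real
    input image; otherwise return a uniformly chosen diffusion-generated
    frame. Oversampling the input image this way is one plausible reading
    of "balanced sampling".
    """
    rng = rng or random.Random()
    if rng.random() < input_prob:
        return input_image
    return rng.choice(generated_frames)


def weighted_loss(image_loss, geometry_loss, geometry_weight=10.0, image_weight=1.0):
    """Combine losses so that the geometry term dominates.

    The weights are placeholders; the abstract only says that geometry
    constraints are prioritized over the image loss.
    """
    return image_weight * image_loss + geometry_weight * geometry_loss
```

With `input_prob=0.5`, roughly half of the drawn samples are the original image regardless of how many generated frames exist, which is the sense in which the input is oversampled here.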
Related papers
- Hyper Diffusion Avatars: Dynamic Human Avatar Generation using Network Weight Space Diffusion [45.88321772203678]
We propose a novel approach that unites the strengths of person-specific rendering and diffusion-based generative modeling.
Our method follows a two-stage pipeline: first, we optimize a set of person-specific UNets, with each network representing a dynamic human avatar.
During inference, our method generates network weights for real-time, controllable rendering of dynamic human avatars.
arXiv Detail & Related papers (2025-09-04T12:15:55Z)
- PF-LHM: 3D Animatable Avatar Reconstruction from Pose-free Articulated Human Images [23.745241278910946]
PF-LHM is a large human reconstruction model that generates high-quality 3D avatars in seconds from one or multiple casually captured pose-free images.
Our method unifies single- and multi-image 3D human reconstruction, achieving high-fidelity and animatable 3D human avatars without requiring camera and human pose annotations.
arXiv Detail & Related papers (2025-06-16T17:59:56Z)
- EVA: Expressive Virtual Avatars from Multi-view Videos [51.33851869426057]
We introduce Expressive Virtual Avatars (EVA), an actor-specific, fully controllable, and expressive human avatar framework.
EVA achieves high-fidelity, lifelike renderings in real time while enabling independent control of facial expressions, body movements, and hand gestures.
This work represents a significant advancement towards fully drivable digital human models.
arXiv Detail & Related papers (2025-05-21T11:22:52Z)
- FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images [74.86864398919467]
We present a novel method for reconstructing personalized 3D human avatars with realistic animation from only a few images.
We learn a universal prior from over a thousand clothed humans to achieve instant feedforward generation and zero-shot generalization.
Our method generates more authentic reconstruction and animation than state-of-the-arts, and can be directly generalized to inputs from casually taken phone photos.
arXiv Detail & Related papers (2025-03-24T23:20:47Z)
- DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses [57.17501809717155]
We present DreamDance, a novel method for animating human images using only skeleton pose sequences as conditional inputs.
Our key insight is that human images naturally exhibit multiple levels of correlation.
We construct the TikTok-Dance5K dataset, comprising 5K high-quality dance videos with detailed frame annotations.
arXiv Detail & Related papers (2024-11-30T08:42:13Z)
- AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos [31.904839609743448]
Existing multi-view methods often face challenges in estimating the 3D pose and shape of multiple closely interacting people.
We propose a novel method leveraging the personalized implicit neural avatar of each individual as a prior.
Our experimental results demonstrate state-of-the-art performance on several public datasets.
arXiv Detail & Related papers (2024-08-04T18:41:35Z)
- Generalizable One-shot Neural Head Avatar [90.50492165284724]
We present a method that reconstructs and animates a 3D head avatar from a single-view portrait image.
We propose a framework that not only generalizes to unseen identities based on a single-view image, but also captures characteristic details within and beyond the face area.
arXiv Detail & Related papers (2023-06-14T22:33:09Z)
- DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models [55.71306021041785]
We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars.
We leverage the SMPL model to provide shape and pose guidance for the generation.
We also jointly optimize the losses computed from the full body and from the zoomed-in 3D head to alleviate the common multi-face "Janus" problem.
arXiv Detail & Related papers (2023-04-03T12:11:51Z)
- Self-Supervised 3D Human Pose Estimation in Static Video Via Neural Rendering [5.568218439349004]
Inferring 3D human pose from 2D images is a challenging and long-standing problem in the field of computer vision.
We present preliminary results for a method to estimate 3D pose from 2D video containing a single person.
arXiv Detail & Related papers (2022-10-10T09:24:07Z)
- AvatarGen: a 3D Generative Model for Animatable Human Avatars [108.11137221845352]
AvatarGen is the first method that enables not only non-rigid human generation with diverse appearance but also full control over poses and viewpoints.
To model non-rigid dynamics, it introduces a deformation network to learn pose-dependent deformations in the canonical space.
Our method can generate animatable human avatars with high-quality appearance and geometry modeling, significantly outperforming previous 3D GANs.
arXiv Detail & Related papers (2022-08-01T01:27:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.