MVP4D: Multi-View Portrait Video Diffusion for Animatable 4D Avatars
- URL: http://arxiv.org/abs/2510.12785v1
- Date: Tue, 14 Oct 2025 17:56:14 GMT
- Title: MVP4D: Multi-View Portrait Video Diffusion for Animatable 4D Avatars
- Authors: Felix Taubner, Ruihang Zhang, Mathieu Tuli, Sherwin Bahmani, David B. Lindell
- Abstract summary: We build a video model that generates animatable multi-view videos of digital humans based on a single reference image and target expressions. Our model, MVP4D, is based on a state-of-the-art pre-trained video diffusion model and generates hundreds of frames simultaneously from viewpoints varying by up to 360 degrees around a target subject.
- Score: 18.907017120867827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Digital human avatars aim to simulate the dynamic appearance of humans in virtual environments, enabling immersive experiences across gaming, film, virtual reality, and more. However, the conventional process for creating and animating photorealistic human avatars is expensive and time-consuming, requiring large camera capture rigs and significant manual effort from professional 3D artists. With the advent of capable image and video generation models, recent methods enable automatic rendering of realistic animated avatars from a single casually captured reference image of a target subject. While these techniques significantly lower barriers to avatar creation and offer compelling realism, they lack the constraints provided by multi-view information or an explicit 3D representation. As a result, image quality and realism degrade when the avatar is rendered from viewpoints that deviate strongly from the reference image. Here, we build a video model that generates animatable multi-view videos of digital humans based on a single reference image and target expressions. Our model, MVP4D, is based on a state-of-the-art pre-trained video diffusion model and generates hundreds of frames simultaneously from viewpoints varying by up to 360 degrees around a target subject. We show how to distill the outputs of this model into a 4D avatar that can be rendered in real-time. Our approach significantly improves the realism, temporal consistency, and 3D consistency of generated avatars compared to previous methods.
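The abstract outlines a two-stage pipeline: a camera- and expression-conditioned video diffusion model first samples a grid of multi-view frames from the single reference image, and those frames then supervise a 4D avatar that renders in real time. Below is a minimal PyTorch-style sketch of that structure only; every name in it (`MultiViewVideoDiffusion`, `sample_multiview_video`, `AvatarStandIn`, `distill`) is a hypothetical stand-in, and the update rule and loss are placeholders rather than the authors' method.

```python
import torch
import torch.nn as nn

class MultiViewVideoDiffusion(nn.Module):
    """Stand-in for the pre-trained video diffusion backbone. A real model
    would inject the timestep, reference image, expression codes, and
    camera poses via cross-attention; here they are unused for brevity."""
    def __init__(self, ch=8):
        super().__init__()
        self.denoiser = nn.Conv3d(ch, ch, kernel_size=3, padding=1)

    def forward(self, z, t, ref, expr, cams):
        return self.denoiser(z)

@torch.no_grad()
def sample_multiview_video(model, ref, expr, cams, steps=50):
    """Stage 1: jointly denoise a (views, channels, frames, H, W) latent
    grid, so all viewpoints (up to 360 degrees around the subject) and
    all frames are generated simultaneously."""
    V, T = cams.shape[0], expr.shape[0]
    z = torch.randn(V, 8, T, 32, 32)
    for t in reversed(range(steps)):
        eps = model(z, t, ref, expr, cams)
        z = z - eps / steps  # placeholder update; real samplers differ
    return z

class AvatarStandIn(nn.Module):
    """Stand-in for a real-time-renderable 4D representation (e.g. a set
    of deformable 3D Gaussians); render() is a trivial placeholder."""
    def __init__(self):
        super().__init__()
        self.frames = nn.Parameter(torch.zeros(8, 48, 32, 32))

    def render(self, cam):
        return self.frames  # a real renderer would rasterize from `cam`

def distill(avatar, frames, cams, iters=200):
    """Stage 2: distill the generated multi-view video into the avatar
    with a simple photometric reconstruction loss."""
    opt = torch.optim.Adam(avatar.parameters(), lr=1e-2)
    for _ in range(iters):
        v = torch.randint(len(cams), (1,)).item()  # random viewpoint
        loss = (avatar.render(cams[v]) - frames[v]).abs().mean()
        opt.zero_grad(); loss.backward(); opt.step()

ref = torch.randn(3, 256, 256)   # single casually captured reference image
expr = torch.randn(48, 64)       # per-frame target expression codes
cams = torch.randn(8, 4, 4)      # camera poses spanning the subject
model = MultiViewVideoDiffusion()
frames = sample_multiview_video(model, ref, expr, cams)
avatar = AvatarStandIn()
distill(avatar, frames, cams)
```

The design point reflected here is that all viewpoints and frames share one denoising process ("hundreds of frames simultaneously" in the abstract), which is what gives the outputs the multi-view and temporal consistency needed for distillation.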
Related papers
- Generalizable and Animatable 3D Full-Head Gaussian Avatar from a Single Image [9.505520774467263]
Building 3D animatable head avatars from a single image is an important yet challenging problem. Existing methods generally collapse under large camera pose variations, compromising the realism of 3D avatars. We propose a new framework to tackle the novel setting of one-shot 3D full-head animatable avatar reconstruction in a single feed-forward pass.
arXiv Detail & Related papers (2026-01-19T06:56:58Z)
- AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion [56.12859795754579]
AdaHuman is a novel framework that generates high-fidelity animatable 3D avatars from a single in-the-wild image. AdaHuman incorporates two key innovations: a pose-conditioned 3D joint diffusion model and a compositional 3DGS refinement module.
arXiv Detail & Related papers (2025-05-30T17:59:54Z)
- EVA: Expressive Virtual Avatars from Multi-view Videos [51.33851869426057]
We introduce Expressive Virtual Avatars (EVA), an actor-specific, fully controllable, and expressive human avatar framework. EVA achieves high-fidelity, lifelike renderings in real time while enabling independent control of facial expressions, body movements, and hand gestures. This work represents a significant advancement towards fully drivable digital human models.
arXiv Detail & Related papers (2025-05-21T11:22:52Z)
- Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior [31.780579293685797]
We present Vid2Avatar-Pro, a method to create photorealistic and animatable 3D human avatars from monocular in-the-wild videos.
arXiv Detail & Related papers (2025-03-03T14:45:35Z)
- Avat3r: Large Animatable Gaussian Reconstruction Model for High-fidelity 3D Head Avatars [60.0866477932976]
We present Avat3r, which regresses a high-quality and animatable 3D head avatar from just a few input images. We make Large Reconstruction Models animatable and learn a powerful prior over 3D human heads from a large multi-view video dataset. We increase robustness by feeding input images with different expressions to our model during training, enabling the reconstruction of 3D head avatars from inconsistent inputs.
arXiv Detail & Related papers (2025-02-27T16:00:11Z)
- CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models [9.622857933809067]
CAP4D is an approach that uses a morphable multi-view diffusion model to reconstruct photoreal 4D portrait avatars from any number of reference images. Our approach demonstrates state-of-the-art performance for single-, few-, and multi-image 4D portrait avatar reconstruction.
arXiv Detail & Related papers (2024-12-16T18:58:51Z)
- AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction [26.82525451095629]
We propose a robust method for 3D reconstruction from inconsistent images, enabling real-time rendering during inference. We recast the reconstruction problem as a 4D task and introduce an efficient 3D modeling approach using 4D Gaussian Splatting (a minimal sketch of this representation follows this entry). Experiments demonstrate that our method achieves photorealistic, real-time animation of 3D human avatars from in-the-wild images.
arXiv Detail & Related papers (2024-12-03T18:55:39Z)
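The AniGS entry above names 4D Gaussian Splatting as its representation. As a rough illustration of what adding a time axis to 3D Gaussian Splatting can look like, here is a minimal PyTorch sketch that attaches a per-Gaussian temporal motion basis to the usual 3DGS parameters; the class name `Gaussians4D` and the Fourier basis are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class Gaussians4D(nn.Module):
    """One common way to add a time axis to 3D Gaussian Splatting: keep the
    static per-Gaussian parameters and attach a low-order temporal motion
    basis. Illustrative only; not necessarily AniGS's exact formulation."""
    def __init__(self, n=10_000, basis=4):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(n, 3) * 0.1)    # mean positions
        self.log_scale = nn.Parameter(torch.zeros(n, 3))   # per-axis extent
        self.rot = nn.Parameter(torch.randn(n, 4))         # quaternions
        self.rgb = nn.Parameter(torch.rand(n, 3))          # base color
        self.logit_alpha = nn.Parameter(torch.zeros(n))    # opacity
        self.motion = nn.Parameter(torch.zeros(n, basis, 3))  # motion coeffs

    def positions_at(self, t):
        """Gaussian centers at normalized time t in [0, 1], via a small
        Fourier basis; a rasterizer would splat these per rendered frame."""
        k = torch.arange(1, self.motion.shape[1] + 1, dtype=torch.float32)
        phi = torch.sin(2 * torch.pi * k * t)              # (basis,)
        return self.mu + (phi[None, :, None] * self.motion).sum(dim=1)

g = Gaussians4D(n=1_000)
xyz = g.positions_at(0.25)  # (1000, 3) centers at a quarter of the clip
```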
- DreamWaltz: Make a Scene with Complex 3D Animatable Avatars [68.49935994384047]
We present DreamWaltz, a novel framework for generating and animating complex 3D avatars given text guidance and a parametric human body prior.
For animation, our method learns an animatable 3D avatar representation from abundant image priors of a diffusion model conditioned on various poses.
arXiv Detail & Related papers (2023-05-21T17:59:39Z)
- AvatarGen: a 3D Generative Model for Animatable Human Avatars [108.11137221845352]
AvatarGen is the first method that enables not only non-rigid human generation with diverse appearance but also full control over poses and viewpoints.
To model non-rigid dynamics, it introduces a deformation network that learns pose-dependent deformations in canonical space (a minimal sketch follows this entry).
Our method can generate animatable human avatars with high-quality appearance and geometry, significantly outperforming previous 3D GANs.
arXiv Detail & Related papers (2022-08-01T01:27:02Z)
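The AvatarGen entry above mentions a deformation network that learns pose-dependent deformations in canonical space. A minimal sketch of that general idea, assuming an MLP over concatenated point coordinates and a pose code; the layer sizes and the 72-dimensional SMPL-style pose vector are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PoseDeformation(nn.Module):
    """Sketch of a canonical-space deformation network: an MLP maps a
    canonical point plus a pose code to a 3D offset, so non-rigid dynamics
    are learned in canonical coordinates. Sizes are illustrative."""
    def __init__(self, pose_dim=72, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, x_canon, pose):
        # x_canon: (N, 3) canonical points; pose: (pose_dim,) pose code.
        p = pose.expand(x_canon.shape[0], -1)
        return x_canon + self.mlp(torch.cat([x_canon, p], dim=-1))

deform = PoseDeformation()
x = torch.randn(1024, 3)   # points sampled in canonical space
pose = torch.randn(72)     # e.g. an SMPL-style body pose vector
x_posed = deform(x, pose)  # (1024, 3) pose-dependent deformed points
```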
- High-fidelity Face Tracking for AR/VR via Deep Lighting Adaptation [117.32310997522394]
3D video avatars can empower virtual communications by providing compression, privacy, entertainment, and a sense of presence in AR/VR.
Existing person-specific 3D models are not robust to lighting; as a result, they typically miss subtle facial behaviors and introduce artifacts into the avatar.
This paper addresses these limitations by learning a deep lighting model that, in combination with a high-quality 3D face tracking algorithm, provides a method for subtle and robust facial motion transfer from a regular video to a 3D photorealistic avatar.
arXiv Detail & Related papers (2021-03-29T18:33:49Z)