Related papers: FaceCraft4D: Animated 3D Facial Avatar Generation from a Single Image

URL: http://arxiv.org/abs/2504.15179v1
Date: Mon, 21 Apr 2025 15:40:14 GMT
Title: FaceCraft4D: Animated 3D Facial Avatar Generation from a Single Image
Authors: Fei Yin, Mallikarjun B R, Chun-Han Yao, Rafał Mantiuk, Varun Jampani,
Abstract summary: We present a novel framework for generating high-quality, animatable 4D avatar from a single image. Our method achieves superior quality compared to the prior art, while maintaining consistency across different viewpoints and expressions.
Score: 41.598551483524666
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a novel framework for generating high-quality, animatable 4D avatar from a single image. While recent advances have shown promising results in 4D avatar creation, existing methods either require extensive multiview data or struggle with shape accuracy and identity consistency. To address these limitations, we propose a comprehensive system that leverages shape, image, and video priors to create full-view, animatable avatars. Our approach first obtains initial coarse shape through 3D-GAN inversion. Then, it enhances multiview textures using depth-guided warping signals for cross-view consistency with the help of the image diffusion model. To handle expression animation, we incorporate a video prior with synchronized driving signals across viewpoints. We further introduce a Consistent-Inconsistent training to effectively handle data inconsistencies during 4D reconstruction. Experimental results demonstrate that our method achieves superior quality compared to the prior art, while maintaining consistency across different viewpoints and expressions.

Related papers

GeoDiff4D: Geometry-Aware Diffusion for 4D Head Avatar Reconstruction [49.70452913749897]
We propose a novel framework that leverages geometry-aware diffusion to learn strong geometry priors for high-fidelity head avatar reconstruction. Our approach jointly synthesizes portrait images and corresponding surface normals, while a pose-free expression captures implicit expression representations. Our method substantially outperforms state-of-the-art approaches in visual quality, expression fidelity, and cross-identity generalization.
arXiv Detail & Related papers (2026-02-27T16:41:21Z)
Human Video Generation from a Single Image with 3D Pose and View Control [62.676151243249556]
We present Human Video Generation in 4D (HVG), a latent video diffusion model capable of generating high-quality multi-view, temporally coherent human videos from a single image. HVG achieves this through three key designs: (i) Articulated Pose Modulation, which captures the anatomical relationships of 3D joints via a novel dual-dimensional bone map and resolves self-occlusions across views by introducing 3D information; (ii) View and Temporal Alignment, which ensures multi-view consistency and alignment between a reference image and pose sequences for frame-to-frame stability; and (iii)
arXiv Detail & Related papers (2026-02-24T18:42:20Z)
NOVA3D: Normal Aligned Video Diffusion Model for Single Image to 3D Generation [12.213398557667443]
We introduce NOVA3D, an innovative single-image-to-3D generation framework. Our key insight lies in leveraging strong 3D priors from a pretrained video diffusion model. To facilitate information exchange between color and geometric domains, we propose the Geometry-Temporal Alignment (GTA) attention mechanism. We also introduce the de-conflict geometry fusion algorithm, which improves texture fidelity by addressing multi-view inaccuracies.
arXiv Detail & Related papers (2025-06-09T12:37:46Z)
SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation [0.0]
Creating high-quality animatable 3D human avatars from a single image remains a significant challenge in computer vision. We present SVAD, a novel approach that addresses these limitations by leveraging complementary strengths of existing techniques. Our method generates synthetic training data through video diffusion, enhances it with identity preservation and image restoration modules, and utilizes this refined data to train 3DGS avatars.
arXiv Detail & Related papers (2025-05-08T17:59:58Z)
Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency [49.875459658889355]
Free4D is a tuning-free framework for 4D scene generation from a single image. Our key insight is to distill pre-trained foundation models for consistent 4D scene representation. The resulting 4D representation enables real-time, controllable rendering.
arXiv Detail & Related papers (2025-03-26T17:59:44Z)
Bringing Objects to Life: 4D generation from 3D objects [31.533802484121182]
We introduce a method for animating user-provided 3D objects by conditioning on textual prompts to guide 4D generation. Our method achieves up to threefold improvements in identity preservation measured using LPIPS scores.
arXiv Detail & Related papers (2024-12-29T10:12:01Z)
AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction [26.82525451095629]
We propose a robust method for 3D reconstruction of inconsistent images, enabling real-time rendering during inference. We recast the reconstruction problem as a 4D task and introduce an efficient 3D modeling approach using 4D Gaussian Splatting. Experiments demonstrate that our method achieves photorealistic, real-time animation of 3D human avatars from in-the-wild images.
arXiv Detail & Related papers (2024-12-03T18:55:39Z)
DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses [57.17501809717155]
We present DreamDance, a novel method for animating human images using only skeleton pose sequences as conditional inputs. Our key insight is that human images naturally exhibit multiple levels of correlation. We construct the TikTok-Dance5K dataset, comprising 5K high-quality dance videos with detailed frame annotations.
arXiv Detail & Related papers (2024-11-30T08:42:13Z)
EG4D: Explicit Generation of 4D Object without Score Distillation [105.63506584772331]
EG4D is a novel framework that generates high-quality and consistent 4D assets without score distillation. Our framework outperforms the baselines in generation quality by a considerable margin.
arXiv Detail & Related papers (2024-05-28T12:47:22Z)
Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models [6.738732514502613]
Diffusion$^2$ is a novel framework for dynamic 3D content creation. It reconciles the knowledge about geometric consistency and temporal smoothness from 3D models to directly sample dense multi-view images. Experiments demonstrate the efficacy of our proposed framework in generating highly seamless and consistent 4D assets.
arXiv Detail & Related papers (2024-04-02T17:58:03Z)
4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency [118.15258850780417]
We present 4DGen, a novel framework for grounded 4D content creation. Our pipeline facilitates controllable 4D generation, enabling users to specify the motion via monocular video or adopt image-to-video generations. Compared to existing video-to-4D baselines, our approach yields superior results in faithfully reconstructing input signals.
arXiv Detail & Related papers (2023-12-28T18:53:39Z)
DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models [55.71306021041785]
We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars. We leverage the SMPL model to provide shape and pose guidance for the generation. We also jointly optimize the losses computed from the full body and from the zoomed-in 3D head to alleviate the common multi-face "Janus" problem.
arXiv Detail & Related papers (2023-04-03T12:11:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.