Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion
- URL: http://arxiv.org/abs/2503.15851v2
- Date: Tue, 25 Mar 2025 04:56:40 GMT
- Title: Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion
- Authors: Zhenglin Zhou, Fan Ma, Hehe Fan, Tat-Seng Chua
- Abstract summary: We propose Zero-1-to-A, a robust method that synthesizes a spatially and temporally consistent dataset for 4D avatar reconstruction. Experiments demonstrate that Zero-1-to-A improves fidelity, animation quality, and rendering speed compared to existing diffusion-based methods.
- Score: 61.938480115119596
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Animatable head avatar generation typically requires extensive data for training. To reduce the data requirements, a natural solution is to leverage existing data-free static avatar generation methods, such as pre-trained diffusion models with score distillation sampling (SDS), which align avatars with pseudo ground-truth outputs from the diffusion model. However, directly distilling 4D avatars from video diffusion often leads to over-smooth results due to spatial and temporal inconsistencies in the generated video. To address this issue, we propose Zero-1-to-A, a robust method that synthesizes a spatially and temporally consistent dataset for 4D avatar reconstruction using the video diffusion model. Specifically, Zero-1-to-A iteratively constructs video datasets and optimizes animatable avatars in a progressive manner, ensuring that avatar quality increases smoothly and consistently throughout the learning process. This progressive learning involves two stages: (1) Spatial Consistency Learning fixes expressions and learns from front-to-side views, and (2) Temporal Consistency Learning fixes views and learns from relaxed to exaggerated expressions, generating 4D avatars in a simple-to-complex manner. Extensive experiments demonstrate that Zero-1-to-A improves fidelity, animation quality, and rendering speed compared to existing diffusion-based methods, providing a solution for lifelike avatar creation. Code is publicly available at: https://github.com/ZhenglinZhou/Zero-1-to-A.
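The progressive two-stage curriculum described in the abstract can be sketched in code. The following is a minimal illustrative sketch, not the authors' implementation: `diffuse_video`, `update_avatar`, and the view/expression schedules are hypothetical placeholders standing in for the video diffusion sampler and the animatable-avatar optimizer.

```python
"""Illustrative sketch of Zero-1-to-A's progressive learning loop.

All names here are hypothetical placeholders, not the authors' API: a real
implementation would call a pre-trained video diffusion model and a 4D
(animatable) avatar optimizer where the stubs appear.
"""

def diffuse_video(avatar, view, expression):
    # Placeholder: sample a short video of the current avatar at the given
    # view/expression from the video diffusion model.
    return {"view": view, "expression": expression}

def update_avatar(avatar, dataset):
    # Placeholder: re-optimize the avatar against every video synthesized
    # so far, so quality increases smoothly across iterations.
    return avatar

def zero_1_to_a(reference_image):
    avatar = {"ref": reference_image}  # placeholder avatar state
    dataset = []                       # iteratively constructed video dataset

    # Stage 1: Spatial Consistency Learning -- the expression is fixed
    # (neutral) while camera views progress from frontal to side.
    for view in ("front", "three-quarter", "side"):
        dataset.append(diffuse_video(avatar, view, "neutral"))
        avatar = update_avatar(avatar, dataset)

    # Stage 2: Temporal Consistency Learning -- the view is fixed while
    # expressions progress from relaxed to exaggerated.
    for expression in ("relaxed", "smile", "surprise", "exaggerated"):
        dataset.append(diffuse_video(avatar, "front", expression))
        avatar = update_avatar(avatar, dataset)

    return avatar

if __name__ == "__main__":
    zero_1_to_a("portrait.png")
```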
Related papers
- 3D$^2$-Actor: Learning Pose-Conditioned 3D-Aware Denoiser for Realistic Gaussian Avatar Modeling [37.11454674584874]
We introduce 3D$^2$-Actor, a pose-conditioned 3D-aware human modeling pipeline that integrates 2D denoising and 3D rectifying steps.
Experimental results demonstrate that 3D$^2$-Actor excels in high-fidelity avatar modeling and robustly generalizes to novel poses.
arXiv Detail & Related papers (2024-12-16T09:37:52Z) - ConsistentAvatar: Learning to Diffuse Fully Consistent Talking Head Avatar with Temporal Guidance [27.1886214162329]
We propose ConsistentAvatar, a novel framework for fully consistent and high-fidelity talking avatar generation.
Our method first learns to model the temporal representation for stability between adjacent frames.
Extensive experiments demonstrate that ConsistentAvatar outperforms state-of-the-art methods in the appearance, 3D, expression, and temporal consistency of the generated results.
arXiv Detail & Related papers (2024-11-23T03:43:09Z) - Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction [97.924188608301]
We propose a novel single-stage 3D diffusion model, DiffusionGS, for object generation and scene reconstruction from a single view. DiffusionGS directly outputs 3D Gaussian point clouds at each timestep to enforce view consistency. Experiments show that DiffusionGS yields improvements of 2.20 dB/23.25 and 1.34 dB/19.16 in PSNR/FID for objects and scenes.
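To make "directly outputs 3D Gaussian point clouds at each timestep" concrete, here is a generic sketch of a head that maps denoiser features to Gaussian-splatting parameters; the module name, dimensions, and activations are assumptions for illustration, not the DiffusionGS architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Generic head mapping per-point denoiser features to 3D Gaussian
# parameters (position, scale, rotation, opacity, color). Illustrative
# assumptions only -- not the DiffusionGS architecture.
class GaussianHead(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        # 3 (xyz) + 3 (scale) + 4 (quaternion) + 1 (opacity) + 3 (RGB) = 14
        self.proj = nn.Linear(feat_dim, 14)

    def forward(self, feats):                      # feats: (B, N, feat_dim)
        out = self.proj(feats)
        xyz = out[..., 0:3]
        scale = torch.exp(out[..., 3:6])           # scales must stay positive
        rot = F.normalize(out[..., 6:10], dim=-1)  # unit quaternion
        opacity = torch.sigmoid(out[..., 10:11])
        color = torch.sigmoid(out[..., 11:14])
        return xyz, scale, rot, opacity, color

# Usage: one Gaussian per feature vector, e.g. one per pixel or point.
params = GaussianHead()(torch.randn(2, 1024, 256))
```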
arXiv Detail & Related papers (2024-11-21T18:21:24Z) - DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion [69.67970568012599]
We present DreamWaltz-G, a novel learning framework for animatable 3D avatar generation from text.
The core of this framework lies in Score Distillation and Hybrid 3D Gaussian Avatar representation.
Our framework further supports diverse applications, including human video reenactment and multi-subject scene composition.
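Score Distillation here refers to the standard SDS objective introduced by DreamFusion; its gradient is reproduced below for reference (the generic formulation, not necessarily DreamWaltz-G's exact loss), where $x = g(\theta)$ is the rendered image, $\hat{\epsilon}_{\phi}$ the diffusion model's noise prediction given prompt $y$, and $w(t)$ a timestep weighting.

```latex
\nabla_{\theta}\mathcal{L}_{\mathrm{SDS}}(\theta)
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_{\phi}(x_t;\,y,\,t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right]
```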
arXiv Detail & Related papers (2024-09-25T17:59:45Z) - Fine-gained Zero-shot Video Sampling [21.42513407755273]
We propose a novel zero-shot video sampling algorithm, denoted as $\mathcal{ZS}^2$.
$\mathcal{ZS}^2$ is capable of directly sampling high-quality video clips without any training or optimization.
It achieves state-of-the-art performance in zero-shot video generation, occasionally outperforming recent supervised methods.
arXiv Detail & Related papers (2024-07-31T09:36:58Z) - UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures [80.047065473698]
We propose a novel 3D avatar generation approach, termed UltrAvatar, with enhanced geometric fidelity and superior-quality physically based rendering (PBR) textures free of unwanted lighting.
We demonstrate the effectiveness and robustness of the proposed method, outperforming the state-of-the-art methods by a large margin in the experiments.
arXiv Detail & Related papers (2024-01-20T01:55:17Z) - AnimateZero: Video Diffusion Models are Zero-Shot Image Animators [63.938509879469024]
We propose AnimateZero to unveil the pre-trained text-to-video (T2V) diffusion model AnimateDiff.
For appearance control, we borrow intermediate latents and their features from text-to-image (T2I) generation.
For temporal control, we replace the global temporal attention of the original T2V model with our proposed positional-corrected window attention.
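As a loose illustration of swapping global temporal attention for windowed attention, the sketch below restricts each frame to attending only to frames within a local temporal window; the module and its sizes are assumptions, not AnimateZero's positional-corrected design.

```python
import torch
import torch.nn as nn

# Illustrative local-window temporal attention: each frame attends only to
# frames at most `window` steps away, instead of all frames. A generic
# sketch, not AnimateZero's positional-corrected window attention.
class WindowTemporalAttention(nn.Module):
    def __init__(self, dim=320, heads=8, window=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.window = window

    def forward(self, x):  # x: (B, T, dim), one sequence per spatial location
        T = x.shape[1]
        idx = torch.arange(T)
        # True entries are masked out: frames farther apart than `window`.
        mask = (idx[None, :] - idx[:, None]).abs() > self.window
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out

# Usage: 16 frames of 320-dim features at one spatial location.
y = WindowTemporalAttention()(torch.randn(2, 16, 320))
```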
arXiv Detail & Related papers (2023-12-06T13:39:35Z) - Learning 3D Photography Videos via Self-supervised Diffusion on Single Images [105.81348348510551]
3D photography renders a static image into a video with appealing 3D visual effects.
Existing approaches typically first conduct monocular depth estimation, then render the input frame into subsequent frames from varying viewpoints.
We present a novel task: out-animation, which extends the space and time of input objects.
arXiv Detail & Related papers (2023-02-21T16:18:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.