Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework
- URL: http://arxiv.org/abs/2403.16510v1
- Date: Mon, 25 Mar 2024 07:54:18 GMT
- Title: Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework
- Authors: Ziyao Huang, Fan Tang, Yong Zhang, Xiaodong Cun, Juan Cao, Jintao Li, Tong-Yee Lee
- Abstract summary: Make-Your-Anchor is a system requiring only a one-minute video clip of an individual for training.
We fine-tune a proposed structure-guided diffusion model on the input video to render 3D mesh conditions into human appearances.
A novel identity-specific face enhancement module is introduced to improve the visual quality of facial regions in the output videos.
- Score: 33.46782517803435
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the remarkable progress of talking-head-based avatar-creation solutions, directly generating anchor-style videos with full-body motions remains challenging. In this study, we propose Make-Your-Anchor, a novel system that requires only a one-minute video clip of an individual for training and subsequently enables the automatic generation of anchor-style videos with precise torso and hand movements. Specifically, we fine-tune a proposed structure-guided diffusion model on the input video to render 3D mesh conditions into human appearances. We adopt a two-stage training strategy for the diffusion model, effectively binding movements to specific appearances. To produce arbitrarily long videos, we extend the 2D U-Net in the frame-wise diffusion model to a 3D style without additional training cost, and propose a simple yet effective batch-overlapped temporal denoising module to bypass the constraint on video length during inference. Finally, a novel identity-specific face enhancement module is introduced to improve the visual quality of facial regions in the output videos. Comparative experiments demonstrate the effectiveness and superiority of the system in terms of visual quality, temporal coherence, and identity preservation, outperforming SOTA diffusion and non-diffusion methods. Project page: https://github.com/ICTMCG/Make-Your-Anchor.
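To make the batch-overlapped temporal denoising idea concrete, here is a minimal sketch of one such denoising step. It is an illustration under assumptions, not the authors' implementation: `denoise_window` is a hypothetical stand-in for the fine-tuned, frame-wise diffusion U-Net run in a batched 3D style on a fixed-length window of latent frames, and the window/overlap sizes are arbitrary.

```python
import torch

def overlapped_denoise_step(latents, denoise_window, window=16, overlap=4):
    """One denoising step over an arbitrarily long latent video.

    latents:        tensor of shape (num_frames, C, H, W)
    denoise_window: callable that denoises a fixed-length window of latents
                    (hypothetical stand-in for the diffusion U-Net run in a
                    batched 3D style); same shape in and out.
    """
    num_frames = latents.shape[0]
    out = torch.zeros_like(latents)
    weight = torch.zeros(num_frames, 1, 1, 1, device=latents.device)

    stride = window - overlap
    start = 0
    while start < num_frames:
        end = min(start + window, num_frames)
        pred = denoise_window(latents[start:end])  # denoise one batch/window
        out[start:end] += pred                     # accumulate predictions
        weight[start:end] += 1.0                   # count contributions
        if end == num_frames:
            break
        start += stride

    # Frames covered by several windows are averaged, smoothing the
    # transition between neighbouring batches.
    return out / weight.clamp(min=1.0)
```

In a full system, this step would be repeated for every timestep of the sampler, with `denoise_window` wrapping the structure-guided diffusion model conditioned on the rendered 3D mesh frames.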
Related papers
- Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts [41.08576055846111]
Stereo-Talker is a novel one-shot audio-driven human video synthesis system.
It generates 3D talking videos with precise lip synchronization, expressive body gestures, temporally consistent photo-realistic quality, and continuous viewpoint control.
arXiv Detail & Related papers (2024-10-31T11:32:33Z)
- VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide [48.22321420680046]
VideoGuide is a novel framework that enhances the temporal consistency of pretrained text-to-video (T2V) models.
It improves temporal quality by interpolating the guiding model's denoised samples into the sampling model's denoising process (a rough sketch of this step follows the entry).
The proposed method brings about significant improvement in temporal consistency and image fidelity.
arXiv Detail & Related papers (2024-10-06T05:46:17Z)
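Based only on the one-sentence description above, the guidance step can be sketched roughly as follows; this is a hedged illustration, not the paper's code. `x0_sampling` and `x0_guide` denote the clean-sample (x0) predictions of the sampling model and the guiding model at the current step, and the interpolation weight `lam` and the restriction to early steps are assumptions.

```python
import torch

def interpolate_denoised(x0_sampling: torch.Tensor,
                         x0_guide: torch.Tensor,
                         step: int,
                         guide_steps: int = 10,
                         lam: float = 0.3) -> torch.Tensor:
    """Blend the guiding model's denoised sample into the sampling model's
    prediction before the next denoising step (illustrative only)."""
    if step >= guide_steps:          # assumed: guide only the early steps
        return x0_sampling
    return (1.0 - lam) * x0_sampling + lam * x0_guide
```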
- TVG: A Training-free Transition Video Generation Method with Diffusion Models [12.037716102326993]
Transition videos play a crucial role in media production, enhancing the flow and coherence of visual narratives.
Recent advances in diffusion model-based video generation offer new possibilities for creating transitions but face challenges such as poor inter-frame relationship modeling and abrupt content changes.
We propose a novel training-free Transition Video Generation (TVG) approach using video-level diffusion models that addresses these limitations without additional training.
arXiv Detail & Related papers (2024-08-24T00:33:14Z)
- COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing [57.76170824395532]
Video editing is an emerging task, in which most current methods adopt the pre-trained text-to-image (T2I) diffusion model to edit the source video.
We propose COrrespondence-guided Video Editing (COVE), which leverages diffusion feature correspondence to achieve high-quality and consistent video editing (a generic sketch of such a correspondence follows this entry).
COVE can be seamlessly integrated into the pre-trained T2I diffusion model without the need for extra training or optimization.
arXiv Detail & Related papers (2024-06-13T06:27:13Z)
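As a generic illustration of what a "diffusion feature correspondence" can look like, the sketch below matches tokens between two frames by cosine similarity of their U-Net features; the matching scheme and how COVE actually uses the correspondences are assumptions here, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def feature_correspondence(feat_a: torch.Tensor, feat_b: torch.Tensor):
    """Nearest-neighbour token correspondence between two frames.

    feat_a, feat_b: diffusion features of shape (num_tokens, dim), e.g.
    flattened U-Net activations for frames A and B (illustrative only).
    Returns, for every token in frame A, the index of its best match in B.
    """
    a = F.normalize(feat_a, dim=-1)   # unit-norm features
    b = F.normalize(feat_b, dim=-1)
    sim = a @ b.t()                   # cosine similarity matrix
    return sim.argmax(dim=-1)         # best-matching token in B
```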
- Vivid-ZOO: Multi-View Video Generation with Diffusion Model [76.96449336578286]
New challenges lie in the lack of massive captioned multi-view videos and the complexity of modeling such a multi-dimensional distribution.
We propose a novel diffusion-based pipeline that generates high-quality multi-view videos centered around a dynamic 3D object from text.
arXiv Detail & Related papers (2024-06-12T21:44:04Z)
- Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data [80.92268916571712]
A critical bottleneck is the scarcity of high-quality 3D objects with detailed captions.
We propose Bootstrap3D, a novel framework that automatically generates an arbitrary quantity of multi-view images.
We have generated 1 million high-quality synthetic multi-view images with dense descriptive captions.
arXiv Detail & Related papers (2024-05-31T17:59:56Z)
- Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models [48.56724784226513]
We propose Customize-A-Video that models the motion from a single reference video and adapts it to new subjects and scenes with both spatial and temporal varieties.
The proposed modules are trained in a staged pipeline and inferred in a plug-and-play fashion, enabling easy extensions to various downstream tasks.
arXiv Detail & Related papers (2024-02-22T18:38:48Z)
- DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing [48.086102360155856]
We introduce dynamic Neural Radiance Fields (NeRF) as an innovative video representation.
We propose the image-based video-NeRF editing pipeline with a set of innovative designs to provide consistent and controllable editing.
Our method, dubbed DynVideo-E, significantly outperforms SOTA approaches on two challenging datasets by a large margin of 50%~95% in terms of human preference.
arXiv Detail & Related papers (2023-10-16T17:48:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.