PV3D: A 3D Generative Model for Portrait Video Generation
- URL: http://arxiv.org/abs/2212.06384v3
- Date: Wed, 21 Jun 2023 02:13:41 GMT
- Title: PV3D: A 3D Generative Model for Portrait Video Generation
- Authors: Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Wenqing Zhang, Song Bai,
Jiashi Feng, Mike Zheng Shou
- Abstract summary: We propose PV3D, the first generative framework that can synthesize multi-view consistent portrait videos.
PV3D is able to support many downstream applications such as animating static portraits and view-consistent video motion editing.
- Score: 94.96025739097922
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in generative adversarial networks (GANs) have demonstrated
the capabilities of generating stunning photo-realistic portrait images. While
some prior works have applied such image GANs to unconditional 2D portrait
video generation and static 3D portrait synthesis, there are few works
successfully extending GANs for generating 3D-aware portrait videos. In this
work, we propose PV3D, the first generative framework that can synthesize
multi-view consistent portrait videos. Specifically, our method extends the
recent static 3D-aware image GAN to the video domain by generalizing the 3D
implicit neural representation to model the spatio-temporal space. To introduce
motion dynamics to the generation process, we develop a motion generator by
stacking multiple motion layers to generate motion features via modulated
convolution. To alleviate motion ambiguities caused by camera/human motions, we
propose a simple yet effective camera condition strategy for PV3D, enabling
both temporal and multi-view consistent video generation. Moreover, PV3D
introduces two discriminators for regularizing the spatial and temporal domains
to ensure the plausibility of the generated portrait videos. These elaborated
designs enable PV3D to generate 3D-aware motion-plausible portrait videos with
high-quality appearance and geometry, significantly outperforming prior works.
As a result, PV3D is able to support many downstream applications such as
animating static portraits and view-consistent video motion editing. Code and
models are released at https://showlab.github.io/pv3d.
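To make the architecture described in the abstract concrete, below is a minimal PyTorch sketch of a motion generator that stacks modulated-convolution motion layers and fuses a motion code with the camera pose, loosely following the camera conditioning strategy mentioned above. All class names, tensor dimensions, and the exact conditioning scheme are illustrative assumptions, not the authors' released implementation (see https://showlab.github.io/pv3d for the official code).

# Hypothetical sketch of a PV3D-style motion generator: stacked motion layers
# using modulated convolution, conditioned on a motion code and camera pose.
# Names and dimensions are assumptions, not the PV3D release.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModulatedConv2d(nn.Module):
    """StyleGAN2-style modulated convolution (simplified: always demodulates, no noise)."""

    def __init__(self, in_ch, out_ch, style_dim, kernel_size=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel_size, kernel_size))
        self.modulation = nn.Linear(style_dim, in_ch)  # per-channel scales from the style
        self.padding = kernel_size // 2

    def forward(self, x, style):
        b, in_ch, h, w = x.shape
        scale = self.modulation(style).view(b, 1, in_ch, 1, 1)        # (B, 1, Cin, 1, 1)
        weight = self.weight.unsqueeze(0) * scale                      # modulate per sample
        demod = torch.rsqrt(weight.pow(2).sum(dim=[2, 3, 4]) + 1e-8)   # demodulate
        weight = weight * demod.view(b, -1, 1, 1, 1)
        weight = weight.view(-1, in_ch, *self.weight.shape[2:])
        x = x.view(1, b * in_ch, h, w)
        out = F.conv2d(x, weight, padding=self.padding, groups=b)      # grouped = per-sample conv
        return out.view(b, -1, h, w)


class MotionGenerator(nn.Module):
    """Stack of motion layers; the motion code and the camera pose are fused into a
    single style vector that modulates every layer (assumed conditioning scheme)."""

    def __init__(self, feat_ch=64, style_dim=512, cam_dim=25, n_layers=4):
        super().__init__()
        self.embed = nn.Linear(style_dim + cam_dim, style_dim)  # fuse motion code and camera
        self.layers = nn.ModuleList(
            [ModulatedConv2d(feat_ch, feat_ch, style_dim) for _ in range(n_layers)]
        )

    def forward(self, feat, motion_code, camera):
        style = self.embed(torch.cat([motion_code, camera], dim=-1))
        for layer in self.layers:
            feat = F.leaky_relu(layer(feat, style), 0.2)
        return feat  # motion features consumed by the 3D-aware image generator


# Usage: one timestep, batch of 2, with a 25-dim flattened camera pose (assumed size).
gen = MotionGenerator()
feat = torch.randn(2, 64, 32, 32)
motion_code = torch.randn(2, 512)
camera = torch.randn(2, 25)
motion_feat = gen(feat, motion_code, camera)  # (2, 64, 32, 32)

In the full PV3D pipeline, such motion features would condition the 3D-aware image generator per frame, and the rendered clips would be scored by separate spatial and temporal discriminators; those components are omitted here for brevity.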
Related papers
- Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models [112.2625368640425]
High-resolution Image-to-3D model (Hi3D) is a new video-diffusion-based paradigm that recasts single-image-to-multi-view generation as 3D-aware sequential image generation.
Hi3D first empowers the pre-trained video diffusion model with a 3D-aware prior, yielding multi-view images with low-resolution texture details.
arXiv Detail & Related papers (2024-09-11T17:58:57Z)
- Splatter a Video: Video Gaussian Representation for Versatile Processing [48.9887736125712]
Video representation is crucial for various downstream tasks, such as tracking, depth prediction, segmentation, view synthesis, and editing.
We introduce a novel explicit 3D representation, the video Gaussian representation, which embeds a video into 3D Gaussians.
It has been proven effective in numerous video processing tasks, including tracking, consistent video depth and feature refinement, motion and appearance editing, and stereoscopic video generation.
arXiv Detail & Related papers (2024-06-19T22:20:03Z)
- Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion [3.545941891218148]
We investigate whether it is necessary to explicitly enforce multiview consistency over time, as current approaches do, or if it is sufficient for a model to generate 3D representations of each timestep independently.
We propose a model, Vid3D, that leverages 2D video diffusion to generate 3D videos by first generating a 2D "seed" of the video's temporal dynamics and then independently generating a 3D representation for each timestep in the seed video.
arXiv Detail & Related papers (2024-06-17T04:09:04Z)
- OneTo3D: One Image to Re-editable Dynamic 3D Model and Video Generation [0.0]
Generating an editable dynamic 3D model and video from a single image is a novel direction in the research area of single-image 3D representation and reconstruction.
We propose OneTo3D, a method that uses a single image to generate an editable 3D model and a targeted, semantically continuous, time-unlimited 3D video.
arXiv Detail & Related papers (2024-05-10T15:44:11Z)
- V3D: Video Diffusion Models are Effective 3D Generators [19.33837029942662]
We introduce V3D, which leverages the world simulation capacity of pre-trained video diffusion models to facilitate 3D generation.
Benefiting from this, the state-of-the-art video diffusion model can be fine-tuned to generate 360-degree orbit frames surrounding an object given a single image.
Our method can be extended to scene-level novel view synthesis, achieving precise control over the camera path with sparse input views.
arXiv Detail & Related papers (2024-03-11T14:03:36Z)
- Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis [88.17520303867099]
One-shot 3D talking portrait generation aims to reconstruct a 3D avatar from an unseen image, and then animate it with a reference video or audio.
We present Real3D-Portrait, a framework that improves the one-shot 3D reconstruction power with a large image-to-plane model.
Experiments show that Real3D-Portrait generalizes well to unseen identities and generates more realistic talking portrait videos.
arXiv Detail & Related papers (2024-01-16T17:04:30Z)
- HyperStyle3D: Text-Guided 3D Portrait Stylization via Hypernetworks [101.36230756743106]
This paper is inspired by the success of 3D-aware GANs that bridge 2D and 3D domains with 3D fields as the intermediate representation for rendering 2D images.
We propose a novel method, dubbed HyperStyle3D, based on 3D-aware GANs for 3D portrait stylization.
arXiv Detail & Related papers (2023-04-19T07:22:05Z)
- 3D-Aware Video Generation [149.5230191060692]
We explore 4D generative adversarial networks (GANs) that learn generation of 3D-aware videos.
By combining neural implicit representations with a time-aware discriminator, we develop a GAN framework that synthesizes 3D-aware videos supervised only by monocular videos.
arXiv Detail & Related papers (2022-06-29T17:56:03Z)