PVP: Personalized Video Prior for Editable Dynamic Portraits using
StyleGAN
- URL: http://arxiv.org/abs/2306.17123v1
- Date: Thu, 29 Jun 2023 17:26:51 GMT
- Title: PVP: Personalized Video Prior for Editable Dynamic Portraits using
StyleGAN
- Authors: Kai-En Lin and Alex Trevithick and Keli Cheng and Michel Sarkis and
Mohsen Ghafoorian and Ning Bi and Gerhard Reitmayr and Ravi Ramamoorthi
- Abstract summary: StyleGAN has shown promising results in photorealistic and accurate reconstruction of human faces.
In this work, our goal is to take as input a monocular video of a face, and create an editable dynamic portrait.
The user can create novel viewpoints, edit the appearance, and animate the face.
- Score: 33.49053731211931
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Portrait synthesis creates realistic digital avatars which enable users to
interact with others in a compelling way. Recent advances in StyleGAN and its
extensions have shown promising results in synthesizing photorealistic and
accurate reconstruction of human faces. However, previous methods often focus
on frontal face synthesis, and most cannot handle large head rotations due to
the training data distribution of StyleGAN. In this work, our
goal is to take as input a monocular video of a face, and create an editable
dynamic portrait able to handle extreme head poses. The user can create novel
viewpoints, edit the appearance, and animate the face. Our method utilizes
pivotal tuning inversion (PTI) to learn a personalized video prior from a
monocular video sequence. Then we can input pose and expression coefficients to
MLPs and manipulate the latent vectors to synthesize different viewpoints and
expressions of the subject. We also propose novel loss functions to further
disentangle pose and expression in the latent space. Our algorithm performs
substantially better than previous approaches on monocular video datasets, and
it runs in real time at 54 FPS on an RTX 3080.
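
For readers who want a concrete picture of the latent-manipulation step, below is a minimal sketch, assuming a PyTorch/StyleGAN2-style setup: small MLPs map head-pose and expression coefficients to additive offsets on a personalized W+ latent obtained via PTI, together with one possible orthogonality-style disentanglement penalty. All module names, dimensions, and the loss form are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch (not the authors' released code) of the idea described above:
# small MLPs take pose and expression coefficients and produce additive offsets
# in a StyleGAN W+ latent that was personalized with pivotal tuning inversion.
# All names, dimensions, and the disentanglement penalty are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

W_PLUS_SHAPE = (18, 512)      # assumed StyleGAN2 W+ layout: layers x channels
POSE_DIM, EXPR_DIM = 6, 64    # assumed sizes of pose / expression coefficients


class CoeffToLatentOffset(nn.Module):
    """MLP mapping driving coefficients to an additive W+ offset."""

    def __init__(self, in_dim: int, hidden: int = 256):
        super().__init__()
        out_dim = W_PLUS_SHAPE[0] * W_PLUS_SHAPE[1]
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, coeffs: torch.Tensor) -> torch.Tensor:
        return self.net(coeffs).view(-1, *W_PLUS_SHAPE)


pose_mlp = CoeffToLatentOffset(POSE_DIM)
expr_mlp = CoeffToLatentOffset(EXPR_DIM)

# w_pivot would come from PTI on the subject's video; random stand-in here.
w_pivot = torch.randn(1, *W_PLUS_SHAPE)


def drive(pose: torch.Tensor, expr: torch.Tensor) -> torch.Tensor:
    """Compose the personalized pivot with pose- and expression-dependent offsets."""
    return w_pivot + pose_mlp(pose) + expr_mlp(expr)


def disentangle_loss(pose: torch.Tensor, expr: torch.Tensor) -> torch.Tensor:
    """One *possible* disentanglement term (an assumption, not the paper's loss):
    push the pose and expression offsets toward orthogonality in latent space."""
    d_pose = pose_mlp(pose).flatten(1)
    d_expr = expr_mlp(expr).flatten(1)
    return F.cosine_similarity(d_pose, d_expr, dim=1).abs().mean()


if __name__ == "__main__":
    pose, expr = torch.randn(1, POSE_DIM), torch.randn(1, EXPR_DIM)
    w_plus = drive(pose, expr)   # would be fed to the PTI-tuned StyleGAN generator
    print(w_plus.shape, disentangle_loss(pose, expr).item())
```

In the actual method, the resulting W+ codes would be decoded by the PTI-tuned StyleGAN generator and supervised against the input video frames; the sketch only covers the coefficient-to-latent mapping and one candidate disentanglement term.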
Related papers
- GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations [54.94362657501809]
We propose a new method to generate highly dynamic and deformable human head avatars from multi-view imagery in real-time.
At the core of our method is a hierarchical representation of head models that captures the complex dynamics of facial expressions and head movements.
We train this coarse-to-fine facial avatar model along with the head pose as a learnable parameter in an end-to-end framework.
arXiv Detail & Related papers (2024-09-18T13:05:43Z)
- SPARK: Self-supervised Personalized Real-time Monocular Face Capture [6.093606972415841]
Current state-of-the-art approaches can regress parametric 3D face models in real time across a wide range of identities.
We propose a method for high-precision 3D face capture taking advantage of a collection of unconstrained videos of a subject as prior information.
arXiv Detail & Related papers (2024-09-12T12:30:04Z)
- MyPortrait: Morphable Prior-Guided Personalized Portrait Generation [19.911068375240905]
MyPortrait is a simple, general, and flexible framework for neural portrait generation.
Our proposed framework supports both video-driven and audio-driven face animation.
Our method provides a real-time online version and a high-quality offline version.
arXiv Detail & Related papers (2023-12-05T12:05:01Z)
- GAN-Avatar: Controllable Personalized GAN-based Human Head Avatar [48.21353924040671]
We propose to learn person-specific animatable avatars from images without assuming access to precise facial expression tracking.
We learn a mapping from 3DMM facial expression parameters to the latent space of the generative model.
With this scheme, we decouple 3D appearance reconstruction and animation control to achieve high fidelity in image synthesis.
arXiv Detail & Related papers (2023-11-22T19:13:00Z)
- Controllable Dynamic Appearance for Neural 3D Portraits [54.29179484318194]
We propose CoDyNeRF, a system that enables the creation of fully controllable 3D portraits in real-world capture conditions.
CoDyNeRF learns to approximate illumination dependent effects via a dynamic appearance model.
We demonstrate the effectiveness of our method on free view synthesis of a portrait scene with explicit head pose and expression controls.
arXiv Detail & Related papers (2023-09-20T02:24:40Z)
- Image Comes Dancing with Collaborative Parsing-Flow Video Synthesis [124.48519390371636]
Transferring human motion from a source to a target person has great potential in computer vision and graphics applications.
Previous work has either relied on crafted 3D human models or trained a separate model specifically for each target person.
This work studies a more general setting, in which we aim to learn a single model to parsimoniously transfer motion from a source video to any target person.
arXiv Detail & Related papers (2021-10-27T03:42:41Z)
- Audio- and Gaze-driven Facial Animation of Codec Avatars [149.0094713268313]
We describe the first approach to animate Codec Avatars in real-time using audio and/or eye tracking.
Our goal is to display expressive conversations between individuals that exhibit important social signals.
arXiv Detail & Related papers (2020-08-11T22:28:48Z)
- Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose [67.31838207805573]
We propose a deep neural network model that takes an audio signal A of a source person and a short video V of a target person as input.
It outputs a synthesized high-quality talking face video with personalized head pose.
Our method can generate high-quality talking face videos with more distinguishing head movement effects than state-of-the-art methods.
arXiv Detail & Related papers (2020-02-24T10:02:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.