ReenactNet: Real-time Full Head Reenactment
- URL: http://arxiv.org/abs/2006.10500v1
- Date: Fri, 22 May 2020 00:51:38 GMT
- Title: ReenactNet: Real-time Full Head Reenactment
- Authors: Mohammad Rami Koujan, Michail Christos Doukas, Anastasios Roussos,
Stefanos Zafeiriou
- Abstract summary: We propose a head-to-head system capable of fully transferring the human head 3D pose, facial expressions and eye gaze from a source to a target actor.
Our system produces high-fidelity, temporally-smooth and photo-realistic synthetic videos faithfully transferring the human time-varying head attributes from the source to the target actor.
- Score: 50.32988828989691
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video-to-video synthesis is a challenging problem aiming at learning a
translation function between a sequence of semantic maps and a photo-realistic
video depicting the characteristics of a driving video. We propose a
head-to-head system capable of fully transferring the
human head 3D pose, facial expressions and eye gaze from a source to a target
actor, while preserving the identity of the target actor. Our system produces
high-fidelity, temporally-smooth and photo-realistic synthetic videos
faithfully transferring the human time-varying head attributes from the source
to the target actor. Our proposed implementation: 1) works in real time ($\sim
20$ fps), 2) runs on a commodity laptop with a webcam as the only input, 3) is
interactive, allowing the participant to drive a target person, e.g. a
celebrity or politician, instantly by varying their expressions, head pose,
and eye gaze, and visualising the synthesised video concurrently.
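The sketch below is a minimal, hypothetical outline of how such a webcam-driven, real-time reenactment loop could be organised; extract_head_params and render_target are placeholder names for the source-side parameter estimator and the target-specific neural renderer, not components of the authors' released code.

```python
# Hypothetical sketch of a real-time reenactment loop (~20 fps, commodity laptop,
# webcam as the only input). extract_head_params() and render_target() are
# placeholders, NOT the authors' code: the first would estimate 3D head pose,
# expression and gaze from a source frame, the second would feed those parameters
# to a target-specific neural renderer.
import time
import cv2


def extract_head_params(frame):
    """Placeholder: return 3D head pose, expression and gaze parameters."""
    raise NotImplementedError


def render_target(params):
    """Placeholder: synthesize a photo-realistic frame of the target actor."""
    raise NotImplementedError


def reenactment_loop(camera_index=0):
    cap = cv2.VideoCapture(camera_index)  # the webcam is the only input device
    try:
        while True:
            ok, source_frame = cap.read()
            if not ok:
                break
            t0 = time.time()
            params = extract_head_params(source_frame)     # source-side analysis
            target_frame = render_target(params)           # target-side synthesis
            cv2.imshow("reenacted target", target_frame)   # visualise concurrently
            print(f"frame time: {time.time() - t0:.3f}s")
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```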
Related papers
- Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis [76.72505510632904]
We present Total-Recon, the first method to reconstruct deformable scenes from long monocular RGBD videos.
Our method hierarchically decomposes the scene into the background and objects, whose motion is decomposed into root-body motion and local articulations.
arXiv Detail & Related papers (2023-04-24T17:59:52Z)
- Imitator: Personalized Speech-driven 3D Facial Animation [63.57811510502906]
State-of-the-art methods deform the face topology of the target actor to sync with the input audio, without considering the identity-specific speaking style and facial idiosyncrasies of the target actor.
We present Imitator, a speech-driven facial expression synthesis method, which learns identity-specific details from a short input video.
We show that our approach produces temporally coherent facial expressions from input audio while preserving the speaking style of the target actors.
arXiv Detail & Related papers (2022-12-30T19:00:02Z)
- Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation [12.552355581481999]
We first present a live system that generates personalized photorealistic talking-head animation driven only by audio signals at over 30 fps.
The first stage is a deep neural network that extracts deep audio features along with a manifold projection to project the features to the target person's speech space.
In the second stage, we learn facial dynamics and motions from the projected audio features.
In the final stage, we generate conditional feature maps from previous predictions and send them with a candidate image set to an image-to-image translation network to synthesize photorealistic renderings.
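As a rough illustration of the three-stage flow summarized above, the sketch below wires hypothetical PyTorch modules together; the module names, the linear speech-space basis, and the way the renderer is conditioned are simplifying assumptions for illustration, not the paper's actual architecture.

```python
# Hypothetical sketch of the three-stage flow described above; module names are
# placeholders, not components of the Live Speech Portraits implementation.
import torch.nn as nn


class TalkingHeadPipeline(nn.Module):
    def __init__(self, audio_encoder, dynamics_predictor, renderer):
        super().__init__()
        self.audio_encoder = audio_encoder            # stage 1: deep audio features
        self.dynamics_predictor = dynamics_predictor  # stage 2: facial dynamics/motion
        self.renderer = renderer                      # stage 3: image-to-image translation

    def project_to_speech_space(self, feats, basis):
        # Stage 1b: project audio features onto the target person's speech space.
        # A fixed linear basis is an assumed simplification of the manifold projection.
        coeffs = feats @ basis.T
        return coeffs @ basis

    def forward(self, audio, speech_basis, candidate_images):
        feats = self.audio_encoder(audio)
        feats = self.project_to_speech_space(feats, speech_basis)
        motion = self.dynamics_predictor(feats)  # facial dynamics and head motion
        # Stage 3: condition the image-to-image network on the predicted motion
        # (e.g. rasterised as feature maps) plus a candidate image set of the target.
        return self.renderer(motion, candidate_images)
```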
arXiv Detail & Related papers (2021-09-22T08:47:43Z)
- FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning [23.14865405847467]
We propose a talking face generation method that takes an audio signal as input and a short target video clip as reference.
The method synthesizes a photo-realistic video of the target face with natural lip motions, head poses, and eye blinks that are in-sync with the input audio signal.
Experimental results and user studies show our method can generate realistic talking face videos with better qualities than the results of state-of-the-art methods.
arXiv Detail & Related papers (2021-08-18T02:10:26Z)
- Audio- and Gaze-driven Facial Animation of Codec Avatars [149.0094713268313]
We describe the first approach to animate Codec Avatars in real-time using audio and/or eye tracking.
Our goal is to display expressive conversations between individuals that exhibit important social signals.
arXiv Detail & Related papers (2020-08-11T22:28:48Z)
- Face2Face: Real-time Face Capture and Reenactment of RGB Videos [66.38142459175191]
Face2Face is a novel approach for real-time facial reenactment of a monocular target video sequence.
We track facial expressions of both source and target video using a dense photometric consistency measure.
We convincingly re-render the synthesized target face on top of the corresponding video stream.
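As a worked illustration of the dense photometric consistency idea (not Face2Face's actual energy formulation), the snippet below measures a per-pixel colour discrepancy between an observed frame and a rendering of the parametric face model; tracking would minimise this quantity over the model parameters that produced the rendering.

```python
# Minimal sketch (not Face2Face's implementation) of a dense photometric
# consistency measure: the sum of squared colour differences between an observed
# frame and a rendering of the face model, evaluated over face pixels.
import numpy as np


def photometric_consistency(observed, rendered, face_mask):
    """Dense L2 photometric energy over the pixels covered by the face model.

    observed, rendered: HxWx3 float images; face_mask: HxW boolean mask.
    Expression tracking would minimise this energy with respect to the model
    parameters (pose, expression, identity, illumination) behind `rendered`.
    """
    diff = (observed - rendered)[face_mask]
    return float(np.sum(diff ** 2))
```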
arXiv Detail & Related papers (2020-07-29T12:47:16Z)
- Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose [67.31838207805573]
We propose a deep neural network model that takes an audio signal A of a source person and a short video V of a target person as input.
The model outputs a synthesized, high-quality talking face video with a personalized head pose.
Our method can generate high-quality talking face videos with more distinctive head movements than state-of-the-art methods.
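A minimal, hypothetical sketch of the stated interface follows: a model that consumes the source audio signal A and the short target video V and emits talking-face frames. The module and helper names are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical interface sketch (not the paper's code): audio of a source person
# plus a short reference video of a target person in, talking-face frames out.
import torch.nn as nn


class AudioDrivenTalkingFace(nn.Module):
    def __init__(self, audio_encoder, identity_encoder, pose_predictor, frame_decoder):
        super().__init__()
        self.audio_encoder = audio_encoder        # encodes the source audio signal A
        self.identity_encoder = identity_encoder  # encodes target appearance from video V
        self.pose_predictor = pose_predictor      # predicts a personalized head-pose trajectory
        self.frame_decoder = frame_decoder        # synthesizes the output video frames

    def forward(self, audio_a, video_v):
        speech = self.audio_encoder(audio_a)
        identity = self.identity_encoder(video_v)
        head_pose = self.pose_predictor(speech, identity)  # pose personalized to the target
        return self.frame_decoder(speech, identity, head_pose)
```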
arXiv Detail & Related papers (2020-02-24T10:02:10Z)