Face2Face: Real-time Face Capture and Reenactment of RGB Videos
- URL: http://arxiv.org/abs/2007.14808v1
- Date: Wed, 29 Jul 2020 12:47:16 GMT
- Title: Face2Face: Real-time Face Capture and Reenactment of RGB Videos
- Authors: Justus Thies, Michael Zollhöfer, Marc Stamminger, Christian
  Theobalt, and Matthias Nießner
- Abstract summary: Face2Face is a novel approach for real-time facial reenactment of a monocular target video sequence.
We track facial expressions of both source and target video using a dense photometric consistency measure.
We convincingly re-render the synthesized target face on top of the corresponding video stream.
- Score: 66.38142459175191
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Face2Face, a novel approach for real-time facial reenactment of a
monocular target video sequence (e.g., YouTube video). The source sequence is
also a monocular video stream, captured live with a commodity webcam. Our goal
is to animate the facial expressions of the target video by a source actor and
re-render the manipulated output video in a photo-realistic fashion. To this
end, we first address the under-constrained problem of facial identity recovery
from monocular video by non-rigid model-based bundling. At run time, we track
facial expressions of both source and target video using a dense photometric
consistency measure. Reenactment is then achieved by fast and efficient
deformation transfer between source and target. The mouth interior that best
matches the re-targeted expression is retrieved from the target sequence and
warped to produce an accurate fit. Finally, we convincingly re-render the
synthesized target face on top of the corresponding video stream such that it
seamlessly blends with the real-world illumination. We demonstrate our method
in a live setup, where YouTube videos are reenacted in real time.
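To make the two core steps above concrete, here is a minimal sketch, assuming a standard parametric face model with an expression blendshape basis and a renderer that produces a synthetic face image from the current parameters. All names are illustrative, not the authors' implementation:

```python
import numpy as np

def photometric_energy(rendered, frame, mask):
    """Dense photometric consistency term: sum of squared per-pixel color
    residuals between the synthetic rendering and the observed frame,
    over the pixels covered by the face model.
    rendered : (H, W, 3) synthetic face image under the current parameters
    frame    : (H, W, 3) observed video frame
    mask     : (H, W) boolean visibility mask of the rendered model
    """
    residual = (rendered.astype(np.float64) - frame)[mask]
    return float(np.sum(residual ** 2))

def transfer_expression(target_neutral, expr_basis, source_coeffs):
    """Expression transfer in a shared parametric subspace: re-pose the
    target's neutral geometry with expression coefficients tracked from
    the source actor (a simplified stand-in for the paper's sub-space
    deformation transfer).
    target_neutral : (V, 3) neutral target vertices
    expr_basis     : (K, V, 3) expression blendshape offsets
    source_coeffs  : (K,) tracked source expression coefficients
    """
    return target_neutral + np.tensordot(source_coeffs, expr_basis, axes=1)
```

In the paper, the photometric term is combined with sparse landmark and regularization terms and minimized jointly over identity, expression, pose, and illumination; tracking both actors with the same objective is what makes transferring the expression coefficients meaningful.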
Related papers
- Identity-Preserving Talking Face Generation with Landmark and Appearance
Priors [106.79923577700345]
Existing person-generic methods struggle to generate realistic, lip-synced videos.
We propose a two-stage framework consisting of audio-to-landmark generation and landmark-to-video rendering procedures.
Our method can produce more realistic, lip-synced, and identity-preserving videos than existing person-generic talking face generation methods.
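As a rough, hypothetical sketch of this two-stage design (the function bodies below are placeholders, not the paper's networks), the landmarks act as an intermediate representation that decouples audio-driven motion from identity-specific appearance:

```python
import numpy as np

def audio_to_landmarks(audio_feats):
    """Stage 1 (placeholder): map (T, D) per-frame audio features to
    (T, 68, 2) facial landmark sequences; a trained generator goes here."""
    return np.zeros((audio_feats.shape[0], 68, 2))

def landmarks_to_video(landmarks, reference_frame):
    """Stage 2 (placeholder): render one photo-realistic (H, W, 3) frame
    per landmark set, conditioned on appearance from a reference frame."""
    return [np.array(reference_frame) for _ in landmarks]

def talking_face(audio_feats, reference_frame):
    # Compose the stages: landmarks are the person-agnostic intermediate.
    return landmarks_to_video(audio_to_landmarks(audio_feats), reference_frame)
```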
arXiv Detail & Related papers (2023-05-15T01:31:32Z)
- Video2StyleGAN: Encoding Video in Latent Space for Manipulation [63.03250800510085]
We propose a novel network to encode face videos into the latent space of StyleGAN for semantic face video manipulation.
Our approach can significantly outperform existing single-image methods while achieving real-time speed (66 fps).
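The general encode-edit-decode pattern behind such latent-space video manipulation can be sketched as follows; `encoder`, `generator`, and the semantic `direction` are stand-ins for trained components, not the paper's actual networks:

```python
def edit_video_in_latent_space(frames, encoder, generator, direction, alpha=1.0):
    """Encode each frame into the generator's latent space, shift the code
    along a semantic edit direction, and decode the edited frame.
    `encoder` and `generator` are callables standing in for trained networks."""
    edited = []
    for frame in frames:
        w = encoder(frame)                # frame -> latent code w
        w_edited = w + alpha * direction  # semantic manipulation in latent space
        edited.append(generator(w_edited))
    return edited
```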
arXiv Detail & Related papers (2022-06-27T06:48:15Z)
- Copy Motion From One to Another: Fake Motion Video Generation [53.676020148034034]
A compelling application of artificial intelligence is to generate a video of a target person performing arbitrary desired motion.
Current methods typically employ GANs with an L2 loss to assess the authenticity of the generated videos.
We propose a theoretically motivated Gromov-Wasserstein loss that facilitates learning the mapping from a pose to a foreground image.
Our method is able to generate realistic target person videos, faithfully copying complex motions from a source person.
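A hedged sketch of the idea behind a Gromov-Wasserstein-style objective: rather than comparing pose keypoints and image features directly, it compares their internal pairwise-distance structures. The simplified version below assumes a fixed one-to-one correspondence (the full loss optimizes over couplings) and uses illustrative names:

```python
import numpy as np

def pairwise_dists(x):
    """(N, D) points -> (N, N) Euclidean distance matrix."""
    diff = x[:, None, :] - x[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def gw_style_loss(pose_points, image_points):
    """Simplified Gromov-Wasserstein-style discrepancy: mismatch between
    the two domains' intra-set distance matrices under a fixed one-to-one
    correspondence (the full GW loss also optimizes a soft coupling)."""
    d_pose = pairwise_dists(pose_points)
    d_img = pairwise_dists(image_points)
    return float(np.mean((d_pose - d_img) ** 2))
```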
arXiv Detail & Related papers (2022-05-03T08:45:22Z)
- Head2Head++: Deep Facial Attributes Re-Targeting [6.230979482947681]
We leverage the 3D geometry of faces and Generative Adversarial Networks (GANs) to design a novel deep learning architecture for the task of facial and head reenactment.
We manage to capture the complex non-rigid facial motion from the driving monocular performances and synthesise temporally consistent videos.
Our system performs end-to-end reenactment at near real-time speed (18 fps).
arXiv Detail & Related papers (2020-06-17T23:38:37Z)
- ReenactNet: Real-time Full Head Reenactment [50.32988828989691]
We propose a head-to-head system capable of fully transferring the 3D head pose, facial expressions, and eye gaze from a source to a target actor.
Our system produces high-fidelity, temporally smooth, and photo-realistic synthetic videos, faithfully transferring the time-varying head attributes from the source to the target actor.
arXiv Detail & Related papers (2020-05-22T00:51:38Z)
- Everybody's Talkin': Let Me Talk as You Want [134.65914135774605]
We present a method to edit target portrait footage by taking a sequence of audio as input to synthesize a photo-realistic video.
It does not assume a person-specific rendering network, yet it is capable of translating arbitrary source audio into arbitrary video output.
arXiv Detail & Related papers (2020-01-15T09:54:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.