Stitch it in Time: GAN-Based Facial Editing of Real Videos
- URL: http://arxiv.org/abs/2201.08361v2
- Date: Fri, 21 Jan 2022 17:28:57 GMT
- Title: Stitch it in Time: GAN-Based Facial Editing of Real Videos
- Authors: Rotem Tzaban, Ron Mokady, Rinon Gal, Amit H. Bermano, Daniel Cohen-Or
- Abstract summary: We propose a framework for semantic editing of faces in videos, demonstrating significant improvements over the current state-of-the-art.
Our method produces meaningful face manipulations, maintains a higher degree of temporal consistency, and can be applied to challenging, high quality, talking head videos.
- Score: 38.81306268180105
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability of Generative Adversarial Networks to encode rich semantics
within their latent space has been widely adopted for facial image editing.
However, replicating their success with videos has proven challenging. Sets of
high-quality facial videos are lacking, and working with videos introduces a
fundamental barrier to overcome - temporal coherency. We propose that this
barrier is largely artificial. The source video is already temporally coherent,
and deviations from this state arise in part due to careless treatment of
individual components in the editing pipeline. We leverage the natural
alignment of StyleGAN and the tendency of neural networks to learn low
frequency functions, and demonstrate that they provide a strongly consistent
prior. We draw on these insights and propose a framework for semantic editing
of faces in videos, demonstrating significant improvements over the current
state-of-the-art. Our method produces meaningful face manipulations, maintains
a higher degree of temporal consistency, and can be applied to challenging,
high quality, talking head videos which current methods struggle with.
Related papers
- IP-FaceDiff: Identity-Preserving Facial Video Editing with Diffusion [12.494492016414503]
Existing models encounter challenges such as poor editing quality, high computational costs and difficulties in preserving facial identity across diverse edits.
We propose a novel facial video editing framework that leverages the rich latent space of pre-trained text-to-image (T2I) diffusion models.
Our approach significantly reduces editing time by 80%, while maintaining temporal consistency throughout the video sequence.
arXiv Detail & Related papers (2025-01-13T18:08:27Z) - SVFR: A Unified Framework for Generalized Video Face Restoration [86.17060212058452]
Face Restoration (FR) is a crucial area within image and video processing, focusing on reconstructing high-quality portraits from degraded inputs.
We propose a novel approach for the Generalized Video Face Restoration task, which integrates video BFR, inpainting, and colorization tasks.
This work advances the state-of-the-art in video FR and establishes a new paradigm for generalized video face restoration.
arXiv Detail & Related papers (2025-01-02T12:51:20Z) - Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency [36.939731355462264]
This study proposes a novel and efficient blind video face enhancement method.
It restores high-quality videos from their compressed low-quality versions with an effective de-flickering mechanism.
Experiments conducted on the VFHQ-Test dataset demonstrate that our method surpasses the current state-of-the-art blind face video restoration and de-flickering methods on both efficiency and effectiveness.
arXiv Detail & Related papers (2024-11-25T15:14:36Z) - GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking
Face Generation [71.73912454164834]
A modern talking face generation method is expected to achieve the goals of generalized audio-lip synchronization, good video quality, and high system efficiency.
NeRF has become a popular technique in this field since it could achieve high-fidelity and 3D-consistent talking face generation with a few-minute-long training video.
We propose GeneFace++ to handle these challenges by utilizing the rendering pitch contour as an auxiliary feature and introducing a temporal loss in the facial motion prediction process.
arXiv Detail & Related papers (2023-05-01T12:24:09Z) - Diffusion Video Autoencoders: Toward Temporally Consistent Face Video
Editing via Disentangled Video Encoding [35.18070525015657]
We propose a novel face video editing framework based on diffusion autoencoders.
Our model is based on diffusion models and can satisfy both reconstruction and edit capabilities at the same time.
arXiv Detail & Related papers (2022-12-06T07:41:51Z) - Video2StyleGAN: Encoding Video in Latent Space for Manipulation [63.03250800510085]
We propose a novel network to encode face videos into the latent space of StyleGAN for semantic face video manipulation.
Our approach can significantly outperform existing single image methods, while achieving real-time (66 fps) speed.
arXiv Detail & Related papers (2022-06-27T06:48:15Z) - UniFaceGAN: A Unified Framework for Temporally Consistent Facial Video
Editing [78.26925404508994]
We propose a unified temporally consistent facial video editing framework termed UniFaceGAN.
Our framework is designed to handle face swapping and face reenactment simultaneously.
Compared with the state-of-the-art facial image editing methods, our framework generates video portraits that are more photo-realistic and temporally smooth.
arXiv Detail & Related papers (2021-08-12T10:35:22Z) - Task-agnostic Temporally Consistent Facial Video Editing [84.62351915301795]
We propose a task-agnostic, temporally consistent facial video editing framework.
Based on a 3D reconstruction model, our framework is designed to handle several editing tasks in a more unified and disentangled manner.
Compared with the state-of-the-art facial image editing methods, our framework generates video portraits that are more photo-realistic and temporally smooth.
arXiv Detail & Related papers (2020-07-03T02:49:20Z) - Head2Head++: Deep Facial Attributes Re-Targeting [6.230979482947681]
We leverage the 3D geometry of faces and Generative Adversarial Networks (GANs) to design a novel deep learning architecture for the task of facial and head reenactment.
We manage to capture the complex non-rigid facial motion from the driving monocular performances and synthesise temporally consistent videos.
Our system performs end-to-end reenactment in nearly real-time speed (18 fps)
arXiv Detail & Related papers (2020-06-17T23:38:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.