Temporally coherent video anonymization through GAN inpainting
- URL: http://arxiv.org/abs/2106.02328v1
- Date: Fri, 4 Jun 2021 08:19:44 GMT
- Title: Temporally coherent video anonymization through GAN inpainting
- Authors: Thangapavithraa Balaji, Patrick Blies, Georg Göri, Raphael Mitsch,
Marcel Wasserer, Torsten Schön
- Abstract summary: This work tackles the problem of temporally coherent face anonymization in natural video streams.
We propose JaGAN, a two-stage system that first detects and masks out faces with black image patches in all individual frames of the video.
Our initial experiments reveal that image-based generative models are not capable of inpainting patches with a temporally coherent appearance across neighboring video frames.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work tackles the problem of temporally coherent face
anonymization in natural video streams. We propose JaGAN, a two-stage system
that first detects and masks out faces with black image patches in all
individual frames of the video. The second stage leverages a
privacy-preserving Video Generative Adversarial Network designed to inpaint
the missing image patches with artificially generated faces. Our initial
experiments reveal that image-based generative models are not capable of
inpainting patches with a temporally coherent appearance across neighboring
video frames. To address this issue we introduce a newly curated video
collection, which is made publicly available for the research community along
with this paper. We also introduce the Identity Invariance Score (IdI) as a
means to quantify temporal coherence between neighboring frames.
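The abstract does not define how IdI is computed. Below is a minimal sketch of one plausible formulation, assuming per-frame identity embeddings from an off-the-shelf face recognition network; the embedding source and the cosine-similarity aggregation are assumptions, not the paper's actual definition.

```python
import numpy as np

def identity_invariance_score(embeddings: np.ndarray) -> float:
    """Mean cosine similarity of identity embeddings between neighboring
    frames. `embeddings` has shape (T, D): one face embedding per frame,
    e.g. from a face recognition network (an assumption; the paper's
    exact IdI definition may differ)."""
    a, b = embeddings[:-1], embeddings[1:]
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-8
    )
    return float(cos.mean())

# Example: 10 frames whose embeddings drift only slightly -> score near 1.0
rng = np.random.default_rng(0)
base = rng.normal(size=128)
frames = np.stack([base + 0.01 * rng.normal(size=128) for _ in range(10)])
print(identity_invariance_score(frames))
```

A score near 1.0 indicates that the generated face keeps a stable identity across neighboring frames; values dropping toward 0 indicate identity flicker.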
Related papers
- Video Diffusion Models are Strong Video Inpainter [14.402778136825642]
We propose a novel First Frame Filling Video Diffusion Inpainting model (FFF-VDI).
We propagate the noise latent information of future frames to fill the masked areas of the first frame's noise latent code.
Next, we fine-tune the pre-trained image-to-video diffusion model to generate the inpainted video.
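As a rough illustration of the first-frame filling idea, propagating noise-latent information from future frames into the masked areas of the first frame's latent, here is a hedged sketch that uses a simple visibility-weighted average; the actual FFF-VDI model performs this propagation with learned components inside a diffusion pipeline, and the array shapes below are illustrative assumptions.

```python
import numpy as np

def fill_first_frame_latent(latents: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Fill masked areas of the first frame's noise latent with information
    from future frames (a simplified, hypothetical reading of the
    first-frame filling step; not FFF-VDI's learned propagation).

    latents: (T, C, H, W) per-frame noise latent codes.
    masks:   (T, 1, H, W) with 1 marking missing (masked) areas.
    """
    first = latents[0].copy()
    future = latents[1:]
    future_valid = 1.0 - masks[1:]                    # 1 where observed
    weight = future_valid.sum(axis=0)                 # (1, H, W)
    # Average future-frame latents at positions observed in >= 1 frame.
    propagated = (future * future_valid).sum(axis=0) / np.maximum(weight, 1.0)
    hole = masks[0]                                   # (1, H, W)
    return first * (1.0 - hole) + propagated * hole

T, C, H, W = 8, 4, 16, 16
rng = np.random.default_rng(1)
z = rng.normal(size=(T, C, H, W)).astype(np.float32)
m = np.zeros((T, 1, H, W), dtype=np.float32)
m[0, :, 4:12, 4:12] = 1.0                             # hole in the first frame
print(fill_first_frame_latent(z, m).shape)            # (4, 16, 16)
```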
arXiv Detail & Related papers (2024-08-21T08:01:00Z)
- Kalman-Inspired Feature Propagation for Video Face Super-Resolution [78.84881180336744]
We introduce a novel framework to maintain a stable face prior over time.
The Kalman filtering principles offer our method a recurrent ability to use the information from previously restored frames to guide and regulate the restoration process of the current frame.
Experiments demonstrate the effectiveness of our method in capturing facial details consistently across video frames.
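A toy, scalar-gain version of the recurrent idea described above, assuming per-frame feature vectors; the actual method predicts and fuses deep features with learned modules, so the fixed variances here are purely illustrative.

```python
import numpy as np

def kalman_propagate(features: np.ndarray,
                     process_var: float = 0.05,
                     obs_var: float = 0.2) -> np.ndarray:
    """Scalar-gain Kalman-style smoothing over per-frame features: fuse the
    state carried over from previously restored frames with the current
    frame's observation. features: (T, D) per-frame feature vectors."""
    state = features[0].copy()
    var = obs_var
    out = [state.copy()]
    for obs in features[1:]:
        var = var + process_var               # predict: uncertainty grows
        gain = var / (var + obs_var)          # Kalman gain
        state = state + gain * (obs - state)  # update with current frame
        var = (1.0 - gain) * var
        out.append(state.copy())
    return np.stack(out)

rng = np.random.default_rng(2)
clean = np.linspace(0, 1, 20)[:, None] * np.ones((20, 8))
noisy = clean + 0.3 * rng.normal(size=clean.shape)
smooth = kalman_propagate(noisy)
# Smoothed features track the clean signal more closely than the noisy input.
print(np.abs(smooth - clean).mean() < np.abs(noisy - clean).mean())
```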
arXiv Detail & Related papers (2024-08-09T17:57:12Z) - DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with
Diffusion Auto-encoder [21.405442790474268]
We propose DiffDub: Diffusion-based dubbing.
We first craft the Diffusion auto-encoder with an inpainting renderer incorporating a mask to delineate editable zones and unaltered regions.
To tackle these issues, we employ versatile strategies, including data augmentation and supplementary eye guidance.
arXiv Detail & Related papers (2023-11-03T09:41:51Z) - RIGID: Recurrent GAN Inversion and Editing of Real Face Videos [73.97520691413006]
GAN inversion is indispensable for applying the powerful editability of GAN to real images.
Existing methods invert video frames individually, often leading to undesired inconsistent results over time.
We propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID).
Our framework learns the inherent coherence between input frames in an end-to-end manner.
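The recurrence can be pictured as fusing each frame's independent inversion with the previous frame's latent. The sketch below uses a fixed blending weight for illustration; RIGID learns the temporal coupling end to end, and the toy `encode` function stands in for a real GAN inversion encoder (both are assumptions).

```python
import numpy as np

def recurrent_invert(frames, encode, alpha: float = 0.7) -> np.ndarray:
    """Invert a video frame by frame, fusing each per-frame latent with the
    previous one to encourage temporal coherence. A hypothetical sketch of
    the recurrent idea with a fixed blending weight `alpha`."""
    latents, prev = [], None
    for f in frames:
        w = encode(f)                            # independent per-frame inversion
        if prev is not None:
            w = alpha * w + (1 - alpha) * prev   # recurrent coupling
        latents.append(w)
        prev = w
    return np.stack(latents)

# Toy "encoder": project each frame to a 512-d latent with a fixed matrix.
rng = np.random.default_rng(3)
proj = rng.normal(size=(512, 64 * 64))
encode = lambda f: proj @ f.ravel() / f.size
video = rng.normal(size=(5, 64, 64))
print(recurrent_invert(video, encode).shape)     # (5, 512)
```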
arXiv Detail & Related papers (2023-08-11T12:17:24Z)
- Siamese Masked Autoencoders [76.35448665609998]
We present Siamese Masked Autoencoders (SiamMAE) for learning visual correspondence from videos.
SiamMAE operates on pairs of randomly sampled video frames and asymmetrically masks them.
It outperforms state-of-the-art self-supervised methods on video object segmentation, pose keypoint propagation, and semantic part propagation tasks.
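The asymmetry is the key design choice: the past frame stays (nearly) fully visible while most patches of the future frame are hidden, forcing the model to predict them from the other frame. A minimal sketch of such an asymmetric masking step, with ViT-style patch tokens as an assumed input format and the very high masking ratio as indicative of the paper's regime:

```python
import numpy as np

def asymmetric_mask(f1_patches, f2_patches, ratio=0.95, rng=None):
    """Asymmetric masking for a frame pair: keep the past frame intact and
    hide most patches of the future frame. f*_patches: (N, D) patch tokens.
    Returns the intact past frame, the surviving future-frame patches, and
    their indices."""
    if rng is None:
        rng = np.random.default_rng()
    n = f2_patches.shape[0]
    keep = max(1, int(round(n * (1 - ratio))))
    visible_idx = rng.permutation(n)[:keep]
    return f1_patches, f2_patches[visible_idx], visible_idx

rng = np.random.default_rng(4)
f1 = rng.normal(size=(196, 768))   # 14x14 patches, ViT-style tokens
f2 = rng.normal(size=(196, 768))
_, f2_visible, idx = asymmetric_mask(f1, f2, rng=rng)
print(f2_visible.shape)            # (10, 768): ~5% of future patches stay visible
```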
arXiv Detail & Related papers (2023-05-23T17:59:46Z)
- Video2StyleGAN: Encoding Video in Latent Space for Manipulation [63.03250800510085]
We propose a novel network to encode face videos into the latent space of StyleGAN for semantic face video manipulation.
Our approach can significantly outperform existing single-image methods, while achieving real-time (66 fps) speed.
arXiv Detail & Related papers (2022-06-27T06:48:15Z)
- UniFaceGAN: A Unified Framework for Temporally Consistent Facial Video Editing [78.26925404508994]
We propose a unified temporally consistent facial video editing framework termed UniFaceGAN.
Our framework is designed to handle face swapping and face reenactment simultaneously.
Compared with the state-of-the-art facial image editing methods, our framework generates video portraits that are more photo-realistic and temporally smooth.
arXiv Detail & Related papers (2021-08-12T10:35:22Z)
- Learning Joint Spatial-Temporal Transformations for Video Inpainting [58.939131620135235]
We propose to learn a joint Spatial-Temporal Transformer Network (STTN) for video inpainting.
We simultaneously fill missing regions in all input frames by self-attention, and propose to optimize STTN by a spatial-temporal adversarial loss.
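The core mechanism, letting missing patches attend over visible patches from all frames at once, can be sketched as a single non-learned attention step; STTN itself uses learned multi-head attention over multi-scale patches plus the adversarial loss mentioned above, so this is only a schematic.

```python
import numpy as np

def st_attention_fill(tokens: np.ndarray, missing: np.ndarray) -> np.ndarray:
    """Fill missing patch tokens by attending over visible tokens from all
    frames at once (single-head, non-learned sketch of joint
    spatial-temporal attention).

    tokens:  (T*N, D) patch tokens flattened across frames.
    missing: (T*N,) boolean, True where the patch lies inside a hole.
    """
    visible = tokens[~missing]                       # keys/values
    q = tokens[missing]                              # queries (hole patches)
    attn = q @ visible.T / np.sqrt(tokens.shape[1])  # scaled dot-product
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)          # softmax over visible
    out = tokens.copy()
    out[missing] = attn @ visible                    # weighted copy of content
    return out

rng = np.random.default_rng(5)
toks = rng.normal(size=(6 * 16, 32))                 # 6 frames x 16 patches
hole = np.zeros(6 * 16, dtype=bool)
hole[:8] = True                                      # hole in the first frame
print(st_attention_fill(toks, hole).shape)           # (96, 32)
```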
arXiv Detail & Related papers (2020-07-20T16:35:48Z)