Related papers: Temporally coherent video anonymization through GAN inpainting

Temporally coherent video anonymization through GAN inpainting

URL: http://arxiv.org/abs/2106.02328v1
Date: Fri, 4 Jun 2021 08:19:44 GMT
Title: Temporally coherent video anonymization through GAN inpainting
Authors: Thangapavithraa Balaji, Patrick Blies, Georg G\"ori, Raphael Mitsch, Marcel Wasserer, Torsten Sch\"on
Abstract summary: This work tackles the problem of temporally coherent face anonymization in natural video streams. We propose JaGAN, a two-stage system starting with detecting and masking out faces with black image patches in all individual frames of the video. Our initial experiments reveal that image based generative models are not capable of inpainting patches showing temporal coherent appearance across neighboring video frames.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This work tackles the problem of temporally coherent face anonymization in natural video streams.We propose JaGAN, a two-stage system starting with detecting and masking out faces with black image patches in all individual frames of the video. The second stage leverages a privacy-preserving Video Generative Adversarial Network designed to inpaint the missing image patches with artificially generated faces. Our initial experiments reveal that image based generative models are not capable of inpainting patches showing temporal coherent appearance across neighboring video frames. To address this issue we introduce a newly curated video collection, which is made publicly available for the research community along with this paper. We also introduce the Identity Invariance Score IdI as a means to quantify temporal coherency between neighboring frames.

Related papers

VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping [43.30061680192465]
We present the first diffusion-based framework specifically designed for video face swapping. Our approach incorporates a specially designed diffusion model coupled with a VidFaceVAE. Our framework achieves superior performance in identity preservation, temporal consistency, and visual quality compared to existing methods.
arXiv Detail & Related papers (2024-12-15T18:58:32Z)
SwiftTry: Fast and Consistent Video Virtual Try-On with Diffusion Models [10.66567645920237]
Given an input video of a person and a new garment, the objective of this paper is to synthesize a new video where the person is wearing the garment while maintaining temporal consistency. We reconceptualize video virtual try-on as a conditional video inpainting task, with garments serving as input conditions. Specifically, our approach enhances image diffusion models by incorporating temporal attention layers to improve temporal coherence.
arXiv Detail & Related papers (2024-12-13T14:50:26Z)
Video Diffusion Models are Strong Video Inpainter [14.402778136825642]
We propose a novel First Frame Filling Video Diffusion Inpainting model (FFF-VDI) We propagate the noise latent information of future frames to fill the masked areas of the first frame's noise latent code. Next, we fine-tune the pre-trained image-to-video diffusion model to generate the inpainted video.
arXiv Detail & Related papers (2024-08-21T08:01:00Z)
Kalman-Inspired Feature Propagation for Video Face Super-Resolution [78.84881180336744]
We introduce a novel framework to maintain a stable face prior to time. The Kalman filtering principles offer our method a recurrent ability to use the information from previously restored frames to guide and regulate the restoration process of the current frame. Experiments demonstrate the effectiveness of our method in capturing facial details consistently across video frames.
arXiv Detail & Related papers (2024-08-09T17:57:12Z)
DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder [21.405442790474268]
We propose DiffDub: Diffusion-based dubbing. We first craft the Diffusion auto-encoder by an inpainting incorporating a mask to delineate editable zones and unaltered regions. To tackle these issues, we employ versatile strategies, including data augmentation and supplementary eye guidance.
arXiv Detail & Related papers (2023-11-03T09:41:51Z)
RIGID: Recurrent GAN Inversion and Editing of Real Face Videos [73.97520691413006]
GAN inversion is indispensable for applying the powerful editability of GAN to real images. Existing methods invert video frames individually often leading to undesired inconsistent results over time. We propose a unified recurrent framework, named textbfRecurrent vtextbfIdeo textbfGAN textbfInversion and etextbfDiting (RIGID) Our framework learns the inherent coherence between input frames in an end-to-end manner.
arXiv Detail & Related papers (2023-08-11T12:17:24Z)
Siamese Masked Autoencoders [76.35448665609998]
We present Siamese Masked Autoencoders (SiamMAE) for learning visual correspondence from videos. SiamMAE operates on pairs of randomly sampled video frames and asymmetrically masks them. It outperforms state-of-the-art self-supervised methods on video object segmentation, pose keypoint propagation, and semantic part propagation tasks.
arXiv Detail & Related papers (2023-05-23T17:59:46Z)
Video2StyleGAN: Encoding Video in Latent Space for Manipulation [63.03250800510085]
We propose a novel network to encode face videos into the latent space of StyleGAN for semantic face video manipulation. Our approach can significantly outperform existing single image methods, while achieving real-time (66 fps) speed.
arXiv Detail & Related papers (2022-06-27T06:48:15Z)
UniFaceGAN: A Unified Framework for Temporally Consistent Facial Video Editing [78.26925404508994]
We propose a unified temporally consistent facial video editing framework termed UniFaceGAN. Our framework is designed to handle face swapping and face reenactment simultaneously. Compared with the state-of-the-art facial image editing methods, our framework generates video portraits that are more photo-realistic and temporally smooth.
arXiv Detail & Related papers (2021-08-12T10:35:22Z)
Learning Joint Spatial-Temporal Transformations for Video Inpainting [58.939131620135235]
We propose to learn a joint Spatial-Temporal Transformer Network (STTN) for video inpainting. We simultaneously fill missing regions in all input frames by self-attention, and propose to optimize STTN by a spatial-temporal adversarial loss.
arXiv Detail & Related papers (2020-07-20T16:35:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.