Now You See Me, Now You Don't: A Unified Framework for Expression Consistent Anonymization in Talking Head Videos
- URL: http://arxiv.org/abs/2601.11635v1
- Date: Wed, 14 Jan 2026 09:42:44 GMT
- Title: Now You See Me, Now You Don't: A Unified Framework for Expression Consistent Anonymization in Talking Head Videos
- Authors: Anil Egin, Andrea Tangherloni, Antitza Dantcheva
- Abstract summary: Face video anonymization aims at privacy preservation while allowing for the analysis of videos in a number of computer vision downstream tasks. We propose a novel unified framework, referred to as AnonNET, streamlined to de-identify facial videos. We inpaint faces with a diffusion-based generative model guided by high-level attribute recognition and motion-aware expression transfer.
- Score: 8.859607428705846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Face video anonymization aims at privacy preservation while allowing for the analysis of videos in a number of computer vision downstream tasks such as expression recognition, people tracking, and action recognition. We propose a novel unified framework, referred to as AnonNET, streamlined to de-identify facial videos while preserving the age, gender, race, pose, and expression of the original video. Specifically, we inpaint faces with a diffusion-based generative model guided by high-level attribute recognition and motion-aware expression transfer. We then animate the de-identified faces by video-driven animation, which accepts the de-identified face and the original video as input. Extensive experiments on the VoxCeleb2, CelebV-HQ, and HDTF datasets, which include diverse facial dynamics, demonstrate the effectiveness of AnonNET in obfuscating identity while retaining visual realism and temporal consistency. The code of AnonNET will be publicly released.
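The abstract describes a two-stage pipeline: de-identify a face via attribute-guided diffusion inpainting, then animate the de-identified face using the original video as a motion driver. The following is a minimal Python sketch of that data flow only; every function name and the toy arithmetic are placeholders (none come from the AnonNET release), standing in for the actual attribute recognizer, diffusion inpainter, and video-driven animator.

```python
import numpy as np

# Placeholder sketch of the two-stage pipeline described in the abstract.
# All names below are hypothetical stand-ins, not AnonNET's actual API.

def extract_attributes(frame):
    """Stage 1a: high-level attribute recognition (age, gender, race, pose).
    Placeholder returning a fixed attribute vector."""
    return np.zeros(4)

def inpaint_face(frame, attributes):
    """Stage 1b: attribute-guided face inpainting.
    Placeholder: scales the frame, standing in for the diffusion model."""
    return frame * 0.5

def animate(anon_face, driving_frames):
    """Stage 2: video-driven animation transferring the original motion
    (pose, expression) onto the de-identified face.
    Placeholder: repeats the face once per driving frame."""
    return np.stack([anon_face for _ in driving_frames])

def anonymize_video(frames):
    """Full pipeline: de-identify the first frame, then animate it
    using the original video as the motion driver."""
    attrs = extract_attributes(frames[0])
    anon_face = inpaint_face(frames[0], attrs)
    return animate(anon_face, frames)

video = [np.ones((64, 64, 3), dtype=np.float32) for _ in range(8)]  # toy clip
out = anonymize_video(video)
print(out.shape)  # (8, 64, 64, 3)
```

The key design point the sketch captures is the decoupling: identity removal happens once on a reference frame, while temporal consistency is delegated entirely to the animation stage that consumes the original video.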
Related papers
- BLANKET: Anonymizing Faces in Infant Video Recordings [3.049887057143419]
BLANKET is a novel approach designed to anonymize infant faces in video recordings while preserving essential facial attributes. The method is evaluated on a dataset of short video recordings of babies and is compared to the popular anonymization method DeepPrivacy2.
arXiv Detail & Related papers (2025-12-17T15:49:56Z)
- DirectSwap: Mask-Free Cross-Identity Training and Benchmarking for Expression-Consistent Video Head Swapping [58.2549561389375]
Video head swapping aims to replace the entire head of a video subject, including facial identity, head shape, and hairstyle, with that of a reference image. Due to the lack of ground-truth paired swapping data, prior methods typically train on cross-frame pairs of the same person within a video. We propose DirectSwap, a mask-free, direct video head-swapping framework that extends an image U-Net into a video diffusion model.
arXiv Detail & Related papers (2025-12-10T08:31:28Z)
- Mask-Free Audio-driven Talking Face Generation for Enhanced Visual Quality and Identity Preservation [54.52905471078152]
We propose a mask-free talking face generation approach while maintaining the 2D-based face editing task. We transform the input images to have closed mouths, using a two-step landmark-based approach trained in an unpaired manner.
arXiv Detail & Related papers (2025-07-28T16:03:36Z)
- AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation [49.4220768835379]
AdaMesh is a novel adaptive speech-driven facial animation approach. It learns a personalized talking style from a reference video of about 10 seconds and generates vivid facial expressions and head poses.
arXiv Detail & Related papers (2023-10-11T06:56:08Z)
- Identity-Preserving Talking Face Generation with Landmark and Appearance Priors [106.79923577700345]
Existing person-generic methods have difficulty in generating realistic and lip-synced videos.
We propose a two-stage framework consisting of audio-to-landmark generation and landmark-to-video rendering procedures.
Our method can produce more realistic, lip-synced, and identity-preserving videos than existing person-generic talking face generation methods.
arXiv Detail & Related papers (2023-05-15T01:31:32Z)
- GANonymization: A GAN-based Face Anonymization Framework for Preserving Emotional Expressions [43.017036538109274]
GANonymization is a novel face anonymization framework with facial expression-preserving abilities.
Our approach is based on a high-level representation of a face, which is synthesized into an anonymized version based on a generative adversarial network (GAN).
arXiv Detail & Related papers (2023-05-03T14:22:48Z)
- Imitator: Personalized Speech-driven 3D Facial Animation [63.57811510502906]
State-of-the-art methods deform the face topology of the target actor to sync the input audio without considering the identity-specific speaking style and facial idiosyncrasies of the target actor.
We present Imitator, a speech-driven facial expression synthesis method, which learns identity-specific details from a short input video.
We show that our approach produces temporally coherent facial expressions from input audio while preserving the speaking style of the target actors.
arXiv Detail & Related papers (2022-12-30T19:00:02Z)
- Facial Expression Video Generation Based-On Spatio-temporal Convolutional GAN: FEV-GAN [1.279257604152629]
We present a novel approach for generating videos of the six basic facial expressions.
Our approach is based on Spatio-temporal Convolutional GANs, which are known to model both content and motion in the same network.
The code and the pre-trained model will soon be made publicly available.
arXiv Detail & Related papers (2022-10-20T11:54:32Z)
- Temporally coherent video anonymization through GAN inpainting [0.0]
This work tackles the problem of temporally coherent face anonymization in natural video streams.
We propose JaGAN, a two-stage system starting with detecting and masking out faces with black image patches in all individual frames of the video.
Our initial experiments reveal that image-based generative models are not capable of inpainting patches with temporally coherent appearance across neighboring video frames.
arXiv Detail & Related papers (2021-06-04T08:19:44Z)
- Face2Face: Real-time Face Capture and Reenactment of RGB Videos [66.38142459175191]
Face2Face is a novel approach for real-time facial reenactment of a monocular target video sequence.
We track facial expressions of both source and target video using a dense photometric consistency measure.
We convincingly re-render the synthesized target face on top of the corresponding video stream.
arXiv Detail & Related papers (2020-07-29T12:47:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.