Related papers: Unmasking Puppeteers: Leveraging Biometric Leakage to Disarm Impersonation in AI-based Videoconferencing

URL: http://arxiv.org/abs/2510.03548v2
Date: Fri, 24 Oct 2025 18:41:25 GMT
Title: Unmasking Puppeteers: Leveraging Biometric Leakage to Disarm Impersonation in AI-based Videoconferencing
Authors: Danial Samadi Vahdati, Tai Duc Nguyen, Ekta Prashnani, Koki Nagano, David Luebke, Orazio Gallo, Matthew Stamm
Abstract summary: AI-based talking-head videoconferencing systems reduce bandwidth by sending a compact pose-expression latent and re-synthesizing RGB at the receiver. This latent can be puppeteered, letting an attacker hijack a victim's likeness in real time. We introduce the first biometric leakage defense without ever looking at the reconstructed RGB video.
Score: 13.359027851861677
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: AI-based talking-head videoconferencing systems reduce bandwidth by sending a compact pose-expression latent and re-synthesizing RGB at the receiver, but this latent can be puppeteered, letting an attacker hijack a victim's likeness in real time. Because every frame is synthetic, deepfake and synthetic video detectors fail outright. To address this security problem, we exploit a key observation: the pose-expression latent inherently contains biometric information of the driving identity. Therefore, we introduce the first biometric leakage defense without ever looking at the reconstructed RGB video: a pose-conditioned, large-margin contrastive encoder that isolates persistent identity cues inside the transmitted latent while cancelling transient pose and expression. A simple cosine test on this disentangled embedding flags illicit identity swaps as the video is rendered. Our experiments on multiple talking-head generation models show that our method consistently outperforms existing puppeteering defenses, operates in real-time, and shows strong generalization to out-of-distribution scenarios.
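
The "simple cosine test" mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the way the reference embedding is obtained, and the threshold value of 0.5 are all assumptions made for illustration, and the contrastive encoder that produces the embeddings is not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def flag_identity_swap(reference_embedding, driving_embedding, threshold=0.5):
    """Return True when the driving identity looks different from the reference.

    reference_embedding: identity embedding of the person whose likeness is
        being rendered (hypothetical: assumed to be available at call setup).
    driving_embedding: identity embedding extracted from the transmitted
        pose-expression latent (hypothetical encoder output).
    threshold: assumed decision threshold; a real system would tune it on
        validation data.
    """
    return cosine_similarity(reference_embedding, driving_embedding) < threshold
```

Under these assumptions, a call such as `flag_identity_swap(enrolled, current)` would be evaluated per frame or per short window of frames, so a mismatch can be reported while the video is still being rendered.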

Related papers

SIDeR: Semantic Identity Decoupling for Unrestricted Face Privacy [53.75084833636302]
We propose SIDeR, a Semantic decoupling-driven framework for unrestricted face privacy protection. SIDeR decomposes a facial image into a machine-recognizable identity feature vector and a visually perceptible semantic appearance component. For authorized access, SIDeR can be restored to its original form when the correct password is provided.
arXiv Detail & Related papers (2026-02-04T19:30:48Z)
InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing [66.48064661467781]
We introduce sparse-frame video dubbing, a novel paradigm that strategically preserves references to maintain identity, iconic gestures, and camera trajectories. We propose InfiniteTalk, a streaming audio-driven generator designed for infinite-length long sequence dubbing. Comprehensive evaluations on HDTF, CelebV-HQ, and EMTD datasets demonstrate state-of-the-art performance.
arXiv Detail & Related papers (2025-08-19T17:55:23Z)
Proactive Disentangled Modeling of Trigger-Object Pairings for Backdoor Defense [0.0]
Deep neural networks (DNNs) and generative AI (GenAI) are increasingly vulnerable to backdoor attacks. In this paper, we introduce DBOM, a proactive framework that leverages structured disentanglement to identify and neutralize both seen and unseen backdoor threats. We show that DBOM robustly detects poisoned images prior to downstream training, significantly enhancing the security of training pipelines.
arXiv Detail & Related papers (2025-08-03T21:58:15Z)
DAVID-XR1: Detecting AI-Generated Videos with Explainable Reasoning [58.70446237944036]
DAVID-X is the first dataset to pair AI-generated videos with detailed defect-level, temporal-spatial annotations and written rationales. We present DAVID-XR1, a video-language model designed to deliver an interpretable chain of visual reasoning. Our results highlight the promise of explainable detection methods for trustworthy identification of AI-generated video content.
arXiv Detail & Related papers (2025-06-13T13:39:53Z)
Generative Adversarial Patches for Physical Attacks on Cross-Modal Pedestrian Re-Identification [24.962600785183582]
Visible-infrared pedestrian Re-identification (VI-ReID) aims to match pedestrian images captured by infrared cameras and visible cameras. This paper introduces the first physical adversarial attack against VI-ReID models.
arXiv Detail & Related papers (2024-10-26T06:40:10Z)
Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks [39.524974831780874]
We show that adversarial attacks pose a real threat to AIGI detectors, because FPBA can deliver successful black-box attacks. We name our method Frequency-based Post-train Bayesian Attack, or FPBA.
arXiv Detail & Related papers (2024-07-30T14:07:17Z)
Let the Noise Speak: Harnessing Noise for a Unified Defense Against Adversarial and Backdoor Attacks [31.291700348439175]
Malicious data manipulation attacks against machine learning jeopardize its reliability in safety-critical applications. We propose NoiSec, a reconstruction-based intrusion detection system. NoiSec disentangles the noise from the test input, extracts the underlying features from the noise, and leverages them to recognize systematic malicious manipulation.
arXiv Detail & Related papers (2024-06-18T21:44:51Z)
Adversarial Self-Attack Defense and Spatial-Temporal Relation Mining for Visible-Infrared Video Person Re-Identification [24.9205771457704]
The paper proposes a new visible-infrared video person re-ID method from a novel perspective, i.e., adversarial self-attack defense and spatial-temporal relation mining. The proposed method exhibits compelling performance on large-scale cross-modality video datasets.
arXiv Detail & Related papers (2023-07-08T05:03:10Z)
Spatial-Frequency Discriminability for Revealing Adversarial Perturbations [53.279716307171604]
Vulnerability of deep neural networks to adversarial perturbations has been widely perceived in the computer vision community. Current algorithms typically detect adversarial patterns through discriminative decomposition for natural and adversarial data. We propose a discriminative detector relying on a spatial-frequency Krawtchouk decomposition.
arXiv Detail & Related papers (2023-05-18T10:18:59Z)
Dual Spoof Disentanglement Generation for Face Anti-spoofing with Depth Uncertainty Learning [54.15303628138665]
Face anti-spoofing (FAS) plays a vital role in preventing face recognition systems from presentation attacks. Existing face anti-spoofing datasets lack diversity due to insufficient identities and insignificant variance. We propose a Dual Spoof Disentanglement Generation framework to tackle this challenge by "anti-spoofing via generation".
arXiv Detail & Related papers (2021-12-01T15:36:59Z)
Beyond the Spectrum: Detecting Deepfakes via Re-Synthesis [69.09526348527203]
Deep generative models have led to highly realistic media, known as deepfakes, that are commonly indistinguishable from real to human eyes. We propose a novel fake detection that is designed to re-synthesize testing images and extract visual cues for detection. We demonstrate the improved effectiveness, cross-GAN generalization, and robustness against perturbations of our approach in a variety of detection scenarios.
arXiv Detail & Related papers (2021-05-29T21:22:24Z)
Over-the-Air Adversarial Flickering Attacks against Video Recognition Networks [54.82488484053263]
Deep neural networks for video classification may be subjected to adversarial manipulation. We present a manipulation scheme for fooling video classifiers by introducing a flickering temporal perturbation. The attack was implemented on several target models and the transferability of the attack was demonstrated.
arXiv Detail & Related papers (2020-02-12T17:58:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.