Detecting Localized Deepfake Manipulations Using Action Unit-Guided Video Representations
- URL: http://arxiv.org/abs/2503.22121v2
- Date: Sun, 13 Apr 2025 18:17:29 GMT
- Title: Detecting Localized Deepfake Manipulations Using Action Unit-Guided Video Representations
- Authors: Tharun Anand, Siva Sankar Sajeev, Pravin Nair
- Abstract summary: Deepfake techniques are increasingly narrowing the gap between real and synthetic videos, raising serious privacy and security concerns. This work presents the first detection approach explicitly designed to generalize to localized edits in deepfake videos. Our method achieves a $20\%$ improvement in accuracy over current state-of-the-art detection methods.
- Score: 4.449835214520726
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With rapid advancements in generative modeling, deepfake techniques are increasingly narrowing the gap between real and synthetic videos, raising serious privacy and security concerns. Beyond traditional face swapping and reenactment, an emerging trend in recent state-of-the-art deepfake generation methods involves localized edits such as subtle manipulations of specific facial features like raising eyebrows, altering eye shapes, or modifying mouth expressions. These fine-grained manipulations pose a significant challenge for existing detection models, which struggle to capture such localized variations. To the best of our knowledge, this work presents the first detection approach explicitly designed to generalize to localized edits in deepfake videos by leveraging spatiotemporal representations guided by facial action units. Our method leverages a cross-attention-based fusion of representations learned from pretext tasks like random masking and action unit detection, to create an embedding that effectively encodes subtle, localized changes. Comprehensive evaluations across multiple deepfake generation methods demonstrate that our approach, despite being trained solely on the traditional FF+ dataset, sets a new benchmark in detecting recent deepfake-generated videos with fine-grained local edits, achieving a $20\%$ improvement in accuracy over current state-of-the-art detection methods. Additionally, our method delivers competitive performance on standard datasets, highlighting its robustness and generalization across diverse types of local and global forgeries.
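The abstract describes fusing representations from two pretext branches (random masking and action unit detection) via cross-attention to form a single embedding. The following is a minimal NumPy sketch of one such fusion step; the function name, token shapes, and random projection matrices are illustrative assumptions, not the authors' implementation, which would learn these projections during training.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(video_tokens, au_tokens, seed=0):
    """Fuse masked-reconstruction tokens with action-unit tokens.

    video_tokens: (Tv, d) features from the random-masking branch (queries).
    au_tokens:    (Ta, d) features from the AU-detection branch (keys/values).
    Returns a pooled (d,) video-level embedding.
    """
    d = video_tokens.shape[-1]
    rng = np.random.default_rng(seed)
    # Random projections stand in for learned Q/K/V weight matrices.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q = video_tokens @ Wq
    K = au_tokens @ Wk
    V = au_tokens @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d))  # (Tv, Ta) attention weights
    fused = video_tokens + attn @ V       # residual cross-attention fusion
    return fused.mean(axis=0)             # mean-pool to one embedding
```

In a real detector this embedding would feed a binary real/fake classifier head; here the sketch only shows how AU-guided features can modulate the video representation.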
Related papers
- DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion [94.46904504076124]
Deepfake technology has made face swapping highly realistic, raising concerns about the malicious use of fabricated facial content.
Existing methods often struggle to generalize to unseen domains due to the diverse nature of facial manipulations.
We introduce DiffusionFake, a novel framework that reverses the generative process of face forgeries to enhance the generalization of detection models.
arXiv Detail & Related papers (2024-10-06T06:22:43Z)
- UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and are not limited to forgery-specific artifacts, and thus generalize more strongly.
We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z)
- The Tug-of-War Between Deepfake Generation and Detection [4.62070292702111]
Multimodal generative models are rapidly evolving, leading to a surge in the generation of realistic video and audio.
Deepfake videos, which can convincingly impersonate individuals, have particularly garnered attention due to their potential misuse.
This survey paper examines the dual landscape of deepfake video generation and detection, emphasizing the need for effective countermeasures.
arXiv Detail & Related papers (2024-07-08T17:49:41Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- Undercover Deepfakes: Detecting Fake Segments in Videos [1.2609216345578933]
A new paradigm of deepfakes has emerged: mostly real videos altered only slightly to distort the truth.
In this paper, we present a deepfake detection method that can address this issue by performing deepfake prediction at the frame and video levels.
In particular, the paradigm we address will form a powerful tool for the moderation of deepfakes, where human oversight can be better targeted to the parts of videos suspected of being deepfakes.
arXiv Detail & Related papers (2023-05-11T04:43:10Z)
- Cross-Domain Local Characteristic Enhanced Deepfake Video Detection [18.430287055542315]
Deepfake detection has attracted increasing attention due to security concerns.
Many detectors cannot achieve accurate results when detecting unseen manipulations.
We propose a novel pipeline, Cross-Domain Local Forensics, for more general deepfake video detection.
arXiv Detail & Related papers (2022-11-07T07:44:09Z)
- Deep Convolutional Pooling Transformer for Deepfake Detection [54.10864860009834]
We propose a deep convolutional Transformer to incorporate decisive image features both locally and globally.
Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy.
The proposed solution consistently outperforms several state-of-the-art baselines on both within- and cross-dataset experiments.
arXiv Detail & Related papers (2022-09-12T15:05:41Z)
- Delving into Sequential Patches for Deepfake Detection [64.19468088546743]
Recent advances in face forgery techniques produce nearly untraceable deepfake videos, which could be leveraged with malicious intentions.
Previous studies have identified the importance of local low-level cues and temporal information for generalizing well across deepfake methods.
We propose the Local- & Temporal-aware Transformer-based Deepfake Detection framework, which adopts a local-to-global learning protocol.
arXiv Detail & Related papers (2022-07-06T16:46:30Z)
- Lips Don't Lie: A Generalisable and Robust Approach to Face Forgery Detection [118.37239586697139]
LipForensics is a detection approach capable of both generalising manipulations and withstanding various distortions.
It consists in first pretraining a spatiotemporal network to perform visual speech recognition (lipreading).
The network is subsequently finetuned on fixed mouth embeddings of real and forged data to detect fake videos based on mouth movements, without overfitting to low-level, manipulation-specific artefacts.
arXiv Detail & Related papers (2020-12-14T15:53:56Z)
- FakeLocator: Robust Localization of GAN-Based Face Manipulations [19.233930372590226]
We propose a novel approach, termed FakeLocator, to obtain high localization accuracy, at full resolution, on manipulated facial images.
This is the very first attempt to solve the GAN-based fake localization problem with a gray-scale fakeness map.
Experimental results on the popular FaceForensics++ and DFFD datasets, and on seven different state-of-the-art GAN-based face generation methods, show the effectiveness of our method.
arXiv Detail & Related papers (2020-01-27T06:15:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.