Where the Devil Hides: Deepfake Detectors Can No Longer Be Trusted
- URL: http://arxiv.org/abs/2505.08255v1
- Date: Tue, 13 May 2025 06:09:34 GMT
- Title: Where the Devil Hides: Deepfake Detectors Can No Longer Be Trusted
- Authors: Shuaiwei Yuan, Junyu Dong, Yuezun Li
- Abstract summary: Deepfake detectors are typically developed on Deep Neural Networks (DNNs) and trained using third-party datasets. Third-party providers may distribute or sell backdoor triggers to malicious users, allowing them to manipulate detector performance and escape accountability. This paper investigates this risk in depth and describes a solution to stealthily infect Deepfake detectors.
- Score: 33.60389410624143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the advancement of AI generative techniques, Deepfake faces have become incredibly realistic and nearly indistinguishable to the human eye. To counter this, Deepfake detectors have been developed as reliable tools for assessing face authenticity. These detectors are typically developed on Deep Neural Networks (DNNs) and trained using third-party datasets. However, this protocol raises a new security risk that can seriously undermine the trustworthiness of Deepfake detectors: once third-party data providers maliciously insert poisoned (corrupted) data, Deepfake detectors trained on these datasets will be injected with "backdoors" that cause abnormal behavior when presented with samples containing specific triggers. This is a practical concern, as third-party providers may distribute or sell these triggers to malicious users, allowing them to manipulate detector performance and escape accountability. This paper investigates this risk in depth and describes a solution to stealthily infect Deepfake detectors. Specifically, we develop a trigger generator that can synthesize passcode-controlled, semantic-suppression, adaptive, and invisible trigger patterns, ensuring both the stealthiness and effectiveness of these triggers. Then we discuss two poisoning scenarios, dirty-label poisoning and clean-label poisoning, to accomplish the injection of backdoors. Extensive experiments demonstrate the effectiveness, stealthiness, and practicality of our method compared to several baselines.
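The two poisoning scenarios named in the abstract can be illustrated with a minimal sketch. This is not the paper's trigger generator: `make_trigger` is a hypothetical stand-in that derives a fixed low-amplitude pattern from an integer passcode, the label convention (0 = real, 1 = fake) is assumed, and the definitions used are the generic ones from the backdoor literature (dirty-label: trigger added and label flipped to the target class; clean-label: trigger added only to samples already in the target class, labels untouched).

```python
import numpy as np

REAL, FAKE = 0, 1  # label convention assumed for this sketch


def make_trigger(passcode: int, shape: tuple, eps: float = 2 / 255) -> np.ndarray:
    """Hypothetical stand-in for a learned trigger generator:
    a fixed +/-eps pattern seeded by the passcode."""
    rng = np.random.default_rng(passcode)
    return rng.choice([-eps, eps], size=shape).astype(np.float32)


def apply_trigger(images: np.ndarray, trigger: np.ndarray) -> np.ndarray:
    """Add the trigger and keep pixel values in [0, 1]."""
    return np.clip(images + trigger, 0.0, 1.0)


def poison(images, labels, trigger, rate, mode, seed=0):
    """Return poisoned copies of (images, labels) and the poisoned indices.

    mode="dirty": stamp fake samples and relabel them as REAL.
    mode="clean": stamp real samples and leave every label unchanged.
    """
    if mode not in ("dirty", "clean"):
        raise ValueError("mode must be 'dirty' or 'clean'")
    images, labels = images.copy(), labels.copy()
    rng = np.random.default_rng(seed)

    source = FAKE if mode == "dirty" else REAL
    candidates = np.flatnonzero(labels == source)
    # rate is a fraction of the whole dataset, capped by available candidates
    n = min(int(round(rate * len(labels))), len(candidates))
    chosen = rng.choice(candidates, size=n, replace=False)

    images[chosen] = apply_trigger(images[chosen], trigger)
    if mode == "dirty":
        labels[chosen] = REAL  # the label flip is what makes it "dirty"
    return images, labels, chosen


if __name__ == "__main__":
    x = np.full((20, 8, 8, 3), 0.5, dtype=np.float32)  # toy "face" images
    y = np.arange(20) % 2                              # alternating real/fake
    t = make_trigger(passcode=1234, shape=x.shape[1:])
    _, y_dirty, idx_dirty = poison(x, y, t, rate=0.2, mode="dirty")
    _, y_clean, idx_clean = poison(x, y, t, rate=0.2, mode="clean")
    print("dirty-label flipped:", int((y_dirty != y).sum()))
    print("clean-label flipped:", int((y_clean != y).sum()))
```

In this toy setup the attacker's goal at inference time would be for a triggered fake to be classified as real; the sketch only covers how the training set is altered, not training or evaluation.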
Related papers
- Twin Trigger Generative Networks for Backdoor Attacks against Object Detection [14.578800906364414]
Object detectors, which are widely used in real-world applications, are vulnerable to backdoor attacks.
Most research on backdoor attacks has focused on image classification, with limited investigation into object detection.
We propose novel twin trigger generative networks to generate invisible triggers for implanting backdoors into models during training, and visible triggers for steady activation during inference.
arXiv Detail & Related papers (2024-11-23T03:46:45Z)
- Real is not True: Backdoor Attacks Against Deepfake Detection [9.572726483706846]
We introduce a pioneering paradigm denominated as Bad-Deepfake, which represents a novel foray into the realm of backdoor attacks levied against deepfake detectors.
Our approach hinges upon the strategic manipulation of a subset of the training data, enabling us to wield disproportionate influence over the operational characteristics of a trained model.
arXiv Detail & Related papers (2024-03-11T10:57:14Z)
- Poisoned Forgery Face: Towards Backdoor Attacks on Face Forgery Detection [62.595450266262645]
This paper introduces a novel and previously unrecognized threat in face forgery detection scenarios caused by backdoor attack.
By embedding backdoors into models, attackers can deceive detectors into producing erroneous predictions for forged faces.
We propose the Poisoned Forgery Face framework, which enables clean-label backdoor attacks on face forgery detectors.
arXiv Detail & Related papers (2024-02-18T06:31:05Z)
- Unsupervised Multimodal Deepfake Detection Using Intra- and Cross-Modal Inconsistencies [14.660707087391463]
Deepfake videos present an increasing threat to society with potentially negative impact on criminal justice, democracy, and personal safety and privacy.
We propose a novel unsupervised method for detecting deepfake videos by directly identifying intra-modal and cross-modal inconsistency between video segments.
Our proposed method outperforms prior state-of-the-art unsupervised deepfake detection methods on the challenging FakeAVCeleb dataset.
arXiv Detail & Related papers (2023-11-28T03:28:19Z)
- Turn Fake into Real: Adversarial Head Turn Attacks Against Deepfake Detection [58.1263969438364]
We propose adversarial head turn (AdvHeat) as the first attempt at 3D adversarial face views against deepfake detectors.
Experiments validate the vulnerability of various detectors to AdvHeat in realistic, black-box scenarios.
Additional analyses demonstrate that AdvHeat is better than conventional attacks on both the cross-detector transferability and robustness to defenses.
arXiv Detail & Related papers (2023-09-03T07:01:34Z)
- How Generalizable are Deepfake Image Detectors? An Empirical Study [4.42204674141385]
We present the first empirical study on the generalizability of deepfake detectors.
Our study utilizes six deepfake datasets, five deepfake image detection methods, and two model augmentation approaches.
We find that detectors are learning unwanted properties specific to synthesis methods and struggling to extract discriminative features.
arXiv Detail & Related papers (2023-08-08T10:30:34Z)
- Can AI-Generated Text be Reliably Detected? [50.95804851595018]
Large Language Models (LLMs) perform impressively well in various applications.
The potential for misuse of these models in activities such as plagiarism, generating fake news, and spamming has raised concern about their responsible use.
We stress-test the robustness of these AI text detectors in the presence of an attacker.
arXiv Detail & Related papers (2023-03-17T17:53:19Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model to lose detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- Making DeepFakes more spurious: evading deep face forgery detection via trace removal attack [16.221725939480084]
We present a detector-agnostic trace removal attack for DeepFake anti-forensics.
Instead of investigating the detector side, our attack looks into the original DeepFake creation pipeline.
Experiments show that the proposed attack can significantly compromise the detection accuracy of six state-of-the-art DeepFake detectors.
arXiv Detail & Related papers (2022-03-22T03:13:33Z)
- Understanding the Security of Deepfake Detection [23.118012417901078]
We study the security of state-of-the-art deepfake detection methods in adversarial settings.
We use two large-scale public deepfakes data sources including FaceForensics++ and Facebook Deepfake Detection Challenge.
Our results uncover multiple security limitations of the deepfake detection methods in adversarial settings.
arXiv Detail & Related papers (2021-07-05T14:18:21Z)
- Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks Trained from Scratch [99.90716010490625]
Backdoor attackers tamper with training data to embed a vulnerability in models that are trained on that data.
This vulnerability is then activated at inference time by placing a "trigger" into the model's input.
We develop a new hidden trigger attack, Sleeper Agent, which employs gradient matching, data selection, and target model re-training during the crafting process.
arXiv Detail & Related papers (2021-06-16T17:09:55Z)