Related papers: Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts

Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts

URL: http://arxiv.org/abs/2404.08341v1
Date: Fri, 12 Apr 2024 09:13:37 GMT
Title: Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts
Authors: Yang Li, Songlin Yang, Wei Wang, Ziwen He, Bo Peng, Jing Dong,
Abstract summary: Highly realistic AI generated face forgeries known as deepfakes have raised serious social concerns. We provide counterfactual explanations for face forgery detection from an artifact removal perspective. Our method achieves over 90% attack success rate and superior attack transferability.
Score: 23.279652897139286
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Highly realistic AI generated face forgeries known as deepfakes have raised serious social concerns. Although DNN-based face forgery detection models have achieved good performance, they are vulnerable to latest generative methods that have less forgery traces and adversarial attacks. This limitation of generalization and robustness hinders the credibility of detection results and requires more explanations. In this work, we provide counterfactual explanations for face forgery detection from an artifact removal perspective. Specifically, we first invert the forgery images into the StyleGAN latent space, and then adversarially optimize their latent representations with the discrimination supervision from the target detection model. We verify the effectiveness of the proposed explanations from two aspects: (1) Counterfactual Trace Visualization: the enhanced forgery images are useful to reveal artifacts by visually contrasting the original images and two different visualization methods; (2) Transferable Adversarial Attacks: the adversarial forgery images generated by attacking the detection model are able to mislead other detection models, implying the removed artifacts are general. Extensive experiments demonstrate that our method achieves over 90% attack success rate and superior attack transferability. Compared with naive adversarial noise methods, our method adopts both generative and discriminative model priors, and optimize the latent representations in a synthesis-by-analysis way, which forces the search of counterfactual explanations on the natural face manifold. Thus, more general counterfactual traces can be found and better adversarial attack transferability can be achieved.

Related papers

ExposeAnyone: Personalized Audio-to-Expression Diffusion Models Are Robust Zero-Shot Face Forgery Detectors [58.45131932883374]
We propose a fully self-supervised approach to detect deepfakes in videos.<n>Our model computes the identity distances between suspected videos and personalized subjects via diffusion reconstruction errors.<n>Our method is highly robust to corruptions such as blur and compression, highlighting the applicability in real-world face forgery detection.
arXiv Detail & Related papers (2026-01-05T18:59:54Z)
Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing pose serious risks for generative models. In this paper, we investigate how detection performance varies across model backbones, types, and datasets. We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z)
StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model [62.25424831998405]
StealthDiffusion is a framework that modifies AI-generated images into high-quality, imperceptible adversarial examples. It is effective in both white-box and black-box settings, transforming AI-generated images into high-quality adversarial forgeries.
arXiv Detail & Related papers (2024-08-11T01:22:29Z)
UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization. We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z)
Imperceptible Face Forgery Attack via Adversarial Semantic Mask [59.23247545399068]
We propose an Adversarial Semantic Mask Attack framework (ASMA) which can generate adversarial examples with good transferability and invisibility. Specifically, we propose a novel adversarial semantic mask generative model, which can constrain generated perturbations in local semantic regions for good stealthiness.
arXiv Detail & Related papers (2024-06-16T10:38:11Z)
Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model [61.53213964333474]
We propose a unified framework Adv-Diffusion that can generate imperceptible adversarial identity perturbations in the latent space but not the raw pixel space. Specifically, we propose the identity-sensitive conditioned diffusion generative model to generate semantic perturbations in the surroundings. The designed adaptive strength-based adversarial perturbation algorithm can ensure both attack transferability and stealthiness.
arXiv Detail & Related papers (2023-12-18T15:25:23Z)
Evading Forensic Classifiers with Attribute-Conditioned Adversarial Faces [6.105361899083232]
We show that it is possible to successfully generate adversarial fake faces with a specified set of attributes. We propose a framework to search for adversarial latent codes within the feature space of StyleGAN. We also propose a meta-learning based optimization strategy to achieve transferable performance on unknown target models.
arXiv Detail & Related papers (2023-06-22T17:59:55Z)
Deepfake Forensics via An Adversarial Game [99.84099103679816]
We advocate adversarial training for improving the generalization ability to both unseen facial forgeries and unseen image/video qualities. Considering that AI-based face manipulation often leads to high-frequency artifacts that can be easily spotted by models yet difficult to generalize, we propose a new adversarial training method that attempts to blur out these specific artifacts.
arXiv Detail & Related papers (2021-03-25T02:20:08Z)
Adversarial Examples Detection beyond Image Space [88.7651422751216]
We find that there exists compliance between perturbations and prediction confidence, which guides us to detect few-perturbation attacks from the aspect of prediction confidence. We propose a method beyond image space by a two-stream architecture, in which the image stream focuses on the pixel artifacts and the gradient stream copes with the confidence artifacts.
arXiv Detail & Related papers (2021-02-23T09:55:03Z)
Detection Defense Against Adversarial Attacks with Saliency Map [7.736844355705379]
It is well established that neural networks are vulnerable to adversarial examples, which are almost imperceptible on human vision. Existing defenses are trend to harden the robustness of models against adversarial attacks. We propose a novel method combined with additional noises and utilize the inconsistency strategy to detect adversarial examples.
arXiv Detail & Related papers (2020-09-06T13:57:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.