Evading Deepfake-Image Detectors with White- and Black-Box Attacks
- URL: http://arxiv.org/abs/2004.00622v1
- Date: Wed, 1 Apr 2020 17:59:59 GMT
- Title: Evading Deepfake-Image Detectors with White- and Black-Box Attacks
- Authors: Nicholas Carlini, Hany Farid
- Abstract summary: A popular forensic approach trains a neural network to distinguish real from synthetic content; we show that such classifiers are vulnerable to a range of attacks.
We develop five attack case studies on a state-of-the-art classifier that achieves an area under the ROC curve (AUC) of 0.95 on almost all existing image generators.
We also develop a black-box attack that, with no access to the target classifier, reduces the AUC to 0.22.
- Score: 75.13740810603686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is now possible to synthesize highly realistic images of people who don't
exist. Such content has, for example, been implicated in the creation of
fraudulent social-media profiles responsible for disinformation campaigns.
Significant efforts are, therefore, being deployed to detect
synthetically-generated content. One popular forensic approach trains a neural
network to distinguish real from synthetic content.
We show that such forensic classifiers are vulnerable to a range of attacks
that reduce the classifier to near-0% accuracy. We develop five attack case
studies on a state-of-the-art classifier that achieves an area under the ROC
curve (AUC) of 0.95 on almost all existing image generators, when only trained
on one generator. With full access to the classifier, we can flip the lowest
bit of each pixel in an image to reduce the classifier's AUC to 0.0005; perturb
1% of the image area to reduce the classifier's AUC to 0.08; or add a single
noise pattern in the synthesizer's latent space to reduce the classifier's AUC
to 0.17. We also develop a black-box attack that, with no access to the target
classifier, reduces the AUC to 0.22. These attacks reveal significant
vulnerabilities of certain image-forensic classifiers.
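The strongest white-box result above amounts to changing each pixel by at most one 8-bit intensity level. Below is a minimal sketch of that single-step idea (not the authors' released attack code), assuming a differentiable PyTorch classifier `detector` that maps an image in [0, 1] to the probability that it is synthetic; the function name, preprocessing, and one-step formulation are illustrative assumptions.

```python
import torch

def lsb_flip_attack(detector, image):
    """Sketch of a "one intensity level per pixel" white-box attack.

    Assumes `detector` is a differentiable model mapping a (1, 3, H, W) float
    tensor in [0, 1] to the probability that the image is synthetic, and
    `image` is a uint8 tensor of shape (3, H, W). Names are placeholders.
    """
    x = (image.float().unsqueeze(0) / 255.0).requires_grad_(True)
    score = detector(x).sum()   # "synthetic" probability for this single image
    score.backward()            # gradient of the score w.r.t. every pixel

    # Move each pixel by one 8-bit level against the gradient, which lowers the
    # "synthetic" score while keeping the perturbation at most 1/255 per channel.
    adv = torch.clamp(x.detach() - torch.sign(x.grad) / 255.0, 0.0, 1.0)

    # Re-quantize to 8 bits so each saved pixel changes by at most one level.
    return (adv * 255.0).round().to(torch.uint8).squeeze(0)
```

The paper's attacks are iterative and also include sparse and latent-space variants; this single gradient step is only meant to make the distortion budget concrete.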
Related papers
- Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection [83.72430401516674]
GAKer is able to construct adversarial examples targeting any class.
Our method achieves an approximately 14.13% higher attack success rate for unknown classes.
arXiv Detail & Related papers (2024-07-17T03:24:09Z)
- ZeroPur: Succinct Training-Free Adversarial Purification [52.963392510839284]
Adversarial purification is a defense technique that can counter a variety of unseen adversarial attacks.
We present ZeroPur, a simple adversarial purification method that purifies adversarial images without further training.
arXiv Detail & Related papers (2024-06-05T10:58:15Z)
- Breaking Free: How to Hack Safety Guardrails in Black-Box Diffusion Models! [52.0855711767075]
EvoSeed is an evolutionary strategy-based algorithmic framework for generating photo-realistic natural adversarial samples.
We employ CMA-ES to optimize the search for an initial seed vector which, when processed by the conditional diffusion model, yields a natural adversarial sample that is misclassified by the target model.
Experiments show that the generated adversarial images are of high quality, raising concerns that harmful content could be generated while bypassing safety classifiers.
arXiv Detail & Related papers (2024-02-07T09:39:29Z)
- Rethinking Image Forgery Detection via Contrastive Learning and Unsupervised Clustering [26.923409536155166]
We propose the FOrensic ContrAstive cLustering (FOCAL) method for image forgery detection.
FOCAL is based on contrastive learning and unsupervised clustering.
Results show FOCAL significantly outperforms state-of-the-art competing algorithms.
arXiv Detail & Related papers (2023-08-18T05:05:30Z)
- Influencer Backdoor Attack on Semantic Segmentation [39.57965442338681]
Influencer Backdoor Attack (IBA) is a backdoor attack on semantic segmentation models.
IBA is designed to maintain classification accuracy on non-victim pixels while misleading the classification of all victim pixels in every single inference.
We introduce an innovative Pixel Random Labeling strategy which maintains optimal performance even when the trigger is placed far from the victim pixels.
arXiv Detail & Related papers (2023-03-21T17:45:38Z)
- SAIF: Sparse Adversarial and Imperceptible Attack Framework [7.025774823899217]
We propose a novel attack technique called the Sparse Adversarial and Interpretable Attack Framework (SAIF).
Specifically, we design imperceptible attacks that contain low-magnitude perturbations at a small number of pixels and leverage these sparse attacks to reveal the vulnerability of classifiers.
SAIF computes highly imperceptible and interpretable adversarial examples, and outperforms state-of-the-art sparse attack methods on the ImageNet dataset.
arXiv Detail & Related papers (2022-12-14T20:28:50Z)
- Guided Diffusion Model for Adversarial Purification [103.4596751105955]
Adversarial attacks disturb deep neural networks (DNNs) across a variety of algorithms and frameworks.
We propose a novel purification approach, referred to as the guided diffusion model for purification (GDMP).
In comprehensive experiments across various datasets, the proposed GDMP is shown to reduce the perturbations introduced by adversarial attacks to a shallow range.
arXiv Detail & Related papers (2022-05-30T10:11:15Z)
- Sparse and Imperceptible Adversarial Attack via a Homotopy Algorithm [93.80082636284922]
Sparse adversarial attacks can fool deep neural networks (DNNs) by perturbing only a few pixels.
Recent efforts combine sparsity with an additional l_infty bound on the perturbation magnitudes.
We propose a homotopy algorithm that jointly handles the sparsity constraint and the perturbation bound in one unified framework.
arXiv Detail & Related papers (2021-06-10T20:11:36Z)
- Non-Intrusive Detection of Adversarial Deep Learning Attacks via Observer Networks [5.4572790062292125]
Recent studies have shown that deep learning models are vulnerable to crafted adversarial inputs.
We propose a novel method to detect adversarial inputs by augmenting the main classification network with multiple binary detectors.
We achieve a 99.5% detection accuracy on the MNIST dataset and 97.5% on the CIFAR-10 dataset.
arXiv Detail & Related papers (2020-02-22T21:13:00Z)
- Adversarial Attacks on Convolutional Neural Networks in Facial Recognition Domain [2.4704085162861693]
Adversarial attacks that render Deep Neural Network (DNN) classifiers vulnerable in real life represent a serious threat in autonomous vehicles, malware filters, or biometric authentication systems.
We apply the Fast Gradient Sign Method (FGSM) to perturb a facial image dataset and then test the perturbed images on a different classifier (a minimal transfer sketch follows this list).
We craft a variety of black-box attack algorithms on a facial image dataset assuming minimal adversarial knowledge.
arXiv Detail & Related papers (2020-01-30T00:25:05Z)
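As a companion to the FGSM transfer entry above, here is a minimal sketch of crafting perturbations against one (surrogate) classifier and measuring how well they transfer to a different (target) classifier. The function name, the epsilon value, and the evaluation loop are illustrative assumptions, not the paper's actual setup.

```python
import torch
import torch.nn.functional as F

def fgsm_transfer(surrogate, target, images, labels, eps=4.0 / 255.0):
    """Craft FGSM perturbations against `surrogate` and test them on `target`.

    Both models are placeholder classifiers returning logits for a batch of
    images in [0, 1]; `labels` are the true class indices; `eps` is illustrative.
    """
    x = images.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(surrogate(x), labels)
    loss.backward()

    # Single FGSM step: move every pixel by eps in the direction that
    # increases the surrogate's loss, then clip back to the valid range.
    adv = torch.clamp(x.detach() + eps * x.grad.sign(), 0.0, 1.0)

    # Transfer test: accuracy of the *other* classifier on the crafted images.
    with torch.no_grad():
        acc = (target(adv).argmax(dim=1) == labels).float().mean().item()
    return adv, acc
```

A large drop in the returned accuracy relative to the target's clean accuracy indicates that the perturbations transfer across classifiers, i.e., a black-box-style attack succeeds without access to the target's gradients.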
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.