Evading Deepfake-Image Detectors with White- and Black-Box Attacks
- URL: http://arxiv.org/abs/2004.00622v1
- Date: Wed, 1 Apr 2020 17:59:59 GMT
- Title: Evading Deepfake-Image Detectors with White- and Black-Box Attacks
- Authors: Nicholas Carlini, Hany Farid
- Abstract summary: A popular forensic approach trains a neural network to distinguish real from synthetic content; we show that such classifiers are vulnerable to a range of attacks.
We develop five attack case studies on a state-of-the-art classifier that achieves an area under the ROC curve (AUC) of 0.95 on almost all existing image generators.
We also develop a black-box attack that, with no access to the target classifier, reduces the AUC to 0.22.
- Score: 75.13740810603686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is now possible to synthesize highly realistic images of people who don't
exist. Such content has, for example, been implicated in the creation of
fraudulent social-media profiles responsible for disinformation campaigns.
Significant efforts are, therefore, being deployed to detect
synthetically-generated content. One popular forensic approach trains a neural
network to distinguish real from synthetic content.
We show that such forensic classifiers are vulnerable to a range of attacks
that reduce the classifier to near-0% accuracy. We develop five attack case
studies on a state-of-the-art classifier that achieves an area under the ROC
curve (AUC) of 0.95 on almost all existing image generators, when only trained
on one generator. With full access to the classifier, we can flip the lowest
bit of each pixel in an image to reduce the classifier's AUC to 0.0005; perturb
1% of the image area to reduce the classifier's AUC to 0.08; or add a single
noise pattern in the synthesizer's latent space to reduce the classifier's AUC
to 0.17. We also develop a black-box attack that, with no access to the target
classifier, reduces the AUC to 0.22. These attacks reveal significant
vulnerabilities of certain image-forensic classifiers.
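The strongest white-box result above amounts to changing each pixel by at most one 8-bit intensity level. Below is a minimal sketch of that single-step idea (not the authors' released attack code), assuming a differentiable PyTorch classifier `detector` that maps an image in [0, 1] to the probability that it is synthetic; the function name, preprocessing, and one-step formulation are illustrative assumptions.

```python
import torch

def lsb_flip_attack(detector, image):
    """Sketch of a "one intensity level per pixel" white-box attack.

    Assumes `detector` is a differentiable model mapping a (1, 3, H, W) float
    tensor in [0, 1] to the probability that the image is synthetic, and
    `image` is a uint8 tensor of shape (3, H, W). Names are placeholders.
    """
    x = (image.float().unsqueeze(0) / 255.0).requires_grad_(True)
    score = detector(x).sum()   # "synthetic" probability for this single image
    score.backward()            # gradient of the score w.r.t. every pixel

    # Move each pixel by one 8-bit level against the gradient, which lowers the
    # "synthetic" score while keeping the perturbation at most 1/255 per channel.
    adv = torch.clamp(x.detach() - torch.sign(x.grad) / 255.0, 0.0, 1.0)

    # Re-quantize to 8 bits so each saved pixel changes by at most one level.
    return (adv * 255.0).round().to(torch.uint8).squeeze(0)
```

The paper's attacks are iterative and also include sparse and latent-space variants; this single gradient step is only meant to make the distortion budget concrete.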
Related papers
- Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection [83.72430401516674]
GAKer is able to construct adversarial examples targeting any class.
Our method achieves an approximately 14.13% higher attack success rate for unknown classes.
arXiv Detail & Related papers (2024-07-17T03:24:09Z)
- ZeroPur: Succinct Training-Free Adversarial Purification [52.963392510839284]
Adversarial purification is a defense technique that can counter a variety of unseen adversarial attacks.
We present ZeroPur, a simple adversarial purification method that purifies adversarial images without further training.
arXiv Detail & Related papers (2024-06-05T10:58:15Z)
- Breaking Free: How to Hack Safety Guardrails in Black-Box Diffusion Models! [52.0855711767075]
EvoSeed is an evolutionary strategy-based algorithmic framework for generating photo-realistic natural adversarial samples.
We employ CMA-ES to optimize the search for an initial seed vector which, when processed by the conditional diffusion model, yields a natural adversarial sample that is misclassified by the target model.
Experiments show that the generated adversarial images are of high quality, raising concerns that harmful content could be generated while bypassing safety classifiers.
arXiv Detail & Related papers (2024-02-07T09:39:29Z)
- Rethinking Image Forgery Detection via Contrastive Learning and Unsupervised Clustering [26.923409536155166]
We propose the FOrensic ContrAstive cLustering (FOCAL) method for image forgery detection.
FOCAL is based on contrastive learning and unsupervised clustering.
Results show FOCAL significantly outperforms state-of-the-art competing algorithms.
arXiv Detail & Related papers (2023-08-18T05:05:30Z)
- Influencer Backdoor Attack on Semantic Segmentation [39.57965442338681]
Influencer Backdoor Attack (IBA) is a backdoor attack on semantic segmentation models.
IBA is designed to maintain classification accuracy on non-victim pixels while misleading the classification of all victim pixels in every single inference.
We introduce an innovative Pixel Random Labeling strategy which maintains optimal performance even when the trigger is placed far from the victim pixels.
arXiv Detail & Related papers (2023-03-21T17:45:38Z)
- SAIF: Sparse Adversarial and Imperceptible Attack Framework [7.025774823899217]
We propose a novel attack technique called the Sparse Adversarial and Interpretable Attack Framework (SAIF).
Specifically, we design imperceptible attacks that contain low-magnitude perturbations at a small number of pixels and leverage these sparse attacks to reveal the vulnerability of classifiers.
SAIF computes highly imperceptible and interpretable adversarial examples, and outperforms state-of-the-art sparse attack methods on the ImageNet dataset.
arXiv Detail & Related papers (2022-12-14T20:28:50Z)
- Guided Diffusion Model for Adversarial Purification [103.4596751105955]
Adversarial attacks disturb deep neural networks (DNNs) across a variety of algorithms and frameworks.
We propose a novel purification approach, referred to as the guided diffusion model for purification (GDMP).
In comprehensive experiments across various datasets, the proposed GDMP is shown to reduce the perturbations introduced by adversarial attacks to a shallow range.
arXiv Detail & Related papers (2022-05-30T10:11:15Z)
- Sparse and Imperceptible Adversarial Attack via a Homotopy Algorithm [93.80082636284922]
Sparse adversarial attacks can fool deep neural networks (DNNs) by perturbing only a few pixels.
Recent efforts combine sparsity with an additional l_infty bound on the perturbation magnitudes.
We propose a homotopy algorithm that jointly handles the sparsity constraint and the perturbation bound in one unified framework.
arXiv Detail & Related papers (2021-06-10T20:11:36Z)
- Non-Intrusive Detection of Adversarial Deep Learning Attacks via Observer Networks [5.4572790062292125]
Recent studies have shown that deep learning models are vulnerable to crafted adversarial inputs.
We propose a novel method to detect adversarial inputs by augmenting the main classification network with multiple binary detectors.
We achieve a 99.5% detection accuracy on the MNIST dataset and 97.5% on the CIFAR-10 dataset.
arXiv Detail & Related papers (2020-02-22T21:13:00Z)
- Adversarial Attacks on Convolutional Neural Networks in Facial Recognition Domain [2.4704085162861693]
Adversarial attacks that render Deep Neural Network (DNN) classifiers vulnerable in real life represent a serious threat in autonomous vehicles, malware filters, or biometric authentication systems.
We apply the Fast Gradient Sign Method (FGSM) to perturb a facial image dataset and then test the perturbed images on a different classifier (a minimal transfer sketch follows this list).
We craft a variety of black-box attack algorithms on a facial image dataset assuming minimal adversarial knowledge.
arXiv Detail & Related papers (2020-01-30T00:25:05Z)
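As a companion to the FGSM transfer entry above, here is a minimal sketch of crafting perturbations against one (surrogate) classifier and measuring how well they transfer to a different (target) classifier. The function name, the epsilon value, and the evaluation loop are illustrative assumptions, not the paper's actual setup.

```python
import torch
import torch.nn.functional as F

def fgsm_transfer(surrogate, target, images, labels, eps=4.0 / 255.0):
    """Craft FGSM perturbations against `surrogate` and test them on `target`.

    Both models are placeholder classifiers returning logits for a batch of
    images in [0, 1]; `labels` are the true class indices; `eps` is illustrative.
    """
    x = images.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(surrogate(x), labels)
    loss.backward()

    # Single FGSM step: move every pixel by eps in the direction that
    # increases the surrogate's loss, then clip back to the valid range.
    adv = torch.clamp(x.detach() + eps * x.grad.sign(), 0.0, 1.0)

    # Transfer test: accuracy of the *other* classifier on the crafted images.
    with torch.no_grad():
        acc = (target(adv).argmax(dim=1) == labels).float().mean().item()
    return adv, acc
```

A large drop in the returned accuracy relative to the target's clean accuracy indicates that the perturbations transfer across classifiers, i.e., a black-box-style attack succeeds without access to the target's gradients.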
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.