Fake It Until You Break It: On the Adversarial Robustness of AI-generated Image Detectors
- URL: http://arxiv.org/abs/2410.01574v2
- Date: Thu, 3 Oct 2024 10:11:53 GMT
- Title: Fake It Until You Break It: On the Adversarial Robustness of AI-generated Image Detectors
- Authors: Sina Mavali, Jonas Ricker, David Pape, Yash Sharma, Asja Fischer, Lea Schönherr,
- Abstract summary: We evaluate state-of-the-art AI-generated image (AIGI) detectors under different attack scenarios.
Attacks can significantly reduce detection accuracy to the extent that the risks of relying on detectors outweigh their benefits.
We propose a simple defense mechanism to make CLIP-based detectors, which are currently the best-performing detectors, robust against these attacks.
- Score: 14.284639462471274
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While generative AI (GenAI) offers countless possibilities for creative and productive tasks, artificially generated media can be misused for fraud, manipulation, scams, misinformation campaigns, and more. To mitigate the risks associated with maliciously generated media, forensic classifiers are employed to identify AI-generated content. However, current forensic classifiers are often not evaluated in practically relevant scenarios, such as the presence of an attacker or when real-world artifacts like social media degradations affect images. In this paper, we evaluate state-of-the-art AI-generated image (AIGI) detectors under different attack scenarios. We demonstrate that forensic classifiers can be effectively attacked in realistic settings, even when the attacker does not have access to the target model and post-processing occurs after the adversarial examples are created, which is standard on social media platforms. These attacks can significantly reduce detection accuracy to the extent that the risks of relying on detectors outweigh their benefits. Finally, we propose a simple defense mechanism to make CLIP-based detectors, which are currently the best-performing detectors, robust against these attacks.
Related papers
- Time-Aware Face Anti-Spoofing with Rotation Invariant Local Binary Patterns and Deep Learning [50.79277723970418]
imitation attacks can lead to erroneous identification and subsequent authentication of attackers.
Similar to face recognition, imitation attacks can also be detected with Machine Learning.
We propose a novel approach that promises high classification accuracy by combining previously unused features with time-aware deep learning strategies.
arXiv Detail & Related papers (2024-08-27T07:26:10Z) - Imperceptible Face Forgery Attack via Adversarial Semantic Mask [59.23247545399068]
We propose an Adversarial Semantic Mask Attack framework (ASMA) which can generate adversarial examples with good transferability and invisibility.
Specifically, we propose a novel adversarial semantic mask generative model, which can constrain generated perturbations in local semantic regions for good stealthiness.
arXiv Detail & Related papers (2024-06-16T10:38:11Z) - Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack [24.954755569786396]
We propose a framework for a broader class of adversarial attacks, designed to perform minor perturbations in machine-generated content to evade detection.
We consider two attack settings: white-box and black-box, and employ adversarial learning in dynamic scenarios to assess the potential enhancement of the current detection model's robustness.
The empirical results reveal that the current detection models can be compromised in as little as 10 seconds, leading to the misclassification of machine-generated text as human-written content.
arXiv Detail & Related papers (2024-04-02T12:49:22Z) - Real is not True: Backdoor Attacks Against Deepfake Detection [9.572726483706846]
We introduce a pioneering paradigm denominated as Bad-Deepfake, which represents a novel foray into the realm of backdoor attacks levied against deepfake detectors.
Our approach hinges upon the strategic manipulation of a subset of the training data, enabling us to wield disproportionate influence over the operational characteristics of a trained model.
arXiv Detail & Related papers (2024-03-11T10:57:14Z) - XAI-Based Detection of Adversarial Attacks on Deepfake Detectors [0.0]
We introduce a novel methodology for identifying adversarial attacks on deepfake detectors using XAI.
Our approach contributes not only to the detection of deepfakes but also enhances the understanding of possible adversarial attacks.
arXiv Detail & Related papers (2024-03-05T13:25:30Z) - Evading Forensic Classifiers with Attribute-Conditioned Adversarial
Faces [6.105361899083232]
We show that it is possible to successfully generate adversarial fake faces with a specified set of attributes.
We propose a framework to search for adversarial latent codes within the feature space of StyleGAN.
We also propose a meta-learning based optimization strategy to achieve transferable performance on unknown target models.
arXiv Detail & Related papers (2023-06-22T17:59:55Z) - Can AI-Generated Text be Reliably Detected? [54.670136179857344]
Unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc.
Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques.
In this paper, we show that these detectors are not reliable in practical scenarios.
arXiv Detail & Related papers (2023-03-17T17:53:19Z) - Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks [76.35478518372692]
We introduce epsilon-illusory, a novel form of adversarial attack on sequential decision-makers.
Compared to existing attacks, we empirically find epsilon-illusory to be significantly harder to detect with automated methods.
Our findings suggest the need for better anomaly detectors, as well as effective hardware- and system-level defenses.
arXiv Detail & Related papers (2022-07-20T19:49:09Z) - Real-World Adversarial Examples involving Makeup Application [58.731070632586594]
We propose a physical adversarial attack with the use of full-face makeup.
Our attack can effectively overcome manual errors in makeup application, such as color and position-related errors.
arXiv Detail & Related papers (2021-09-04T05:29:28Z) - Adversarial Attacks for Tabular Data: Application to Fraud Detection and
Imbalanced Data [3.2458203725405976]
Adversarial attacks aim at producing adversarial examples, in other words, slightly modified inputs that induce the AI system to return incorrect outputs.
In this paper we illustrate a novel approach to modify and adapt state-of-the-art algorithms to imbalanced data, in the context of fraud detection.
Experimental results show that the proposed modifications lead to a perfect attack success rate.
When applied to a real-world production system, the proposed techniques shows the possibility of posing a serious threat to the robustness of advanced AI-based fraud detection procedures.
arXiv Detail & Related papers (2021-01-20T08:58:29Z) - Temporal Sparse Adversarial Attack on Sequence-based Gait Recognition [56.844587127848854]
We demonstrate that the state-of-the-art gait recognition model is vulnerable to such attacks.
We employ a generative adversarial network based architecture to semantically generate adversarial high-quality gait silhouettes or video frames.
The experimental results show that if only one-fortieth of the frames are attacked, the accuracy of the target model drops dramatically.
arXiv Detail & Related papers (2020-02-22T10:08:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.