How Real Is Real? A Human Evaluation Framework for Unrestricted Adversarial Examples
- URL: http://arxiv.org/abs/2404.12653v1
- Date: Fri, 19 Apr 2024 06:42:01 GMT
- Title: How Real Is Real? A Human Evaluation Framework for Unrestricted Adversarial Examples
- Authors: Dren Fazlija, Arkadij Orlov, Johanna Schrader, Monty-Maximilian Zühlke, Michael Rohs, Daniel Kudenko
- Abstract summary: Adversarial examples threaten the safety of AI-based systems such as autonomous vehicles.
In the image domain, they represent maliciously perturbed data points that look benign to humans.
We propose SCOOTER - an evaluation framework for unrestricted image-based attacks.
- Score: 8.483679748399036
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With an ever-increasing reliance on machine learning (ML) models in the real world, adversarial examples threaten the safety of AI-based systems such as autonomous vehicles. In the image domain, they represent maliciously perturbed data points that look benign to humans (i.e., the image modification is not noticeable) but greatly mislead state-of-the-art ML models. Previously, researchers ensured the imperceptibility of their altered data points by restricting perturbations via $\ell_p$ norms. However, recent publications claim that creating natural-looking adversarial examples without such restrictions is also possible. With much more freedom to instill malicious information into data, these unrestricted adversarial examples can potentially overcome traditional defense strategies as they are not constrained by the limitations or patterns these defenses typically recognize and mitigate. This allows attackers to operate outside of expected threat models. However, surveying existing image-based methods, we noticed a need for more human evaluations of the proposed image modifications. Based on existing human-assessment frameworks for image generation quality, we propose SCOOTER - an evaluation framework for unrestricted image-based attacks. It provides researchers with guidelines for conducting statistically significant human experiments, standardized questions, and a ready-to-use implementation. We propose a framework that allows researchers to analyze how imperceptible their unrestricted attacks truly are.
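To make the $\ell_p$ restriction mentioned above concrete, the following minimal Python sketch (not part of SCOOTER or the paper) projects a perturbed image back into an $\ell_\infty$ ball and measures the resulting per-pixel budget; an unrestricted attack simply drops this budget, which is why the framework falls back on human studies to judge imperceptibility. The array shape and the eps = 8/255 budget are illustrative assumptions.

```python
import numpy as np

def project_linf(x_clean: np.ndarray, x_adv: np.ndarray, eps: float) -> np.ndarray:
    """Clip a perturbed image back into the l_inf ball of radius eps around
    the clean image, then into the valid pixel range [0, 1]."""
    delta = np.clip(x_adv - x_clean, -eps, eps)
    return np.clip(x_clean + delta, 0.0, 1.0)

def linf_distance(x_clean: np.ndarray, x_adv: np.ndarray) -> float:
    """Largest per-pixel change -- the quantity norm-bounded attacks keep below eps."""
    return float(np.max(np.abs(x_adv - x_clean)))

# Toy example: a random "image" in [0, 1] and a stand-in perturbation.
rng = np.random.default_rng(0)
x_clean = rng.random((3, 32, 32))
x_adv = x_clean + rng.normal(scale=0.1, size=x_clean.shape)

eps = 8 / 255  # a common l_inf budget in the restricted setting
x_restricted = project_linf(x_clean, x_adv, eps)
print(linf_distance(x_clean, x_restricted))  # <= eps by construction
print(linf_distance(x_clean, x_adv))         # unrestricted: no numeric bound at all
```

Once the eps budget disappears, there is no comparable numeric check for imperceptibility, which is the gap the proposed human-evaluation guidelines are meant to fill.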
Related papers
- Fake It Until You Break It: On the Adversarial Robustness of AI-generated Image Detectors [14.284639462471274]
We evaluate state-of-the-art AI-generated image (AIGI) detectors under different attack scenarios.
Attacks can significantly reduce detection accuracy to the extent that the risks of relying on detectors outweigh their benefits.
We propose a simple defense mechanism to make CLIP-based detectors, which are currently the best-performing detectors, robust against these attacks.
arXiv Detail & Related papers (2024-10-02T14:11:29Z)
- MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z)
- Privacy Assessment on Reconstructed Images: Are Existing Evaluation Metrics Faithful to Human Perception? [86.58989831070426]
We study the faithfulness of hand-crafted metrics to human perception of privacy information from reconstructed images.
We propose a learning-based measure called SemSim to evaluate the Semantic Similarity between the original and reconstructed images.
arXiv Detail & Related papers (2023-09-22T17:58:04Z)
- Protect Federated Learning Against Backdoor Attacks via Data-Free Trigger Generation [25.072791779134]
Federated Learning (FL) enables large-scale clients to collaboratively train a model without sharing their raw data.
Due to the lack of data auditing for untrusted clients, FL is vulnerable to poisoning attacks, especially backdoor attacks.
We propose a novel data-free trigger-generation-based defense approach based on the two characteristics of backdoor attacks.
arXiv Detail & Related papers (2023-08-22T10:16:12Z)
- A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks [72.7373468905418]
We develop an open-source toolkit OpenBackdoor to foster the implementations and evaluations of textual backdoor learning.
We also propose CUBE, a simple yet strong clustering-based defense baseline.
arXiv Detail & Related papers (2022-06-17T02:29:23Z)
- CARLA-GeAR: a Dataset Generator for a Systematic Evaluation of Adversarial Robustness of Vision Models [61.68061613161187]
This paper presents CARLA-GeAR, a tool for the automatic generation of synthetic datasets for evaluating the robustness of neural models against physical adversarial patches.
The tool is built on the CARLA simulator, using its Python API, and allows the generation of datasets for several vision tasks in the context of autonomous driving.
The paper presents an experimental study to evaluate the performance of some defense methods against such attacks, showing how the datasets generated with CARLA-GeAR might be used in future work as a benchmark for adversarial defense in the real world.
arXiv Detail & Related papers (2022-06-09T09:17:38Z)
- On the Robustness of Quality Measures for GANs [136.18799984346248]
This work evaluates the robustness of quality measures of generative models such as Inception Score (IS) and Fréchet Inception Distance (FID).
We show that such metrics can also be manipulated by additive pixel perturbations.
arXiv Detail & Related papers (2022-01-31T06:43:09Z)
- Invertible Image Dataset Protection [23.688878249633508]
We develop a reversible adversarial example generator (RAEG) that introduces slight changes to the images to fool traditional classification models.
Compared with previous methods, RAEG better protects the data against adversarial defenses while introducing only slight distortion.
arXiv Detail & Related papers (2021-12-29T06:56:43Z)
- Generating Unrestricted Adversarial Examples via Three Parameters [11.325135016306165]
The proposed adversarial attack generates unrestricted adversarial examples using a limited number of parameters.
It obtains an average success rate of 93.5% in terms of human evaluation on the MNIST and SVHN datasets.
It also reduces the model accuracy by an average of 73% on six datasets.
arXiv Detail & Related papers (2021-03-13T07:20:14Z)
- Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News [57.9843300852526]
We introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions.
To identify the possible weaknesses that adversaries can exploit, we create a NeuralNews dataset composed of 4 different types of generated articles.
In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies.
arXiv Detail & Related papers (2020-09-16T14:13:15Z)
- Practical Fast Gradient Sign Attack against Mammographic Image Classifier [0.0]
The motivation behind this paper is to emphasize this issue and raise awareness.
We use mammographic images to train our model and then evaluate its performance in terms of accuracy.
We then use the structural similarity index (SSIM) to analyze the similarity between clean and adversarial images (a short illustrative SSIM sketch follows this list).
arXiv Detail & Related papers (2020-01-27T07:37:07Z)
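The SSIM comparison in the last entry can be illustrated with a few lines of scikit-image; this is a minimal sketch on synthetic stand-in images (grayscale, scaled to [0, 1]), not the paper's actual pipeline, and the perturbation strength is an arbitrary assumption.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

rng = np.random.default_rng(42)

# Synthetic stand-ins for a clean image and its adversarially perturbed version.
clean = rng.random((224, 224))
adversarial = np.clip(clean + rng.normal(scale=0.02, size=clean.shape), 0.0, 1.0)

# SSIM lies in [-1, 1]; values close to 1 indicate a visually subtle change.
score = ssim(clean, adversarial, data_range=1.0)
print(f"SSIM between clean and adversarial image: {score:.4f}")
```

As the main abstract argues, hand-crafted similarity scores such as SSIM can complement, but not replace, human judgments of imperceptibility.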