Neural Fingerprints for Adversarial Attack Detection
- URL: http://arxiv.org/abs/2411.04533v1
- Date: Thu, 07 Nov 2024 08:43:42 GMT
- Title: Neural Fingerprints for Adversarial Attack Detection
- Authors: Haim Fisher, Moni Shahar, Yehezkel S. Resheff
- Abstract summary: A well-known vulnerability of deep learning models is their susceptibility to adversarial examples.
Many algorithms have been proposed to address this problem, falling generally into one of two categories.
We argue that in a white-box setting, where the attacker knows the configuration and weights of the network and the detector, they can overcome the detector.
This problem is common in security applications where even a very good model is not sufficient to ensure safety.
- Score: 2.7309692684728613
- License:
- Abstract: Deep learning models for image classification have become standard tools in recent years. A well-known vulnerability of these models is their susceptibility to adversarial examples. These are generated by slightly altering an image of a certain class in a way that is imperceptible to humans but causes the model to classify it wrongly as another class. Many algorithms have been proposed to address this problem, falling generally into one of two categories: (i) building robust classifiers, and (ii) directly detecting attacked images. Despite the good performance of these detectors, we argue that in a white-box setting, where the attacker knows the configuration and weights of the network and the detector, they can overcome the detector by running many examples on a local copy, and sending only those that were not detected to the actual model. This problem is common in security applications, where even a very good model is not sufficient to ensure safety. In this paper we propose to overcome this inherent limitation of any static defence with randomization. To do so, one must generate a very large family of detectors with consistent performance, and select one or more of them randomly for each input. For the individual detectors, we suggest the method of neural fingerprints. In the training phase, for each class we repeatedly sample a tiny random subset of neurons from certain layers of the network, and if their average is sufficiently different between clean and attacked images of the focal class, they are considered a fingerprint and added to the detector bank. During test time, we sample fingerprints from the bank associated with the label predicted by the model, and detect attacks using a likelihood ratio test. We evaluate our detectors on ImageNet with different attack methods and model architectures, and show near-perfect detection with low rates of false detection.
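To make the two phases described in the abstract concrete, here is a minimal sketch: mining fingerprints for one focal class during training, and a randomized likelihood-ratio test at inference. All function names, thresholds, and the Gaussian modelling of the subset averages are illustrative assumptions rather than the authors' released code; activations are assumed to have been extracted from the chosen layers beforehand.

```python
import numpy as np

rng = np.random.default_rng(0)

def mine_fingerprints(clean_acts, attacked_acts, n_candidates=1000,
                      subset_size=10, min_gap=1.0):
    """Repeatedly sample tiny random neuron subsets for one focal class and keep
    those whose average activation separates clean from attacked images.

    clean_acts / attacked_acts: arrays of shape (n_images, n_neurons) holding
    activations of the chosen layers for images of the focal class.
    """
    n_neurons = clean_acts.shape[1]
    bank = []
    for _ in range(n_candidates):
        idx = rng.choice(n_neurons, size=subset_size, replace=False)
        clean_avg = clean_acts[:, idx].mean(axis=1)       # per-image subset average
        attacked_avg = attacked_acts[:, idx].mean(axis=1)
        # standardized gap between the two populations of subset averages
        gap = abs(clean_avg.mean() - attacked_avg.mean()) / np.sqrt(
            clean_avg.var() + attacked_avg.var() + 1e-12)
        if gap >= min_gap:                                # "sufficiently different"
            bank.append({"neurons": idx,
                         "clean": (clean_avg.mean(), clean_avg.std() + 1e-8),
                         "attacked": (attacked_avg.mean(), attacked_avg.std() + 1e-8)})
    return bank

def _log_gauss(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std)

def is_attacked(activations, bank, n_sampled=20, threshold=0.0):
    """Randomly sample fingerprints from the bank of the predicted class and
    flag the input if the summed log-likelihood ratio favours 'attacked'."""
    chosen = rng.choice(len(bank), size=min(n_sampled, len(bank)), replace=False)
    llr = 0.0
    for j in chosen:
        fp = bank[j]
        x = activations[fp["neurons"]].mean()
        llr += _log_gauss(x, *fp["attacked"]) - _log_gauss(x, *fp["clean"])
    return llr > threshold

# Toy usage with synthetic activations for a single class (purely illustrative).
clean = rng.normal(0.0, 1.0, size=(500, 256))
attacked = rng.normal(0.5, 1.0, size=(500, 256))
bank = mine_fingerprints(clean, attacked)
print(len(bank), is_attacked(attacked[0], bank), is_attacked(clean[0], bank))
```

The randomization argued for in the abstract comes from drawing a fresh set of fingerprints for each input, so a white-box attacker cannot tune an attack against one fixed detector; the Gaussian likelihood model above is only a guess at the paper's actual test statistic.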
Related papers
- HOLMES: to Detect Adversarial Examples with Multiple Detectors [1.455585466338228]
HOLMES is able to distinguish unseen adversarial examples from multiple attacks with high accuracy and low false positive rates.
Our effective and inexpensive strategies neither modify original DNN models nor require their internal parameters.
arXiv Detail & Related papers (2024-05-30T11:22:55Z) - Are You Stealing My Model? Sample Correlation for Fingerprinting Deep Neural Networks [86.55317144826179]
Previous methods always leverage the transferable adversarial examples as the model fingerprint.
We propose a novel yet simple model stealing detection method based on SAmple Correlation (SAC).
SAC successfully defends against various model stealing attacks, even including adversarial training or transfer learning.
arXiv Detail & Related papers (2022-10-21T02:07:50Z) - Btech thesis report on adversarial attack detection and purification of adverserially attacked images [0.0]
This thesis report is on the detection and purification of adversarially attacked images.
A deep learning model is trained on certain training examples for various tasks such as classification and regression.
arXiv Detail & Related papers (2022-05-09T09:24:11Z) - Detecting Adversaries, yet Faltering to Noise? Leveraging Conditional Variational AutoEncoders for Adversary Detection in the Presence of Noisy Images [0.7734726150561086]
Conditional Variational AutoEncoders (CVAE) are surprisingly good at detecting imperceptible image perturbations.
We show how CVAEs can be effectively used to detect adversarial attacks on image classification networks.
arXiv Detail & Related papers (2021-11-28T20:36:27Z) - Towards A Conceptually Simple Defensive Approach for Few-shot classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluations on the miniImagenet (MI) and CUB datasets exhibit good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z) - Adversarially Robust One-class Novelty Detection [83.1570537254877]
We show that existing novelty detectors are susceptible to adversarial examples.
We propose a defense strategy that manipulates the latent space of novelty detectors to improve the robustness against adversarial examples.
arXiv Detail & Related papers (2021-08-25T10:41:29Z) - Learning to Detect Adversarial Examples Based on Class Scores [0.8411385346896413]
We take a closer look at adversarial attack detection based on the class scores of an already trained classification model.
We propose to train a support vector machine (SVM) on the class scores to detect adversarial examples.
We show that our approach yields an improved detection rate compared to an existing method, whilst being easy to implement; a minimal sketch of this idea appears after the list below.
arXiv Detail & Related papers (2021-07-09T13:29:54Z) - ExAD: An Ensemble Approach for Explanation-based Adversarial Detection [17.455233006559734]
We propose ExAD, a framework to detect adversarial examples using an ensemble of explanation techniques.
We evaluate our approach using six state-of-the-art adversarial attacks on three image datasets.
arXiv Detail & Related papers (2021-03-22T00:53:07Z) - Anomaly Detection-Based Unknown Face Presentation Attack Detection [74.4918294453537]
Anomaly detection-based spoof attack detection is a recent development in face Presentation Attack Detection.
In this paper, we present a deep-learning solution for anomaly detection-based spoof attack detection.
The proposed approach benefits from the representation learning power of CNNs and learns better features for the face presentation attack detection (fPAD) task.
arXiv Detail & Related papers (2020-07-11T21:20:55Z) - Detection as Regression: Certified Object Detection by Median Smoothing [50.89591634725045]
This work is motivated by recent progress on certified classification by randomized smoothing.
We obtain the first model-agnostic, training-free, and certified defense for object detection against $\ell_2$-bounded attacks.
arXiv Detail & Related papers (2020-07-07T18:40:19Z) - Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
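For the class-score entry above ("Learning to Detect Adversarial Examples Based on Class Scores"), the idea is simple enough that a small sketch may help: fit an SVM on the classifier's output score vectors to separate clean from attacked inputs. The synthetic score vectors below are placeholders standing in for real softmax outputs of a victim model; the kernel choice, hyperparameters, and data split are assumptions, not that paper's exact setup.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic placeholder score vectors: two Dirichlet distributions with different
# concentration, just so the demo has separable classes. Real score vectors would
# come from running clean and attacked images through the trained classifier.
clean_scores = rng.dirichlet(np.full(10, 0.2), size=1000)
adv_scores = rng.dirichlet(np.full(10, 2.0), size=1000)

X = np.vstack([clean_scores, adv_scores])
y = np.concatenate([np.zeros(1000), np.ones(1000)])   # 1 = adversarial

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
detector = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)   # SVM on class scores
print("held-out detection accuracy:", detector.score(X_te, y_te))
```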