Adversarial Detection without Model Information
- URL: http://arxiv.org/abs/2202.04271v1
- Date: Wed, 9 Feb 2022 04:38:16 GMT
- Title: Adversarial Detection without Model Information
- Authors: Abhishek Moitra, Youngeun Kim, and Priyadarshini Panda
- Abstract summary: We propose a model independent adversarial detection method using a simple energy function.
We train a standalone detector independent of the underlying model, with sequential layer-wise training to increase the energy separation corresponding to natural and adversarial inputs.
Our method achieves state-of-the-art detection performance (ROC-AUC > 0.9) across a wide range of gradient, score and decision-based adversarial attacks.
- Score: 4.4890837610757695
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most prior state-of-the-art adversarial detection works assume that the
underlying vulnerable model is accessible, i,e., the model can be trained or
its outputs are visible. However, this is not a practical assumption due to
factors like model encryption, model information leakage and so on. In this
work, we propose a model independent adversarial detection method using a
simple energy function to distinguish between adversarial and natural inputs.
We train a standalone detector independent of the underlying model, with
sequential layer-wise training to increase the energy separation corresponding
to natural and adversarial inputs. With this, we perform energy
distribution-based adversarial detection. Our method achieves state-of-the-art
detection performance (ROC-AUC > 0.9) across a wide range of gradient, score
and decision-based adversarial attacks on CIFAR10, CIFAR100 and TinyImagenet
datasets. Compared to prior approaches, our method requires ~10-100x less
number of operations and parameters for adversarial detection. Further, we show
that our detection method is transferable across different datasets and
adversarial attacks. For reproducibility, we provide code in the supplementary
material.
Related papers
- On the Inherent Robustness of One-Stage Object Detection against Out-of-Distribution Data [6.7236795813629]
We propose a novel detection algorithm for detecting unknown objects in image data.
It exploits supervised dimensionality reduction techniques to mitigate the effects of the curse of dimensionality on the features extracted by the model.
It utilizes high-resolution feature maps to identify potential unknown objects in an unsupervised fashion.
arXiv Detail & Related papers (2024-11-07T10:15:25Z) - DSDE: Using Proportion Estimation to Improve Model Selection for Out-of-Distribution Detection [15.238164468992148]
Experimental results on CIFAR10 and CIFAR100 demonstrate the effectiveness of our approach in tackling OoD detection challenges.
We name the proposed approach as DOS-Storey-based Detector Ensemble (DSDE)
arXiv Detail & Related papers (2024-11-03T09:01:36Z) - Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture [58.60915132222421]
We introduce an approach that is both general and parameter-efficient for face forgery detection.
We design a forgery-style mixture formulation that augments the diversity of forgery source domains.
We show that the designed model achieves state-of-the-art generalizability with significantly reduced trainable parameters.
arXiv Detail & Related papers (2024-08-23T01:53:36Z) - Detecting Adversarial Data via Perturbation Forgery [28.637963515748456]
adversarial detection aims to identify and filter out adversarial data from the data flow based on discrepancies in distribution and noise patterns between natural and adversarial data.
New attacks based on generative models with imbalanced and anisotropic noise patterns evade detection.
We propose Perturbation Forgery, which includes noise distribution perturbation, sparse mask generation, and pseudo-adversarial data production, to train an adversarial detector capable of detecting unseen gradient-based, generative-model-based, and physical adversarial attacks.
arXiv Detail & Related papers (2024-05-25T13:34:16Z) - Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal.
Unlike the existing methods of designing a backdoor for the input/output space of diffusion models, in our method, we propose to embed the backdoor into the feature space of sampled subpaths.
Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z) - PASA: Attack Agnostic Unsupervised Adversarial Detection using Prediction & Attribution Sensitivity Analysis [2.5347892611213614]
Deep neural networks for classification are vulnerable to adversarial attacks, where small perturbations to input samples lead to incorrect predictions.
We develop a practical method for this characteristic of model prediction and feature attribution to detect adversarial samples.
Our approach demonstrates competitive performance even when an adversary is aware of the defense mechanism.
arXiv Detail & Related papers (2024-04-12T21:22:21Z) - Diffusion-Based Particle-DETR for BEV Perception [94.88305708174796]
Bird-Eye-View (BEV) is one of the most widely-used scene representations for visual perception in Autonomous Vehicles (AVs)
Recent diffusion-based methods offer a promising approach to uncertainty modeling for visual perception but fail to effectively detect small objects in the large coverage of the BEV.
Here, we address this problem by combining the diffusion paradigm with current state-of-the-art 3D object detectors in BEV.
arXiv Detail & Related papers (2023-12-18T09:52:14Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - DAAIN: Detection of Anomalous and Adversarial Input using Normalizing
Flows [52.31831255787147]
We introduce a novel technique, DAAIN, to detect out-of-distribution (OOD) inputs and adversarial attacks (AA)
Our approach monitors the inner workings of a neural network and learns a density estimator of the activation distribution.
Our model can be trained on a single GPU making it compute efficient and deployable without requiring specialized accelerators.
arXiv Detail & Related papers (2021-05-30T22:07:13Z) - Firearm Detection via Convolutional Neural Networks: Comparing a
Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents.
One way for achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis.
We compare a traditional monolithic end-to-end deep learning model and a previously proposed model based on an ensemble of simpler neural networks detecting fire-weapons via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z) - Adversarial Detection and Correction by Matching Prediction
Distributions [0.0]
The detector almost completely neutralises powerful attacks like Carlini-Wagner or SLIDE on MNIST and Fashion-MNIST.
We show that our method is still able to detect the adversarial examples in the case of a white-box attack where the attacker has full knowledge of both the model and the defence.
arXiv Detail & Related papers (2020-02-21T15:45:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.