Sample Efficient Detection and Classification of Adversarial Attacks via
Self-Supervised Embeddings
- URL: http://arxiv.org/abs/2108.13797v1
- Date: Mon, 30 Aug 2021 16:39:52 GMT
- Title: Sample Efficient Detection and Classification of Adversarial Attacks via
Self-Supervised Embeddings
- Authors: Mazda Moayeri and Soheil Feizi
- Abstract summary: Adversarial robustness of deep models is pivotal in ensuring safe deployment in real world settings.
We propose a self-supervised method to detect adversarial attacks and classify them into their respective threat models.
We use a SimCLR encoder in our experiments, since we show the SimCLR embedding distance is a good proxy for human perceptibility.
- Score: 40.332149464256496
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial robustness of deep models is pivotal in ensuring safe deployment
in real world settings, but most modern defenses have narrow scope and
expensive costs. In this paper, we propose a self-supervised method to detect
adversarial attacks and classify them into their respective threat models, based
on a linear model operating on the embeddings from a pre-trained
self-supervised encoder. We use a SimCLR encoder in our experiments, since we
show the SimCLR embedding distance is a good proxy for human perceptibility,
enabling it to encapsulate many threat models at once. We call our method
SimCat since it uses a SimCLR encoder to catch and categorize various types of
adversarial attacks, including L_p and non-L_p evasion attacks, as well as data
poisonings. The simple nature of a linear classifier makes our method efficient
in both time and sample complexity. For example, on SVHN, using only five pairs
of clean and adversarial examples computed with a PGD-L_inf attack, SimCat's
detection accuracy is over 85%. Moreover, on ImageNet, using only 25 examples
from each threat model, SimCat can classify eight different attack types such
as PGD-L_2, PGD-L_inf, CW-L_2, PPGD, LPA, StAdv, ReColor, and JPEG-L_inf, with
over 40% accuracy. On STL10 data, we apply SimCat as a defense against
poisoning attacks, such as BP, CP, FC, CLBD, and HTBD, halving the success rate
while using only twenty total poisons for training. We find that the detectors
generalize well to unseen threat models. Lastly, we investigate the performance
of our detection method under adaptive attacks and further boost its robustness
against such attacks via adversarial training.
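The abstract describes SimCat only at a high level. The sketch below illustrates the general idea of a linear detector over frozen self-supervised embeddings; it is not the authors' implementation. A torchvision ResNet-18 stands in for the pretrained SimCLR encoder, the helper names (build_encoder, embed, fit_detector) are illustrative, and the clean/adversarial images are assumed to arrive as already normalized tensors.

```python
# Minimal sketch (not the authors' code) of the idea in the abstract:
# a linear model over embeddings from a frozen self-supervised encoder.
# A torchvision ResNet-18 is used here as a placeholder for the
# pretrained SimCLR encoder referenced in the paper.

import numpy as np
import torch
import torchvision.models as models
from sklearn.linear_model import LogisticRegression


def build_encoder():
    """Frozen feature extractor; substitute a pretrained SimCLR encoder if available."""
    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    backbone.fc = torch.nn.Identity()      # expose the penultimate-layer embedding
    backbone.eval()
    for p in backbone.parameters():
        p.requires_grad_(False)
    return backbone


@torch.no_grad()
def embed(encoder, images):
    """images: float tensor of shape (N, 3, H, W), already normalized."""
    return encoder(images).cpu().numpy()


def embedding_distance(encoder, x, x_adv):
    """Embedding-space L2 distance; the paper argues that, with SimCLR embeddings,
    this distance is a good proxy for human perceptibility."""
    return np.linalg.norm(embed(encoder, x) - embed(encoder, x_adv), axis=1)


def fit_detector(encoder, clean_images, adv_images):
    """Fit a linear clean-vs-adversarial detector from a handful of pairs
    (the abstract reports >85% detection on SVHN from only five such pairs)."""
    feats = np.concatenate([embed(encoder, clean_images), embed(encoder, adv_images)])
    labels = np.concatenate([np.zeros(len(clean_images)), np.ones(len(adv_images))])
    return LogisticRegression(max_iter=1000).fit(feats, labels)
```

For the attack-classification use case described above, the same recipe would apply with one label per threat model instead of a binary clean/adversarial label; LogisticRegression handles the multi-class case directly.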
Related papers
- AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning [93.77763753231338]
Adversarial Contrastive Prompt Tuning (ACPT) is proposed to fine-tune the CLIP image encoder to extract similar embeddings for any two intermediate adversarial queries.
We show that ACPT can detect 7 state-of-the-art query-based attacks with >99% detection rate within 5 shots.
We also show that ACPT is robust to 3 types of adaptive attacks.
arXiv Detail & Related papers (2024-08-04T09:53:50Z)
- Exploring Model Dynamics for Accumulative Poisoning Discovery [62.08553134316483]
We propose a novel information measure, namely, Memorization Discrepancy, to explore the defense via the model-level information.
By implicitly transferring the changes in the data manipulation to those in the model outputs, Memorization Discrepancy can discover the imperceptible poison samples.
We thoroughly explore its properties and propose Discrepancy-aware Sample Correction (DSC) to defend against accumulative poisoning attacks.
arXiv Detail & Related papers (2023-06-06T14:45:24Z)
- On Trace of PGD-Like Adversarial Attacks [77.75152218980605]
Adversarial attacks pose safety and security concerns for deep learning applications.
We construct Adversarial Response Characteristics (ARC) features to reflect the model's gradient consistency.
Our method is intuitive, lightweight, non-intrusive, and data-undemanding.
arXiv Detail & Related papers (2022-05-19T14:26:50Z)
- AntidoteRT: Run-time Detection and Correction of Poison Attacks on Neural Networks [18.461079157949698]
We study backdoor poisoning attacks against image classification networks.
We propose lightweight automated detection and correction techniques against poisoning attacks.
Our technique outperforms existing defenses such as NeuralCleanse and STRIP on popular benchmarks.
arXiv Detail & Related papers (2022-01-31T23:42:32Z)
- Post-Training Detection of Backdoor Attacks for Two-Class and Multi-Attack Scenarios [22.22337220509128]
Backdoor attacks (BAs) are an emerging threat to deep neural network classifiers.
We propose a detection framework based on BP reverse-engineering and a novel expected transferability (ET) statistic.
arXiv Detail & Related papers (2022-01-20T22:21:38Z)
- Classification Auto-Encoder based Detector against Diverse Data Poisoning Attacks [7.150136251781658]
Poisoning attacks are a category of adversarial machine learning threats.
In this paper, we propose CAE, a Classification Auto-Encoder based detector against poisoned data.
We show that an enhanced version of CAE (called CAE+) does not have to employ a clean data set to train the defense model.
arXiv Detail & Related papers (2021-08-09T17:46:52Z)
- Self-Supervised Adversarial Example Detection by Disentangled Representation [16.98476232162835]
We train an autoencoder, assisted by a discriminator network, over both correctly paired class/semantic features and incorrectly paired class/semantic features to reconstruct benign examples and counterexamples.
This mimics the behavior of adversarial examples and can reduce the unnecessary generalization ability of the autoencoder.
Compared with the state-of-the-art self-supervised detection methods, our method exhibits better performance in various measurements.
arXiv Detail & Related papers (2021-05-08T12:48:18Z)
- Composite Adversarial Attacks [57.293211764569996]
An adversarial attack is a technique for deceiving Machine Learning (ML) models.
In this paper, a new procedure called Composite Adversarial Attack (CAA) is proposed for automatically searching the best combination of attack algorithms.
CAA beats 10 top attackers on 11 diverse defenses with less elapsed time.
arXiv Detail & Related papers (2020-12-10T03:21:16Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
- Adversarial Detection and Correction by Matching Prediction Distributions [0.0]
The detector almost completely neutralises powerful attacks like Carlini-Wagner or SLIDE on MNIST and Fashion-MNIST.
We show that our method is still able to detect the adversarial examples in the case of a white-box attack where the attacker has full knowledge of both the model and the defence.
arXiv Detail & Related papers (2020-02-21T15:45:42Z)