Sample Efficient Detection and Classification of Adversarial Attacks via
Self-Supervised Embeddings
- URL: http://arxiv.org/abs/2108.13797v1
- Date: Mon, 30 Aug 2021 16:39:52 GMT
- Title: Sample Efficient Detection and Classification of Adversarial Attacks via
Self-Supervised Embeddings
- Authors: Mazda Moayeri and Soheil Feizi
- Abstract summary: Adversarial robustness of deep models is pivotal in ensuring safe deployment in real world settings.
We propose a self-supervised method to detect adversarial attacks and classify them into their respective threat models.
We use a SimCLR encoder in our experiments, since we show the SimCLR embedding distance is a good proxy for human perceptibility.
- Score: 40.332149464256496
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial robustness of deep models is pivotal in ensuring safe deployment
in real world settings, but most modern defenses have narrow scope and
expensive costs. In this paper, we propose a self-supervised method to detect
adversarial attacks and classify them into their respective threat models, based
on a linear model operating on the embeddings from a pre-trained
self-supervised encoder. We use a SimCLR encoder in our experiments, since we
show the SimCLR embedding distance is a good proxy for human perceptibility,
enabling it to encapsulate many threat models at once. We call our method
SimCat since it uses a SimCLR encoder to catch and categorize various types of
adversarial attacks, including L_p and non-L_p evasion attacks, as well as data
poisonings. The simple nature of a linear classifier makes our method efficient
in both time and sample complexity. For example, on SVHN, using only five pairs
of clean and adversarial examples computed with a PGD-L_inf attack, SimCat's
detection accuracy is over 85%. Moreover, on ImageNet, using only 25 examples
from each threat model, SimCat can classify eight different attack types such
as PGD-L_2, PGD-L_inf, CW-L_2, PPGD, LPA, StAdv, ReColor, and JPEG-L_inf, with
over 40% accuracy. On STL10 data, we apply SimCat as a defense against
poisoning attacks, such as BP, CP, FC, CLBD, and HTBD, halving the success rate
while using only twenty total poisons for training. We find that the detectors
generalize well to unseen threat models. Lastly, we investigate the performance
of our detection method under adaptive attacks and further boost its robustness
against such attacks via adversarial training.
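The abstract describes SimCat only at a high level. The sketch below illustrates the general idea of a linear detector over frozen self-supervised embeddings; it is not the authors' implementation. A torchvision ResNet-18 stands in for the pretrained SimCLR encoder, the helper names (build_encoder, embed, fit_detector) are illustrative, and the clean/adversarial images are assumed to arrive as already normalized tensors.

```python
# Minimal sketch (not the authors' code) of the idea in the abstract:
# a linear model over embeddings from a frozen self-supervised encoder.
# A torchvision ResNet-18 is used here as a placeholder for the
# pretrained SimCLR encoder referenced in the paper.

import numpy as np
import torch
import torchvision.models as models
from sklearn.linear_model import LogisticRegression


def build_encoder():
    """Frozen feature extractor; substitute a pretrained SimCLR encoder if available."""
    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    backbone.fc = torch.nn.Identity()      # expose the penultimate-layer embedding
    backbone.eval()
    for p in backbone.parameters():
        p.requires_grad_(False)
    return backbone


@torch.no_grad()
def embed(encoder, images):
    """images: float tensor of shape (N, 3, H, W), already normalized."""
    return encoder(images).cpu().numpy()


def embedding_distance(encoder, x, x_adv):
    """Embedding-space L2 distance; the paper argues that, with SimCLR embeddings,
    this distance is a good proxy for human perceptibility."""
    return np.linalg.norm(embed(encoder, x) - embed(encoder, x_adv), axis=1)


def fit_detector(encoder, clean_images, adv_images):
    """Fit a linear clean-vs-adversarial detector from a handful of pairs
    (the abstract reports >85% detection on SVHN from only five such pairs)."""
    feats = np.concatenate([embed(encoder, clean_images), embed(encoder, adv_images)])
    labels = np.concatenate([np.zeros(len(clean_images)), np.ones(len(adv_images))])
    return LogisticRegression(max_iter=1000).fit(feats, labels)
```

For the attack-classification use case described above, the same recipe would apply with one label per threat model instead of a binary clean/adversarial label; LogisticRegression handles the multi-class case directly.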
Related papers
- AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning [93.77763753231338]
Adversarial Contrastive Prompt Tuning (ACPT) is proposed to fine-tune the CLIP image encoder to extract similar embeddings for any two intermediate adversarial queries.
We show that ACPT can detect 7 state-of-the-art query-based attacks with >99% detection rate within 5 shots.
We also show that ACPT is robust to 3 types of adaptive attacks.
arXiv Detail & Related papers (2024-08-04T09:53:50Z)
- Exploring Model Dynamics for Accumulative Poisoning Discovery [62.08553134316483]
We propose a novel information measure, namely, Memorization Discrepancy, to explore the defense via the model-level information.
By implicitly transferring the changes in the data manipulation to those in the model outputs, Memorization Discrepancy can discover the imperceptible poison samples.
We thoroughly explore its properties and propose Discrepancy-aware Sample Correction (DSC) to defend against accumulative poisoning attacks.
arXiv Detail & Related papers (2023-06-06T14:45:24Z)
- On Trace of PGD-Like Adversarial Attacks [77.75152218980605]
Adversarial attacks pose safety and security concerns for deep learning applications.
We construct Adversarial Response Characteristics (ARC) features to reflect the model's gradient consistency.
Our method is intuitive, lightweight, non-intrusive, and data-undemanding.
arXiv Detail & Related papers (2022-05-19T14:26:50Z)
- AntidoteRT: Run-time Detection and Correction of Poison Attacks on Neural Networks [18.461079157949698]
We study backdoor poisoning attacks against image classification networks.
We propose lightweight automated detection and correction techniques against poisoning attacks.
Our technique outperforms existing defenses such as NeuralCleanse and STRIP on popular benchmarks.
arXiv Detail & Related papers (2022-01-31T23:42:32Z)
- Post-Training Detection of Backdoor Attacks for Two-Class and Multi-Attack Scenarios [22.22337220509128]
Backdoor attacks (BAs) are an emerging threat to deep neural network classifiers.
We propose a detection framework based on BP reverse-engineering and a novel expected transferability (ET) statistic.
arXiv Detail & Related papers (2022-01-20T22:21:38Z)
- Classification Auto-Encoder based Detector against Diverse Data Poisoning Attacks [7.150136251781658]
Poisoning attacks are a category of adversarial machine learning threats.
In this paper, we propose CAE, a Classification Auto-Encoder based detector against poisoned data.
We show that an enhanced version of CAE (called CAE+) does not have to employ a clean data set to train the defense model.
arXiv Detail & Related papers (2021-08-09T17:46:52Z)
- Self-Supervised Adversarial Example Detection by Disentangled Representation [16.98476232162835]
We train an autoencoder, assisted by a discriminator network, over both correctly paired class/semantic features and incorrectly paired class/semantic features to reconstruct benign examples and counterexamples.
This mimics the behavior of adversarial examples and can reduce the unnecessary generalization ability of the autoencoder.
Compared with the state-of-the-art self-supervised detection methods, our method exhibits better performance in various measurements.
arXiv Detail & Related papers (2021-05-08T12:48:18Z)
- Composite Adversarial Attacks [57.293211764569996]
An adversarial attack is a technique for deceiving Machine Learning (ML) models.
In this paper, a new procedure called Composite Adversarial Attack (CAA) is proposed for automatically searching the best combination of attack algorithms.
CAA beats 10 top attackers on 11 diverse defenses with less elapsed time.
arXiv Detail & Related papers (2020-12-10T03:21:16Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
- Adversarial Detection and Correction by Matching Prediction Distributions [0.0]
The detector almost completely neutralises powerful attacks like Carlini-Wagner or SLIDE on MNIST and Fashion-MNIST.
We show that our method is still able to detect the adversarial examples in the case of a white-box attack where the attacker has full knowledge of both the model and the defence.
arXiv Detail & Related papers (2020-02-21T15:45:42Z)