Towards Certifiable Adversarial Sample Detection
- URL: http://arxiv.org/abs/2002.08740v1
- Date: Thu, 20 Feb 2020 14:10:00 GMT
- Title: Towards Certifiable Adversarial Sample Detection
- Authors: Ilia Shumailov, Yiren Zhao, Robert Mullins, Ross Anderson
- Abstract summary: Convolutional Neural Networks (CNNs) are deployed in more and more classification systems, but adversarial samples can be maliciously crafted to trick them and are becoming a real threat.
We provide a new approach in the form of a certifiable adversarial detection scheme, the Certifiable Taboo Trap (CTT).
We show that CTT has small false positive rates on clean test data, minimal compute overheads when deployed, and can support complex security policies.
- Score: 9.29621938988162
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Convolutional Neural Networks (CNNs) are deployed in more and more
classification systems, but adversarial samples can be maliciously crafted to
trick them, and are becoming a real threat. There have been various proposals
to improve CNNs' adversarial robustness but these all suffer performance
penalties or other limitations. In this paper, we provide a new approach in the
form of a certifiable adversarial detection scheme, the Certifiable Taboo Trap
(CTT). The system can provide certifiable guarantees of detection of
adversarial inputs up to certain $l_{\infty}$ perturbation sizes, under a
reasonable assumption, namely that the training data have the same distribution as the test data. We
develop and evaluate several versions of CTT with a range of defense
capabilities, training overheads and certifiability on adversarial samples.
Against adversaries with various $l_p$ norms, CTT outperforms existing defense
methods that focus purely on improving network robustness. We show that CTT has
small false positive rates on clean test data, minimal compute overheads when
deployed, and can support complex security policies.
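The abstract does not spell out the mechanism, but the general taboo-trap idea behind this line of work is to train a network so that certain hidden activations stay within known bounds on clean data, and to flag any input that drives an activation past its bound. A minimal sketch of that detection pattern, using synthetic activations in place of a real network (all names and the toy data are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def fit_thresholds(clean_activations, margin=0.0):
    """Record the maximum per-unit activation observed on clean data.

    clean_activations: array of shape (n_samples, n_units), taken from a
    hidden layer of an already-trained network (synthetic here).
    """
    return clean_activations.max(axis=0) + margin

def is_adversarial(activations, thresholds):
    """Flag an input whose hidden activations exceed any learned threshold."""
    return bool(np.any(activations > thresholds))

# Toy illustration with synthetic "activations" in place of a real layer.
rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 1.0, size=(1000, 16))   # clean-data activations
thresholds = fit_thresholds(clean)

benign = np.full(16, 0.5)                        # well inside the clean range
suspicious = benign.copy()
suspicious[3] = 2.5                              # one unit driven out of range

print(is_adversarial(benign, thresholds))        # False
print(is_adversarial(suspicious, thresholds))    # True
```

The certifiable aspect of CTT goes beyond this sketch: it ties the thresholds to a guaranteed detection radius in $l_{\infty}$ norm, which the toy code above does not attempt.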
Related papers
- DAD++: Improved Data-free Test Time Adversarial Defense [12.606555446261668]
We propose a test time Data-free Adversarial Defense (DAD) containing detection and correction frameworks.
We conduct a wide range of experiments and ablations on several datasets and network architectures to show the efficacy of our proposed approach.
Our DAD++ gives an impressive performance against various adversarial attacks with a minimal drop in clean accuracy.
arXiv Detail & Related papers (2023-09-10T20:39:53Z)
- Uncovering Adversarial Risks of Test-Time Adaptation [41.19226800089764]
Test-time adaptation (TTA) has been proposed as a promising solution for addressing distribution shifts.
We uncover a novel security vulnerability of TTA based on the insight that predictions on benign samples can be impacted by malicious samples in the same batch.
We propose Distribution Invading Attack (DIA), which injects a small fraction of malicious data into the test batch.
arXiv Detail & Related papers (2023-01-29T22:58:05Z)
- DE-CROP: Data-efficient Certified Robustness for Pretrained Classifiers [21.741026088202126]
We propose a novel way to certify the robustness of pretrained models using only a few training samples.
Our proposed approach generates class-boundary and interpolated samples corresponding to each training sample.
We obtain significant improvements over the baseline on multiple benchmark datasets and also report similar performance under the challenging black box setup.
arXiv Detail & Related papers (2022-10-17T10:41:18Z)
- Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, an effective approach known as adversarial training (AT) has been shown to improve robustness, but it is costly at scale.
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
- COPA: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks [49.15885037760725]
We focus on certifying the robustness of offline reinforcement learning (RL) in the presence of poisoning attacks.
We propose the first certification framework, COPA, to certify the number of poisoning trajectories that can be tolerated.
We prove that some of the proposed certification methods are theoretically tight, while computing others is NP-complete.
arXiv Detail & Related papers (2022-03-16T05:02:47Z)
- Sparsity Winning Twice: Better Robust Generalization from More Efficient Training [94.92954973680914]
We introduce two alternatives for sparse adversarial training: (i) static sparsity and (ii) dynamic sparsity.
We find that both methods yield a win-win: substantially shrinking the robust generalization gap and alleviating robust overfitting.
Our approaches can be combined with existing regularizers, establishing new state-of-the-art results in adversarial training.
arXiv Detail & Related papers (2022-02-20T15:52:08Z)
- Adversarial Attacks and Defense for Non-Parametric Two-Sample Tests [73.32304304788838]
This paper systematically uncovers the failure mode of non-parametric TSTs through adversarial attacks.
To enable TST-agnostic attacks, we propose an ensemble attack framework that jointly minimizes the different types of test criteria.
To robustify TSTs, we propose a max-min optimization that iteratively generates adversarial pairs to train the deep kernels.
arXiv Detail & Related papers (2022-02-07T11:18:04Z)
- Learning and Certification under Instance-targeted Poisoning [49.55596073963654]
We study PAC learnability and certification under instance-targeted poisoning attacks.
We show that when the budget of the adversary scales sublinearly with the sample complexity, PAC learnability and certification are achievable.
We empirically study the robustness of K nearest neighbour, logistic regression, multi-layer perceptron, and convolutional neural network on real data sets.
arXiv Detail & Related papers (2021-05-18T17:48:15Z)
- Improving the Certified Robustness of Neural Networks via Consistency Regularization [25.42238710803711]
A range of defense methods have been proposed to improve the robustness of neural networks on adversarial examples.
Most of these provable defense methods treat all examples equally during the training process.
In this paper, we explore the inconsistency caused by misclassified examples and add a novel consistency regularization term to make better use of them.
arXiv Detail & Related papers (2020-12-24T05:00:50Z)
- Certified Distributional Robustness on Smoothed Classifiers [27.006844966157317]
We propose the worst-case adversarial loss over input distributions as a robustness certificate.
By exploiting duality and the smoothness property, we provide an easy-to-compute upper bound as a surrogate for the certificate.
arXiv Detail & Related papers (2020-10-21T13:22:25Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
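Several of the related papers above (on smoothed classifiers and label-flipping robustness) build on randomized smoothing, whose core prediction step is simple: classify many Gaussian-noised copies of the input and take a majority vote. A minimal sketch of that step, using a hypothetical toy classifier rather than any of the cited methods:

```python
import numpy as np

def smoothed_predict(classifier, x, sigma=0.25, n_samples=1000, rng=None):
    """Randomized-smoothing prediction: classify n_samples Gaussian-noised
    copies of x and return the majority-vote class label."""
    rng = rng or np.random.default_rng(0)
    noisy = x + sigma * rng.standard_normal((n_samples,) + x.shape)
    votes = np.array([classifier(z) for z in noisy])
    return int(np.bincount(votes).argmax())

# Hypothetical toy "classifier": class 1 iff the mean feature is positive.
toy_classifier = lambda z: int(z.mean() > 0.0)

x = np.full(8, 0.3)                          # clearly class 1
print(smoothed_predict(toy_classifier, x))   # 1
```

The cited papers go further and convert the vote margin into a certified robustness radius; this sketch shows only the smoothed prediction itself.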
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.