Towards Certifiable Adversarial Sample Detection
- URL: http://arxiv.org/abs/2002.08740v1
- Date: Thu, 20 Feb 2020 14:10:00 GMT
- Title: Towards Certifiable Adversarial Sample Detection
- Authors: Ilia Shumailov, Yiren Zhao, Robert Mullins, Ross Anderson
- Abstract summary: Convolutional Neural Networks (CNNs) are deployed in more and more classification systems.
Adversarial samples can be maliciously crafted to trick them, and are becoming a real threat.
We provide a new approach in the form of a certifiable adversarial detection scheme, the Certifiable Taboo Trap (CTT).
We show that CTT has small false positive rates on clean test data, minimal compute overheads when deployed, and can support complex security policies.
- Score: 9.29621938988162
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Convolutional Neural Networks (CNNs) are deployed in more and more
classification systems, but adversarial samples can be maliciously crafted to
trick them, and are becoming a real threat. There have been various proposals
to improve CNNs' adversarial robustness but these all suffer performance
penalties or other limitations. In this paper, we provide a new approach in the
form of a certifiable adversarial detection scheme, the Certifiable Taboo Trap
(CTT). The system can provide certifiable guarantees of detection of
adversarial inputs for certain $l_{\infty}$ sizes on a reasonable assumption,
namely that the training data have the same distribution as the test data. We
develop and evaluate several versions of CTT with a range of defense
capabilities, training overheads and certifiability on adversarial samples.
Against adversaries with various $l_p$ norms, CTT outperforms existing defense
methods that focus purely on improving network robustness. We show that CTT has
small false positive rates on clean test data, minimal compute overheads when
deployed, and can support complex security policies.
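As a rough illustration of the taboo-trap idea that CTT builds on, the sketch below monitors selected activations against fixed thresholds: a training-time penalty keeps clean inputs inside the allowed region, and any input whose activations cross a threshold at inference is flagged. This is a minimal, hypothetical sketch: the TabooReLU module, the taboo_penalty and detect helpers, and the single fixed threshold are illustrative assumptions, and the paper's transfer functions, threshold selection, l_infinity certification procedure, and security policies are not reproduced here.

```python
# Hedged sketch of an activation-threshold ("taboo") detector in PyTorch.
# It only illustrates the general detection mechanism, not CTT's certification.
import torch
import torch.nn as nn

class TabooReLU(nn.Module):
    """ReLU whose post-activation values are monitored against a fixed threshold."""
    def __init__(self, threshold: float = 1.0):
        super().__init__()
        self.threshold = threshold
        self.overshoot = None  # per-sample overshoot, populated on every forward pass

    def forward(self, x):
        out = torch.relu(x)
        # Amount by which each sample's activations exceed the taboo threshold.
        self.overshoot = (out - self.threshold).clamp(min=0).flatten(1).sum(dim=1)
        return out

def taboo_penalty(model: nn.Module) -> torch.Tensor:
    """Training-time penalty that pushes clean activations below the thresholds."""
    return sum(m.overshoot.mean() for m in model.modules() if isinstance(m, TabooReLU))

@torch.no_grad()
def detect(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Boolean mask over the batch: True where an input tripped any taboo threshold."""
    model.eval()
    model(x)
    flags = torch.zeros(x.shape[0], dtype=torch.bool)
    for m in model.modules():
        if isinstance(m, TabooReLU):
            flags |= (m.overshoot > 0).cpu()
    return flags

# Toy CIFAR-sized model; during training one would minimise
#   criterion(model(x), y) + lam * taboo_penalty(model)
# so that clean data rarely crosses the thresholds.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    TabooReLU(threshold=1.0),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),
)
```

Detection here reduces to a per-sample threshold check on already-computed activations, which is consistent with the abstract's claim of minimal compute overhead at deployment time; how thresholds must be set to make the detection certifiable for a given l_infinity perturbation size is the subject of the paper itself.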
Related papers
- Cert-SSB: Toward Certified Sample-Specific Backdoor Defense [11.881041521642405]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Hackers manipulate a small portion of the training data to implant hidden backdoors into the model.
We propose a sample-specific certified backdoor defense method, termed Cert-SSB.
arXiv Detail & Related papers (2025-04-30T15:21:25Z)
- Exact Certification of (Graph) Neural Networks Against Label Poisoning [50.87615167799367]
We introduce an exact certification method for label flipping in Graph Neural Networks (GNNs).
We apply our method to certify a broad range of GNN architectures in node classification tasks.
Our work presents the first exact certificate against a poisoning attack ever derived for neural networks.
arXiv Detail & Related papers (2024-11-30T17:05:12Z)
- TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models [53.91006249339802]
We propose a novel defense method called Test-Time Adversarial Prompt Tuning (TAPT) to enhance the inference robustness of CLIP against visual adversarial attacks.
TAPT is a test-time defense method that learns defensive bimodal (textual and visual) prompts to robustify the inference process of CLIP.
We evaluate the effectiveness of TAPT on 11 benchmark datasets, including ImageNet and 10 other zero-shot datasets.
arXiv Detail & Related papers (2024-11-20T08:58:59Z)
- Uncovering Adversarial Risks of Test-Time Adaptation [41.19226800089764]
Test-time adaptation (TTA) has been proposed as a promising solution for addressing distribution shifts.
We uncover a novel security vulnerability of TTA based on the insight that predictions on benign samples can be impacted by malicious samples in the same batch.
We propose Distribution Invading Attack (DIA), which injects a small fraction of malicious data into the test batch.
arXiv Detail & Related papers (2023-01-29T22:58:05Z)
- DE-CROP: Data-efficient Certified Robustness for Pretrained Classifiers [21.741026088202126]
We propose a novel way to certify the robustness of pretrained models using only a few training samples.
Our proposed approach generates class-boundary and interpolated samples corresponding to each training sample.
We obtain significant improvements over the baseline on multiple benchmark datasets and also report similar performance under the challenging black box setup.
arXiv Detail & Related papers (2022-10-17T10:41:18Z)
- Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, an effective approach known as adversarial training (AT) trains models on adversarial examples to improve their robustness.
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
- Getting a-Round Guarantees: Floating-Point Attacks on Certified Robustness [19.380453459873298]
Adversarial examples pose a security risk as they can alter decisions of a machine learning classifier through slight input perturbations.
We show that these guarantees can be invalidated due to limitations of floating-point representation that cause rounding errors.
We show that the attack can be carried out against linear classifiers that have exact certifiable guarantees and against neural networks that have conservative certifications.
arXiv Detail & Related papers (2022-05-20T13:07:36Z)
- COPA: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks [49.15885037760725]
We focus on certifying the robustness of offline reinforcement learning (RL) in the presence of poisoning attacks.
We propose the first certification framework, COPA, to certify the number of poisoning trajectories that can be tolerated.
We prove that some of the proposed certification methods are theoretically tight, while others correspond to NP-complete problems.
arXiv Detail & Related papers (2022-03-16T05:02:47Z)
- Sparsity Winning Twice: Better Robust Generalization from More Efficient Training [94.92954973680914]
We introduce two alternatives for sparse adversarial training: (i) static sparsity and (ii) dynamic sparsity.
We find that both methods yield a win-win: they substantially shrink the robust generalization gap and alleviate robust overfitting.
Our approaches can be combined with existing regularizers, establishing new state-of-the-art results in adversarial training.
arXiv Detail & Related papers (2022-02-20T15:52:08Z)
- Adversarial Attacks and Defense for Non-Parametric Two-Sample Tests [73.32304304788838]
This paper systematically uncovers the failure mode of non-parametric TSTs through adversarial attacks.
To enable TST-agnostic attacks, we propose an ensemble attack framework that jointly minimizes the different types of test criteria.
To robustify TSTs, we propose a max-min optimization that iteratively generates adversarial pairs to train the deep kernels.
arXiv Detail & Related papers (2022-02-07T11:18:04Z)
- Improving the Certified Robustness of Neural Networks via Consistency Regularization [25.42238710803711]
A range of defense methods have been proposed to improve the robustness of neural networks on adversarial examples.
Most of these provable defense methods treat all examples equally during the training process.
In this paper, we explore the inconsistency caused by misclassified examples and add a novel consistency regularization term to make better use of them.
arXiv Detail & Related papers (2020-12-24T05:00:50Z)
- Certified Distributional Robustness on Smoothed Classifiers [27.006844966157317]
We propose the worst-case adversarial loss over input distributions as a robustness certificate.
By exploiting duality and the smoothness property, we provide an easy-to-compute upper bound as a surrogate for the certificate.
arXiv Detail & Related papers (2020-10-21T13:22:25Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.