Denoised Smoothing: A Provable Defense for Pretrained Classifiers
- URL: http://arxiv.org/abs/2003.01908v2
- Date: Mon, 21 Sep 2020 02:20:16 GMT
- Title: Denoised Smoothing: A Provable Defense for Pretrained Classifiers
- Authors: Hadi Salman, Mingjie Sun, Greg Yang, Ashish Kapoor and J. Zico Kolter
- Abstract summary: We present a method for provably defending any pretrained image classifier against $\ell_p$ adversarial attacks.
This method allows public vision API providers and users to seamlessly convert pretrained non-robust classification services into provably robust ones.
- Score: 101.67773468882903
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a method for provably defending any pretrained image classifier
against $\ell_p$ adversarial attacks. This method, for instance, allows public
vision API providers and users to seamlessly convert pretrained non-robust
classification services into provably robust ones. By prepending a
custom-trained denoiser to any off-the-shelf image classifier and using
randomized smoothing, we effectively create a new classifier that is guaranteed
to be $\ell_p$-robust to adversarial examples, without modifying the pretrained
classifier. Our approach applies to both the white-box and the black-box
settings of the pretrained classifier. We refer to this defense as denoised
smoothing, and we demonstrate its effectiveness through extensive
experimentation on ImageNet and CIFAR-10. Finally, we use our approach to
provably defend the Azure, Google, AWS, and Clarifai image classification APIs.
Our code replicating all the experiments in the paper can be found at:
https://github.com/microsoft/denoised-smoothing.
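The prediction step described in the abstract — prepend a denoiser to a frozen classifier, then take a majority vote over Gaussian-noised copies of the input — can be sketched as follows. This is a minimal illustration, not the paper's implementation: `classify` and `denoise` are hypothetical stand-ins for the off-the-shelf classifier and the custom-trained denoiser, and inputs are plain lists of floats rather than image tensors.

```python
# Hypothetical sketch of the denoised-smoothing prediction step.
# `classify` and `denoise` are stand-ins; in the paper, the classifier is a
# frozen (possibly black-box) model and the denoiser is trained separately.
import random
from collections import Counter

def smoothed_predict(x, classify, denoise, sigma=0.25, n_samples=100, rng=None):
    """Majority-vote prediction of the smoothed classifier
    g(x) = argmax_c P[ classify(denoise(x + N(0, sigma^2 I))) = c ]."""
    rng = rng or random.Random(0)
    votes = Counter()
    for _ in range(n_samples):
        # Add isotropic Gaussian noise to every input coordinate.
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        # The denoiser removes the injected noise before the pretrained
        # classifier is queried; the classifier itself is never modified.
        votes[classify(denoise(noisy))] += 1
    label, _ = votes.most_common(1)[0]
    return label
```

Because the classifier is only queried, never retrained, the same loop works in both the white-box and the black-box settings discussed in the abstract.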
Related papers
- ZeroPur: Succinct Training-Free Adversarial Purification [52.963392510839284]
Adversarial purification is a defense technique that can counter a variety of unseen adversarial attacks.
We present ZeroPur, a simple adversarial purification method that purifies adversarial images without further training.
arXiv Detail & Related papers (2024-06-05T10:58:15Z)
- Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders [101.42201747763178]
Unlearnable examples (UEs) seek to maximize testing error by making subtle modifications to correctly labeled training examples.
Our work provides a novel disentanglement mechanism to build an efficient pre-training purification method.
arXiv Detail & Related papers (2024-05-02T16:49:25Z)
- FCert: Certifiably Robust Few-Shot Classification in the Era of Foundation Models [38.019489232264796]
We propose FCert, the first certified defense against data poisoning attacks on few-shot classification.
Our experimental results show that FCert: 1) maintains classification accuracy in the absence of attacks, 2) outperforms existing certified defenses against data poisoning attacks, and 3) is efficient and general.
arXiv Detail & Related papers (2024-04-12T17:50:40Z)
- Rethinking Multiple Instance Learning for Whole Slide Image Classification: A Bag-Level Classifier is a Good Instance-Level Teacher [22.080213609228547]
Multiple Instance Learning has demonstrated promise in Whole Slide Image (WSI) classification.
Existing methods generally adopt a two-stage approach comprising a non-learnable feature embedding stage and a classifier training stage.
We propose that a bag-level classifier can be a good instance-level teacher.
arXiv Detail & Related papers (2023-12-02T10:16:03Z)
- Carefully Blending Adversarial Training and Purification Improves Adversarial Robustness [1.2289361708127877]
CARSO is able to defend itself against adaptive end-to-end white-box attacks devised to break such defences.
Our method improves the state of the art by a significant margin on CIFAR-10, CIFAR-100, and TinyImageNet-200.
arXiv Detail & Related papers (2023-05-25T09:04:31Z)
- Exploring the Limits of Deep Image Clustering using Pretrained Models [1.1060425537315088]
We present a methodology that learns to classify images without labels by leveraging pretrained feature extractors.
We propose a novel objective that learns associations between image features by introducing a variant of pointwise mutual information together with instance weighting.
arXiv Detail & Related papers (2023-03-31T08:56:29Z)
- Assessing Neural Network Robustness via Adversarial Pivotal Tuning [24.329515700515806]
We show how a pretrained image generator can be used to semantically manipulate images in a detailed, diverse, and photorealistic way.
Inspired by recent GAN-based photorealistic editing methods, we propose a method called Adversarial Pivotal Tuning (APT).
We demonstrate that APT is capable of a wide range of class-preserving semantic image manipulations that fool a variety of pretrained classifiers.
arXiv Detail & Related papers (2022-11-17T18:54:35Z)
- Almost Tight L0-norm Certified Robustness of Top-k Predictions against Adversarial Perturbations [78.23408201652984]
Top-k predictions are used in many real-world applications such as machine learning as a service, recommender systems, and web search.
Our work is based on randomized smoothing, which builds a provably robust classifier by randomizing an input.
For instance, our method can build a classifier that achieves a certified top-3 accuracy of 69.2% on ImageNet when an attacker can arbitrarily perturb 5 pixels of a testing image.
arXiv Detail & Related papers (2020-11-15T21:34:44Z)
- Learning and Evaluating Representations for Deep One-class Classification [59.095144932794646]
We present a two-stage framework for deep one-class classification.
We first learn self-supervised representations from one-class data, and then build one-class classifiers on the learned representations.
In experiments, we demonstrate state-of-the-art performance on visual-domain one-class classification benchmarks.
arXiv Detail & Related papers (2020-11-04T23:33:41Z)
- SCAN: Learning to Classify Images without Labels [73.69513783788622]
We advocate a two-step approach in which feature learning and clustering are decoupled.
A self-supervised task from representation learning is employed to obtain semantically meaningful features.
We obtain promising results on ImageNet and outperform several semi-supervised learning methods in the low-data regime.
arXiv Detail & Related papers (2020-05-25T18:12:33Z)
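Several of the papers above, like denoised smoothing itself, build on the standard randomized-smoothing certificate (Cohen et al., 2019): if the smoothed classifier's top class has probability at least p_a > 1/2 under Gaussian noise with standard deviation sigma, the prediction is provably constant within an l2 ball of radius sigma * Phi^{-1}(p_a). A minimal sketch of that radius computation, using only the Python standard library:

```python
# Sketch of the randomized-smoothing l2 certificate that denoised smoothing
# inherits: radius = sigma * Phi^{-1}(p_a), where p_a is a lower bound on the
# top-class probability under Gaussian noise and Phi^{-1} is the inverse
# standard-normal CDF.
from statistics import NormalDist

def certified_l2_radius(p_a: float, sigma: float) -> float:
    """Certified l2 radius of a smoothed classifier whose top-class
    probability lower bound is p_a (a certificate requires p_a > 0.5)."""
    if not 0.5 < p_a < 1.0:
        return 0.0  # no certificate without a clear majority class
    return sigma * NormalDist().inv_cdf(p_a)
```

In practice p_a is itself a high-confidence lower bound estimated by Monte Carlo sampling (as in the voting loop sketched earlier); larger sigma buys a larger radius at the cost of noisier inputs to the classifier.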
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.