Denoised Smoothing: A Provable Defense for Pretrained Classifiers
- URL: http://arxiv.org/abs/2003.01908v2
- Date: Mon, 21 Sep 2020 02:20:16 GMT
- Title: Denoised Smoothing: A Provable Defense for Pretrained Classifiers
- Authors: Hadi Salman, Mingjie Sun, Greg Yang, Ashish Kapoor and J. Zico Kolter
- Abstract summary: We present a method for provably defending any pretrained image classifier against $\ell_p$ adversarial attacks.
This method allows public vision API providers and users to seamlessly convert pretrained non-robust classification services into provably robust ones.
- Score: 101.67773468882903
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a method for provably defending any pretrained image classifier
against $\ell_p$ adversarial attacks. This method, for instance, allows public
vision API providers and users to seamlessly convert pretrained non-robust
classification services into provably robust ones. By prepending a
custom-trained denoiser to any off-the-shelf image classifier and using
randomized smoothing, we effectively create a new classifier that is guaranteed
to be $\ell_p$-robust to adversarial examples, without modifying the pretrained
classifier. Our approach applies to both the white-box and the black-box
settings of the pretrained classifier. We refer to this defense as denoised
smoothing, and we demonstrate its effectiveness through extensive
experimentation on ImageNet and CIFAR-10. Finally, we use our approach to
provably defend the Azure, Google, AWS, and Clarifai image classification APIs.
Our code replicating all the experiments in the paper can be found at:
https://github.com/microsoft/denoised-smoothing.
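The prediction step described in the abstract — prepend a denoiser to a frozen classifier, then take a majority vote over Gaussian-noised copies of the input — can be sketched as follows. This is a minimal illustration, not the paper's implementation: `classify` and `denoise` are hypothetical stand-ins for the off-the-shelf classifier and the custom-trained denoiser, and inputs are plain lists of floats rather than image tensors.

```python
# Hypothetical sketch of the denoised-smoothing prediction step.
# `classify` and `denoise` are stand-ins; in the paper, the classifier is a
# frozen (possibly black-box) model and the denoiser is trained separately.
import random
from collections import Counter

def smoothed_predict(x, classify, denoise, sigma=0.25, n_samples=100, rng=None):
    """Majority-vote prediction of the smoothed classifier
    g(x) = argmax_c P[ classify(denoise(x + N(0, sigma^2 I))) = c ]."""
    rng = rng or random.Random(0)
    votes = Counter()
    for _ in range(n_samples):
        # Add isotropic Gaussian noise to every input coordinate.
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        # The denoiser removes the injected noise before the pretrained
        # classifier is queried; the classifier itself is never modified.
        votes[classify(denoise(noisy))] += 1
    label, _ = votes.most_common(1)[0]
    return label
```

Because the classifier is only queried, never retrained, the same loop works in both the white-box and the black-box settings discussed in the abstract.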
Related papers
- ZeroPur: Succinct Training-Free Adversarial Purification [52.963392510839284]
Adversarial purification is a defense technique that can counter a variety of unseen adversarial attacks.
We present ZeroPur, a simple adversarial purification method that purifies adversarial images without further training.
arXiv Detail & Related papers (2024-06-05T10:58:15Z)
- Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders [101.42201747763178]
Unlearnable examples (UEs) seek to maximize testing error by making subtle modifications to correctly labeled training examples.
Our work provides a novel disentanglement mechanism to build an efficient pre-training purification method.
arXiv Detail & Related papers (2024-05-02T16:49:25Z)
- FCert: Certifiably Robust Few-Shot Classification in the Era of Foundation Models [38.019489232264796]
We propose FCert, the first certified defense against data poisoning attacks on few-shot classification.
Our experimental results show that FCert: 1) maintains classification accuracy in the absence of attacks, 2) outperforms existing certified defenses against data poisoning attacks, and 3) is efficient and general.
arXiv Detail & Related papers (2024-04-12T17:50:40Z)
- Rethinking Multiple Instance Learning for Whole Slide Image Classification: A Bag-Level Classifier is a Good Instance-Level Teacher [22.080213609228547]
Multiple Instance Learning has demonstrated promise in Whole Slide Image (WSI) classification.
Existing methods generally adopt a two-stage approach comprising a non-learnable feature embedding stage and a classifier training stage.
We propose that a bag-level classifier can be a good instance-level teacher.
arXiv Detail & Related papers (2023-12-02T10:16:03Z)
- Carefully Blending Adversarial Training and Purification Improves Adversarial Robustness [1.2289361708127877]
CARSO is able to defend itself against adaptive end-to-end white-box attacks devised to break such defences.
Our method improves the state of the art by a significant margin on CIFAR-10, CIFAR-100, and TinyImageNet-200.
arXiv Detail & Related papers (2023-05-25T09:04:31Z)
- Exploring the Limits of Deep Image Clustering using Pretrained Models [1.1060425537315088]
We present a methodology that learns to classify images without labels by leveraging pretrained feature extractors.
We propose a novel objective that learns associations between image features by introducing a variant of pointwise mutual information together with instance weighting.
arXiv Detail & Related papers (2023-03-31T08:56:29Z)
- Assessing Neural Network Robustness via Adversarial Pivotal Tuning [24.329515700515806]
We show how a pretrained image generator can be used to semantically manipulate images in a detailed, diverse, and photorealistic way.
Inspired by recent GAN-based photorealistic editing methods, we propose a method called Adversarial Pivotal Tuning (APT).
We demonstrate that APT is capable of a wide range of class-preserving semantic image manipulations that fool a variety of pretrained classifiers.
arXiv Detail & Related papers (2022-11-17T18:54:35Z)
- Almost Tight L0-norm Certified Robustness of Top-k Predictions against Adversarial Perturbations [78.23408201652984]
Top-k predictions are used in many real-world applications such as machine learning as a service, recommender systems, and web search.
Our work is based on randomized smoothing, which builds a provably robust classifier by randomizing an input.
For instance, our method can build a classifier that achieves a certified top-3 accuracy of 69.2% on ImageNet when an attacker can arbitrarily perturb 5 pixels of a testing image.
arXiv Detail & Related papers (2020-11-15T21:34:44Z)
- Learning and Evaluating Representations for Deep One-class Classification [59.095144932794646]
We present a two-stage framework for deep one-class classification.
We first learn self-supervised representations from one-class data, and then build one-class classifiers on the learned representations.
In experiments, we demonstrate state-of-the-art performance on visual-domain one-class classification benchmarks.
arXiv Detail & Related papers (2020-11-04T23:33:41Z)
- SCAN: Learning to Classify Images without Labels [73.69513783788622]
We advocate a two-step approach in which feature learning and clustering are decoupled.
A self-supervised task from representation learning is employed to obtain semantically meaningful features.
We obtain promising results on ImageNet and outperform several semi-supervised learning methods in the low-data regime.
arXiv Detail & Related papers (2020-05-25T18:12:33Z)
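Several of the papers above, like denoised smoothing itself, build on the standard randomized-smoothing certificate (Cohen et al., 2019): if the smoothed classifier's top class has probability at least p_a > 1/2 under Gaussian noise with standard deviation sigma, the prediction is provably constant within an l2 ball of radius sigma * Phi^{-1}(p_a). A minimal sketch of that radius computation, using only the Python standard library:

```python
# Sketch of the randomized-smoothing l2 certificate that denoised smoothing
# inherits: radius = sigma * Phi^{-1}(p_a), where p_a is a lower bound on the
# top-class probability under Gaussian noise and Phi^{-1} is the inverse
# standard-normal CDF.
from statistics import NormalDist

def certified_l2_radius(p_a: float, sigma: float) -> float:
    """Certified l2 radius of a smoothed classifier whose top-class
    probability lower bound is p_a (a certificate requires p_a > 0.5)."""
    if not 0.5 < p_a < 1.0:
        return 0.0  # no certificate without a clear majority class
    return sigma * NormalDist().inv_cdf(p_a)
```

In practice p_a is itself a high-confidence lower bound estimated by Monte Carlo sampling (as in the voting loop sketched earlier); larger sigma buys a larger radius at the cost of noisier inputs to the classifier.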
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.