Breaking certified defenses: Semantic adversarial examples with spoofed
robustness certificates
- URL: http://arxiv.org/abs/2003.08937v1
- Date: Thu, 19 Mar 2020 17:59:44 GMT
- Title: Breaking certified defenses: Semantic adversarial examples with spoofed
robustness certificates
- Authors: Amin Ghiasi, Ali Shafahi and Tom Goldstein
- Abstract summary: We present a new attack that exploits not only the labelling function of a classifier, but also the certificate generator.
The proposed method applies large perturbations that place images far from a class boundary while maintaining the imperceptibility property of adversarial examples.
- Score: 57.52763961195292
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To deflect adversarial attacks, a range of "certified" classifiers have been
proposed. In addition to labeling an image, certified classifiers produce (when
possible) a certificate guaranteeing that the input image is not an
$\ell_p$-bounded adversarial example. We present a new attack that exploits not
only the labelling function of a classifier, but also the certificate
generator. The proposed method applies large perturbations that place images
far from a class boundary while maintaining the imperceptibility property of
adversarial examples. The proposed "Shadow Attack" causes certifiably robust
networks to mislabel an image and simultaneously produce a "spoofed"
certificate of robustness.
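In sketch form, the attack searches for a perturbation that is large in $\ell_p$ norm yet spatially smooth, nearly colorless, and similar across channels, so the perturbed image lands deep inside the wrong class and a certificate generator still reports a non-trivial robustness radius. Below is a minimal, illustrative PyTorch sketch of such an objective; the model handle, penalty weights, and optimizer choice are assumptions for illustration, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def shadow_perturbation_sketch(model, x, target, steps=300, lr=0.01,
                               lam_tv=0.3, lam_color=20.0, lam_sim=10.0):
    """Illustrative sketch of a large-but-imperceptible 'shadow' perturbation.

    Optimizes a perturbation delta that pushes x deep into the (wrong) target
    class while staying spatially smooth, nearly colorless, and similar across
    channels. Penalty weights and optimizer are illustrative choices only.
    """
    delta = torch.zeros_like(x, requires_grad=True)      # x: (1, 3, H, W) in [0, 1]
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        adv = (x + delta).clamp(0.0, 1.0)
        # Classification term: drive the prediction toward the target label.
        cls_loss = F.cross_entropy(model(adv), target)
        # Total variation: keep the perturbation spatially smooth.
        tv = (delta[..., 1:, :] - delta[..., :-1, :]).abs().mean() + \
             (delta[..., :, 1:] - delta[..., :, :-1]).abs().mean()
        # Color term: keep the mean shift of each channel small.
        color = delta.abs().mean(dim=(2, 3)).pow(2).sum()
        # Similarity term: keep the channels alike (greyscale-like perturbation).
        sim = (delta[:, 0] - delta[:, 1]).pow(2).mean() + \
              (delta[:, 1] - delta[:, 2]).pow(2).mean() + \
              (delta[:, 0] - delta[:, 2]).pow(2).mean()
        loss = cls_loss + lam_tv * tv + lam_color * color + lam_sim * sim
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta).detach().clamp(0.0, 1.0)
```

Running a smoothing-based certifier on the returned image can then produce a spoofed certificate around a misclassified input, which is the failure mode the abstract describes.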
Related papers
- CrossCert: A Cross-Checking Detection Approach to Patch Robustness Certification for Deep Learning Models [6.129515045488372]
Patch robustness certification is an emerging kind of defense technique against adversarial patch attacks with provable guarantees.
This paper proposes a novel certified defense technique called CrossCert.
arXiv Detail & Related papers (2024-05-13T11:54:03Z) - Counterfactual Image Generation for adversarially robust and interpretable Classifiers [1.3859669037499769]
We propose a unified framework leveraging image-to-image translation Generative Adversarial Networks (GANs) to produce counterfactual samples.
This is achieved by combining the classifier and discriminator into a single model that attributes real images to their respective classes and flags generated images as "fake".
We show that the model exhibits improved robustness to adversarial attacks and that the discriminator's "fakeness" value serves as an uncertainty measure for its predictions.
arXiv Detail & Related papers (2023-10-01T18:50:29Z) - Uncertainty-based Detection of Adversarial Attacks in Semantic Segmentation [16.109860499330562]
We introduce an uncertainty-based approach for the detection of adversarial attacks in semantic segmentation.
We demonstrate the ability of our approach to detect perturbed images across multiple types of adversarial attacks.
arXiv Detail & Related papers (2023-05-22T08:36:35Z) - Et Tu Certifications: Robustness Certificates Yield Better Adversarial Examples [30.42301446202426]
Our new Certification Aware Attack exploits certifications to produce computationally efficient norm-minimising adversarial examples.
While these attacks can be used to assess the tightness of certification bounds, they also highlight that releasing certifications can paradoxically reduce security.
arXiv Detail & Related papers (2023-02-09T00:10:05Z) - Collective Robustness Certificates: Exploiting Interdependence in Graph Neural Networks [71.78900818931847]
In tasks like node classification, image segmentation, and named-entity recognition, a single classifier simultaneously outputs multiple predictions.
Existing adversarial robustness certificates consider each prediction independently and are thus overly pessimistic for such tasks.
We propose the first collective robustness certificate which computes the number of predictions that are simultaneously guaranteed to remain stable under perturbation.
arXiv Detail & Related papers (2023-02-06T14:46:51Z) - Detection as Regression: Certified Object Detection by Median Smoothing [50.89591634725045]
This work is motivated by recent progress on certified classification by randomized smoothing (a minimal sketch of this scheme appears after this list).
We obtain the first model-agnostic, training-free, and certified defense for object detection against $\ell_2$-bounded attacks.
arXiv Detail & Related papers (2020-07-07T18:40:19Z) - Denoised Smoothing: A Provable Defense for Pretrained Classifiers [101.67773468882903]
We present a method for provably defending any pretrained image classifier against $\ell_p$ adversarial attacks.
This method allows public vision API providers and users to seamlessly convert pretrained non-robust classification services into provably robust ones.
arXiv Detail & Related papers (2020-03-04T06:15:55Z) - (De)Randomized Smoothing for Certifiable Defense against Patch Attacks [136.79415677706612]
We introduce a certifiable defense against patch attacks that guarantees, for a given image and patch attack size, that no adversarial patch can cause misclassification.
Our method is related to the broad class of randomized smoothing robustness schemes.
Our results effectively establish a new state-of-the-art of certifiable defense against patch attacks on CIFAR-10 and ImageNet.
arXiv Detail & Related papers (2020-02-25T08:39:46Z) - Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z) - Generating Semantic Adversarial Examples via Feature Manipulation [23.48763375455514]
We propose a more practical adversarial attack by designing structured perturbation with semantic meanings.
Our proposed technique manipulates the semantic attributes of images via the disentangled latent codes.
We demonstrate the existence of a universal, image-agnostic semantic adversarial example.
arXiv Detail & Related papers (2020-01-06T06:28:31Z)
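Several of the entries above (Detection as Regression, Denoised Smoothing, (De)Randomized Smoothing, the label-flipping certificate) build on randomized smoothing, which is also one of the certificate generators a spoofing attack of this kind can target. The sketch below is a simplified, assumption-laden illustration of how such a certificate is produced: it uses a plain empirical estimate of the top-class probability where a rigorous certifier would use a statistical lower confidence bound and abstain when that bound is not above 1/2.

```python
import torch
from torch.distributions import Normal

@torch.no_grad()
def smoothing_certificate_sketch(model, x, sigma=0.25, n=1000, batch=100):
    """Illustrative randomized-smoothing certificate (not a rigorous certifier).

    Classifies Gaussian-perturbed copies of x, takes the majority class, and
    converts the estimated top-class probability p into an l2 radius
    R = sigma * Phi^{-1}(p). A rigorous certifier replaces the empirical
    estimate with a statistical lower confidence bound.
    """
    num_classes = model(x).shape[1]                         # x: (1, C, H, W)
    counts = torch.zeros(num_classes, dtype=torch.long)
    for start in range(0, n, batch):
        b = min(batch, n - start)
        # Add Gaussian noise to b copies of the input and tally predicted classes.
        noisy = x.repeat(b, 1, 1, 1) + sigma * torch.randn(b, *x.shape[1:])
        counts += torch.bincount(model(noisy).argmax(dim=1), minlength=num_classes)
    top_class = int(counts.argmax())
    p_top = min(counts[top_class].item() / n, 1.0 - 1e-6)   # avoid Phi^{-1}(1) = inf
    if p_top <= 0.5:
        return top_class, 0.0                                # a real certifier would abstain
    radius = sigma * Normal(0.0, 1.0).icdf(torch.tensor(p_top)).item()
    return top_class, radius                                 # certified l2 radius (sketch)
```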
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.