Certified Interpretability Robustness for Class Activation Mapping
- URL: http://arxiv.org/abs/2301.11324v1
- Date: Thu, 26 Jan 2023 18:58:11 GMT
- Title: Certified Interpretability Robustness for Class Activation Mapping
- Authors: Alex Gu, Tsui-Wei Weng, Pin-Yu Chen, Sijia Liu, Luca Daniel
- Abstract summary: We present CORGI, short for Certifiably prOvable Robustness Guarantees for Interpretability mapping.
CORGI is an algorithm that takes in an input image and gives a certifiable lower bound for the robustness of its CAM interpretability map.
We show the effectiveness of CORGI via a case study on traffic sign data, certifying lower bounds on the minimum adversarial perturbation.
- Score: 77.58769591550225
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Interpreting machine learning models is challenging but crucial for ensuring
the safety of deep networks in autonomous driving systems. Due to the
prevalence of deep learning based perception models in autonomous vehicles, accurately interpreting their predictions is essential. While a variety of interpretability methods have been proposed, most have been shown to lack robustness. Yet, little has
been done to provide certificates for interpretability robustness. Taking a
step in this direction, we present CORGI, short for Certifiably prOvable
Robustness Guarantees for Interpretability mapping. CORGI is an algorithm that
takes in an input image and gives a certifiable lower bound for the robustness
of the top k pixels of its CAM interpretability map. We show the effectiveness
of CORGI via a case study on traffic sign data, certifying lower bounds on the
minimum adversarial perturbation that are within a factor of 4-5 of the upper bounds obtained by state-of-the-art attack methods.
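The certification criterion is straightforward to state: the top-k CAM pixels are certified at radius eps if every top-k pixel's lower-bounded CAM score exceeds every remaining pixel's upper-bounded score over the perturbation ball, and the certified radius then follows by binary search. Below is a minimal sketch of that loop, assuming hypothetical `cam` and `cam_bounds` oracles (CORGI derives such bounds with CROWN-style bound propagation); it is an illustration, not the authors' implementation.

```python
import numpy as np

def topk_certified(x, k, eps, cam, cam_bounds):
    """True if the top-k CAM pixel set provably cannot change within radius eps."""
    scores = cam(x).ravel()                  # CAM scores for the clean input
    topk = np.argsort(scores)[-k:]           # indices of the k highest pixels
    rest = np.setdiff1d(np.arange(scores.size), topk)
    lb, ub = cam_bounds(x, eps)              # per-pixel CAM bounds over the eps-ball
    # Certified iff the worst case of every top-k pixel still beats the
    # best case of every other pixel, so no rank swap can occur.
    return lb.ravel()[topk].min() > ub.ravel()[rest].max()

def certified_radius(x, k, cam, cam_bounds, lo=0.0, hi=1.0, iters=20):
    """Binary search for the largest certifiable radius in [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if topk_certified(x, k, mid, cam, cam_bounds):
            lo = mid                         # mid certifies; try a larger radius
        else:
            hi = mid
    return lo
```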
Related papers
- Adaptive Hierarchical Certification for Segmentation using Randomized Smoothing [87.48628403354351]
Certification for machine learning proves that no adversarial sample can evade a model within a given perturbation range under certain conditions.
Common certification methods for segmentation use a flat set of fine-grained classes, leading to high abstain rates due to model uncertainty.
We propose a novel, more practical setting, which certifies pixels within a multi-level hierarchy, and adaptively relaxes the certification to a coarser level for unstable components.
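A minimal sketch of that adaptive relaxation, assuming a toy vote-threshold certificate in place of a full randomized-smoothing one and a hypothetical `parent` class hierarchy; the paper's algorithm is more involved.

```python
def certify_pixel(votes, parent, threshold):
    """votes: {class_name: count}; returns (label, level) or ('abstain', None)."""
    total = sum(votes.values())
    top = max(votes, key=votes.get)
    if votes[top] / total >= threshold:
        return top, "fine"                      # confident at fine granularity
    # Pool fine-class votes into their coarser parents and retry.
    coarse = {}
    for cls, n in votes.items():
        coarse[parent[cls]] = coarse.get(parent[cls], 0) + n
    top = max(coarse, key=coarse.get)
    if coarse[top] / total >= threshold:
        return top, "coarse"                    # certified only at the coarse level
    return "abstain", None

parent = {"car": "vehicle", "truck": "vehicle", "road": "flat", "sidewalk": "flat"}
print(certify_pixel({"car": 48, "truck": 40, "road": 12}, parent, 0.8))
# -> ('vehicle', 'coarse'): unstable between car/truck, stable as 'vehicle'
```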
arXiv Detail & Related papers (2024-02-13T11:59:43Z)
- Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting [133.55037976429088]
We investigate the adversarial robustness of vision transformers equipped with BERT pretraining (e.g., BEiT, MAE).
A surprising observation is that MAE has significantly worse adversarial robustness than other BERT pretraining methods.
We propose a simple yet effective way to boost the adversarial robustness of MAE.
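The summary above gives no implementation detail, so the following is only a generic illustration of what a test-time frequency-domain prompt could look like: an additive term injected into the image's Fourier spectrum. The prompt's shape and placement are assumptions, not the paper's design.

```python
import numpy as np

def apply_frequency_prompt(image, prompt):
    """image: (H, W) array; prompt: complex (H, W) array added in Fourier space."""
    spectrum = np.fft.fft2(image)
    prompted = spectrum + prompt             # inject the prompt into the spectrum
    return np.real(np.fft.ifft2(prompted))   # back to the pixel domain

img = np.random.rand(32, 32)
prompt = np.zeros((32, 32), dtype=complex)
prompt[0, 1] = 5.0                           # nudge one low-frequency component
out = apply_frequency_prompt(img, prompt)
```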
arXiv Detail & Related papers (2023-08-20T16:27:17Z)
- Learning to Generate Training Datasets for Robust Semantic Segmentation [37.9308918593436]
We propose a novel approach to improve the robustness of semantic segmentation techniques.
We design Robusta, a novel conditional generative adversarial network to generate realistic and plausible perturbed images.
Our results suggest that this approach could be valuable in safety-critical applications.
arXiv Detail & Related papers (2023-08-01T10:02:26Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data may come from biases in data acquisition rather than the task of interest.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
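As a rough illustration of the hybrid discriminative-generative shape of such a training objective: a shared encoder feeds both a classifier head and a reconstruction (autoencoder) head. The actual nuisance-extended information-bottleneck loss is more involved; the architecture and weighting below are assumptions.

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Flatten(), nn.Linear(784, 64), nn.ReLU())
dec = nn.Sequential(nn.Linear(64, 784))     # generative head: reconstruct x
clf = nn.Linear(64, 10)                     # discriminative head: predict y

def hybrid_loss(x, y, beta=0.5):
    z = enc(x)
    ce = nn.functional.cross_entropy(clf(z), y)          # task (discriminative) term
    rec = nn.functional.mse_loss(dec(z), x.flatten(1))   # reconstruction (generative) term
    return ce + beta * rec

x, y = torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,))
loss = hybrid_loss(x, y)
loss.backward()
```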
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- Assurance Monitoring of Learning Enabled Cyber-Physical Systems Using Inductive Conformal Prediction based on Distance Learning [2.66512000865131]
We propose an approach for assurance monitoring of learning-enabled Cyber-Physical Systems.
In order to allow real-time assurance monitoring, the approach employs distance learning to transform high-dimensional inputs into lower-dimensional embedding representations.
We demonstrate the approach on three data sets: a mobile robot following a wall, speaker recognition, and traffic sign recognition.
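A minimal sketch of the monitoring recipe, with the learned embedding network omitted (inputs are treated as already embedded) and a nearest-neighbor distance as one common nonconformity choice, not necessarily the paper's exact design.

```python
import numpy as np

def nonconformity(z, train_embeddings):
    """Distance to the nearest training embedding; larger = stranger input."""
    return np.min(np.linalg.norm(train_embeddings - z, axis=1))

def p_value(score, calibration_scores):
    """Smoothed fraction of calibration scores at least as nonconforming."""
    return (np.sum(calibration_scores >= score) + 1) / (len(calibration_scores) + 1)

rng = np.random.default_rng(0)
train = rng.normal(0, 1, (200, 8))       # embeddings of the training data
calib = np.array([nonconformity(z, train) for z in rng.normal(0, 1, (50, 8))])

test = rng.normal(5, 1, 8)               # an out-of-distribution test input
p = p_value(nonconformity(test, train), calib)
print("alarm" if p < 0.05 else "ok", round(p, 3))  # low p-value triggers the monitor
```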
arXiv Detail & Related papers (2021-10-07T00:21:45Z)
- SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption [72.35532598131176]
We propose SCARF, a technique for contrastive learning, where views are formed by corrupting a random subset of features.
We show that SCARF complements existing strategies and outperforms alternatives like autoencoders.
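The view-generation step lends itself to a short sketch: corrupt a random subset of features by resampling each from its empirical marginal over the training data; the contrastive (InfoNCE) step itself is omitted here.

```python
import numpy as np

def scarf_view(x, X_train, corruption_rate=0.6, rng=None):
    """Return a corrupted view of x; X_train supplies the per-feature marginals."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    mask = rng.random(d) < corruption_rate        # features chosen for corruption
    view = x.copy()
    donors = rng.integers(0, X_train.shape[0], size=d)
    cols = np.where(mask)[0]
    view[cols] = X_train[donors[cols], cols]      # resample from each feature's marginal
    return view

X_train = np.random.rand(1000, 16)
anchor = X_train[0]
positive = scarf_view(anchor, X_train)   # (anchor, positive) form a contrastive pair
```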
arXiv Detail & Related papers (2021-06-29T08:08:33Z)
- Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation [74.05906222376608]
We propose adversarial self-supervision UDA (or ASSUDA) that maximizes the agreement between clean images and their adversarial examples by a contrastive loss in the output space.
This paper is rooted in two observations: (i) the robustness of UDA methods in semantic segmentation remains unexplored, which poses a security concern in this field; and (ii) although commonly used self-supervision (e.g., rotation and jigsaw) benefits image tasks such as classification and recognition, it fails to provide the critical supervision signals needed to learn discriminative representations for segmentation tasks.
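A rough sketch of that agreement objective, using a one-step FGSM perturbation as a stand-in attack and cosine agreement as a stand-in for the paper's contrastive loss in output space; both choices and the toy model are assumptions.

```python
import torch
import torch.nn.functional as F

def agreement_loss(model, x, eps=0.01):
    x = x.clone().requires_grad_(True)
    out = model(x)
    # One-step FGSM perturbation against the model's own prediction.
    adv_target = out.detach().argmax(dim=1)
    grad = torch.autograd.grad(F.cross_entropy(out, adv_target), x)[0]
    x_adv = (x + eps * grad.sign()).detach()
    # Pull the clean and adversarial outputs toward agreement.
    return 1 - F.cosine_similarity(model(x).flatten(1), model(x_adv).flatten(1)).mean()

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 5))
loss = agreement_loss(model, torch.randn(4, 3, 8, 8))
loss.backward()
```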
arXiv Detail & Related papers (2021-05-23T01:50:44Z)
- Adversarial Robustness of Supervised Sparse Coding [34.94566482399662]
We consider a model that involves learning a representation while at the same time giving a precise generalization bound and a robustness certificate.
We focus on the hypothesis class obtained by combining a sparsity-promoting encoder with a linear classifier.
We provide a robustness certificate for end-to-end classification.
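A minimal sketch of that hypothesis class, using a single soft-thresholding (ISTA) step as the sparsity-promoting encoder; the certificate itself, which the paper ties to quantities such as the classifier margin and properties of the learned dictionary, is not reproduced here.

```python
import numpy as np

def soft_threshold(v, lam):
    """Elementwise shrinkage operator, the workhorse of ISTA."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def predict(x, D, W, lam=0.1):
    z = soft_threshold(D.T @ x, lam)   # sparse code of x under dictionary D
    return W @ z                       # linear classifier on the sparse code

rng = np.random.default_rng(0)
D = rng.normal(size=(32, 64))          # dictionary: 32-dim signals, 64 atoms
W = rng.normal(size=(10, 64))          # 10-class linear classifier
scores = predict(rng.normal(size=32), D, W)
```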
arXiv Detail & Related papers (2020-10-22T22:05:21Z)
- Certified Distributional Robustness on Smoothed Classifiers [27.006844966157317]
We propose the worst-case adversarial loss over input distributions as a robustness certificate.
By exploiting duality and the smoothness property, we provide an easy-to-compute upper bound as a surrogate for the certificate.
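For reference, duality-based certificates of this kind typically build on the standard Wasserstein DRO bound below (in the style of Sinha et al., 2018); this is the generic form, not necessarily this paper's exact surrogate.

```latex
% Generic Wasserstein-DRO duality bound: gamma >= 0 is a dual variable and
% c(z, Z) the transport cost. Not claimed to be this paper's exact surrogate.
\[
\sup_{P:\, W_c(P,\,P_0) \le \rho} \ \mathbb{E}_{Z \sim P}\big[\ell(\theta; Z)\big]
\;\le\; \gamma\rho \;+\; \mathbb{E}_{Z \sim P_0}\!\Big[\sup_{z}\big\{\ell(\theta; z) - \gamma\, c(z, Z)\big\}\Big]
\qquad \text{for all } \gamma \ge 0 .
\]
```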
arXiv Detail & Related papers (2020-10-21T13:22:25Z)
- Certifiable Robustness to Adversarial State Uncertainty in Deep Reinforcement Learning [40.989393438716476]
Deep Neural Network-based systems are now the state-of-the-art in many robotics tasks, but their application in safety-critical domains remains dangerous without formal guarantees on network robustness.
Small perturbations to sensor inputs are often enough to change network-based decisions, an effect recently shown to cause an autonomous vehicle to swerve into another lane.
This work leverages research on certified adversarial robustness to develop an online certifiably robust defense for deep reinforcement learning algorithms.
arXiv Detail & Related papers (2020-04-11T21:36:13Z)
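The action-selection rule behind a defense of this kind is easy to sketch: pick the action whose certified lower bound on the Q-value over the state-uncertainty ball is highest. The toy below uses a linear Q-function, whose exact l-infinity worst case has a closed form; for a Q-network, `q_lower_bound` would instead come from bound propagation, and this interface is an assumption.

```python
import numpy as np

def robust_action(state, eps, q_lower_bound, n_actions):
    """argmax_a of the certified lower bound of Q(s', a) over ||s' - s|| <= eps."""
    bounds = [q_lower_bound(state, a, eps) for a in range(n_actions)]
    return int(np.argmax(bounds))

# Toy linear Q-function Q(s, a) = w_a . s, whose exact worst case over an
# l_inf ball is w_a . s - eps * ||w_a||_1 (used here as the "certified" bound).
W = np.array([[1.0, -2.0], [0.5, 0.5], [-1.0, 1.5]])   # 3 actions, 2-dim state
def q_lower_bound(s, a, eps):
    return W[a] @ s - eps * np.abs(W[a]).sum()

print(robust_action(np.array([0.2, -0.1]), eps=0.1,
                    q_lower_bound=q_lower_bound, n_actions=3))
```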