FAR: A General Framework for Attributional Robustness
- URL: http://arxiv.org/abs/2010.07393v2
- Date: Tue, 8 Mar 2022 13:53:52 GMT
- Title: FAR: A General Framework for Attributional Robustness
- Authors: Adam Ivankay, Ivan Girardi, Chiara Marchiori and Pascal Frossard
- Abstract summary: We define a novel framework for attributional robustness (FAR) for training models with robust attributions.
We show that FAR is a generalized, less constrained formulation of currently existing training methods.
We then propose two new instantiations of this framework, AAT and AdvAAT, that directly optimize for both robust attributions and predictions.
- Score: 42.49606659285249
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attribution maps are popular tools for explaining neural networks
predictions. By assigning an importance value to each input dimension that
represents its impact towards the outcome, they give an intuitive explanation
of the decision process. However, recent work has discovered the vulnerability
of these maps to imperceptible adversarial changes, which can prove critical in
safety-relevant domains such as healthcare. Therefore, we define a novel
generic framework for attributional robustness (FAR) as a general problem
formulation for training models with robust attributions. This framework
consists of a generic regularization term and training objective that minimize
the maximal dissimilarity of attribution maps in a local neighbourhood of the
input. We show that FAR is a generalized, less constrained formulation of
currently existing training methods. We then propose two new instantiations of
this framework, AAT and AdvAAT, that directly optimize for both robust
attributions and predictions. Experiments performed on widely used vision
datasets show that our methods perform better or comparably to current ones in
terms of attributional robustness while being more generally applicable. We
finally show that our methods mitigate undesired dependencies between
attributional robustness and some training and estimation parameters, which
seem to critically affect other competitor methods.
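To make the objective concrete, below is a minimal PyTorch sketch of a FAR-style training loss. It is an illustration under assumed design choices, not the authors' implementation: attributions are taken as plain gradients of the loss (saliency maps), dissimilarity is measured with cosine distance, the inner maximization uses a few PGD steps over an L-infinity ball, inputs are assumed to lie in [0, 1], and eps, step, iters and lam are placeholder hyperparameters.

```python
# Illustrative sketch only -- assumed design choices, not the paper's implementation.
import torch
import torch.nn.functional as F

def saliency(model, x, y):
    """Gradient-of-loss attribution map, flattened per example.
    `x` must already require grad; create_graph keeps the map differentiable."""
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x, create_graph=True)
    return grad.flatten(1)

def dissimilarity(a, b):
    """Mean (1 - cosine similarity) between two batches of attribution maps."""
    return (1.0 - F.cosine_similarity(a, b, dim=1)).mean()

def far_style_loss(model, x, y, eps=8/255, step=2/255, iters=5, lam=1.0):
    """Cross-entropy plus lam * (approximate) maximal attribution dissimilarity
    within an L-inf ball of radius eps around x, found by PGD on the dissimilarity."""
    a_clean = saliency(model, x.detach().clone().requires_grad_(True), y)

    # PGD inner maximization of the attribution dissimilarity (random start).
    x_adv = (x + eps * torch.empty_like(x).uniform_(-1, 1)).clamp(0, 1)
    for _ in range(iters):
        x_adv = x_adv.detach().requires_grad_(True)
        d = dissimilarity(a_clean.detach(), saliency(model, x_adv, y))
        g, = torch.autograd.grad(d, x_adv)
        x_adv = x_adv.detach() + step * g.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)

    # Regularizer: attribution dissimilarity at the (approximate) worst-case point.
    a_adv = saliency(model, x_adv.detach().requires_grad_(True), y)
    reg = dissimilarity(a_clean, a_adv)
    return F.cross_entropy(model(x), y) + lam * reg
```

In this template, particular choices of attribution function, dissimilarity measure and neighbourhood recover more specific training objectives, which is consistent with the abstract's claim that FAR is a less constrained generalization of existing attributional robustness methods.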
Related papers
- Activate and Reject: Towards Safe Domain Generalization under Category Shift [71.95548187205736]
We study a practical problem of Domain Generalization under Category Shift (DGCS).
It aims to simultaneously detect unknown-class samples and classify known-class samples in the target domains.
Compared to prior DG works, we face two new challenges: 1) how to learn the concept of "unknown" during training with only source known-class samples, and 2) how to adapt the source-trained model to unseen environments.
arXiv Detail & Related papers (2023-10-07T07:53:12Z)
- Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly-robust instance reweighted adversarial framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- Resisting Adversarial Attacks in Deep Neural Networks using Diverse Decision Boundaries [12.312877365123267]
Deep learning systems are vulnerable to crafted adversarial examples, which may be imperceptible to the human eye, but can lead the model to misclassify.
We develop a new ensemble-based solution that constructs defender models with diverse decision boundaries with respect to the original model.
We present extensive experiments on standard image classification datasets, namely MNIST, CIFAR-10 and CIFAR-100, against state-of-the-art adversarial attacks.
arXiv Detail & Related papers (2022-08-18T08:19:26Z)
- Improved and Interpretable Defense to Transferred Adversarial Examples by Jacobian Norm with Selective Input Gradient Regularization [31.516568778193157]
Adversarial training (AT) is often adopted to improve the robustness of deep neural networks (DNNs).
In this work, we propose an approach based on the Jacobian norm and Selective Input Gradient Regularization (J-SIGR).
Experiments demonstrate that the proposed J-SIGR confers improved robustness against transferred adversarial attacks, and we also show that the predictions from the neural network are easy to interpret.
arXiv Detail & Related papers (2022-07-09T01:06:41Z)
- A Unified Wasserstein Distributional Robustness Framework for Adversarial Training [24.411703133156394]
This paper presents a unified framework that connects Wasserstein distributional robustness with current state-of-the-art AT methods.
We introduce a new Wasserstein cost function and a new series of risk functions, with which we show that standard AT methods are special cases of their counterparts in our framework.
This connection leads to an intuitive relaxation and generalization of existing AT methods and facilitates the development of a new family of distributional robustness AT-based algorithms.
arXiv Detail & Related papers (2022-02-27T19:40:29Z)
- Learning Invariant Representations and Risks for Semi-supervised Domain Adaptation [109.73983088432364]
We propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA).
We introduce the LIRR algorithm for jointly Learning Invariant Representations and Risks.
arXiv Detail & Related papers (2020-10-09T15:42:35Z)
- Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples (a generic PGD-based AT step is sketched after this list).
Most existing AT methods adopt a specific attack to craft adversarial examples, leading to unreliable robustness against other unseen attacks.
In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)
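Several of the related papers above build on or extend standard adversarial training. For reference, the sketch below shows a generic PGD-based AT step in PyTorch; it is the common textbook formulation with assumed hyperparameters, not code from any of the listed papers.

```python
# Generic PGD adversarial training step (textbook formulation, assumed
# hyperparameters; not taken from any specific paper listed above).
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, step=2/255, iters=10):
    """Craft L-inf bounded adversarial examples with projected gradient ascent."""
    x_adv = (x + eps * torch.empty_like(x).uniform_(-1, 1)).clamp(0, 1)
    for _ in range(iters):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        g, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + step * g.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """One AT step: minimize the loss on adversarial examples instead of clean inputs."""
    model.eval()                      # fix BN/dropout behaviour while crafting the attack
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```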