Improving Interpretability via Regularization of Neural Activation
Sensitivity
- URL: http://arxiv.org/abs/2211.08686v1
- Date: Wed, 16 Nov 2022 05:40:29 GMT
- Title: Improving Interpretability via Regularization of Neural Activation
Sensitivity
- Authors: Ofir Moshe, Gil Fidel, Ron Bitton, Asaf Shabtai
- Abstract summary: State-of-the-art deep neural networks (DNNs) are highly effective at tackling many real-world tasks.
They are susceptible to adversarial attacks and their opaqueness impedes users' trust in their output.
We present a novel approach for improving the interpretability of DNNs based on regularization of neural activation sensitivity.
- Score: 20.407987149443997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art deep neural networks (DNNs) are highly effective at tackling
many real-world tasks. However, their wide adoption in mission-critical
contexts is hampered by two major weaknesses - their susceptibility to
adversarial attacks and their opaqueness. The former raises concerns about the
security and generalization of DNNs in real-world conditions, whereas the
latter impedes users' trust in their output. In this research, we (1) examine
the effect of adversarial robustness on interpretability and (2) present a
novel approach for improving the interpretability of DNNs that is based on
regularization of neural activation sensitivity. We compare the
interpretability of models trained using our method with that of standard
models and of models trained using state-of-the-art adversarial robustness
techniques.
Our results show that adversarially robust models are superior to standard
models and that models trained using our proposed method are even better than
adversarially robust models in terms of interpretability.
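The abstract does not specify how activation sensitivity is defined or penalized, so the following is only a minimal PyTorch sketch of the general idea, assuming sensitivity is approximated by the norm of the gradient of a hidden layer's activations with respect to the input and added to the task loss as a weighted penalty. The paper's actual regularizer may differ, and all names here (SmallNet, sensitivity_penalty, lam) are illustrative rather than taken from the paper.

```python
# Minimal sketch of activation-sensitivity regularization (assumed form,
# not necessarily the paper's exact method).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU())
        self.head = nn.Linear(128, 10)

    def forward(self, x):
        h = self.hidden(x)           # hidden activations whose sensitivity is penalized
        return self.head(h), h

def sensitivity_penalty(x, h):
    # Gradient of the summed hidden activations w.r.t. the input, per sample.
    grad = torch.autograd.grad(h.sum(), x, create_graph=True)[0]
    return grad.flatten(1).norm(dim=1).mean()

model = SmallNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 0.1  # hypothetical regularization strength

def train_step(x, y):
    x = x.clone().requires_grad_(True)   # input gradients are needed for the penalty
    logits, h = model(x)
    loss = F.cross_entropy(logits, y) + lam * sensitivity_penalty(x, h)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Example call with a dummy MNIST-sized batch:
# train_step(torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,)))
```

Here create_graph=True keeps the penalty differentiable, so the optimizer actually reduces the activations' sensitivity through the weights rather than merely measuring it.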
Related papers
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been considered a challenging property to encode into neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
- Towards Evaluating Transfer-based Attacks Systematically, Practically, and Fairly [79.07074710460012]
The adversarial vulnerability of deep neural networks (DNNs) has drawn great attention.
An increasing number of transfer-based methods have been developed to fool black-box DNN models.
We establish a transfer-based attack benchmark (TA-Bench) which implements 30+ methods.
arXiv Detail & Related papers (2023-11-02T15:35:58Z)
- Interpretable Computer Vision Models through Adversarial Training: Unveiling the Robustness-Interpretability Connection [0.0]
Interpretability is as essential as robustness when models are deployed in the real world.
Standard models, compared to robust ones, are more susceptible to adversarial attacks, and their learned representations are less meaningful to humans.
arXiv Detail & Related papers (2023-07-04T13:51:55Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormaliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- Improved and Interpretable Defense to Transferred Adversarial Examples by Jacobian Norm with Selective Input Gradient Regularization [31.516568778193157]
Adversarial training (AT) is often adopted to improve the robustness of deep neural networks (DNNs).
In this work, we propose an approach based on the Jacobian norm and Selective Input Gradient Regularization (J-SIGR); a generic input-gradient regularizer is sketched for reference after this list.
Experiments demonstrate that the proposed J-SIGR confers improved robustness against transferred adversarial attacks, and we also show that the predictions of the neural network are easy to interpret.
arXiv Detail & Related papers (2022-07-09T01:06:41Z)
- Latent Boundary-guided Adversarial Training [61.43040235982727]
Adversarial training, which injects adversarial examples into model training, has proven to be the most effective defense strategy.
We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining.
arXiv Detail & Related papers (2022-06-08T07:40:55Z)
- On the Properties of Adversarially-Trained CNNs [4.769747792846005]
Adversarial Training has proved to be an effective training paradigm to enforce robustness against adversarial examples in modern neural network architectures.
We describe surprising properties of adversarially-trained models, shedding light on mechanisms through which robustness against adversarial attacks is implemented.
arXiv Detail & Related papers (2022-03-17T11:11:52Z)
- AED: An black-box NLP classifier model attacker [8.15167980163668]
Deep Neural Networks (DNNs) have been successful in solving real-world tasks in domains such as connected and automated vehicles, disease, and job hiring.
There is a growing concern regarding the potential bias and robustness of these DNN models.
We propose a word-level NLP classifier attack model called "AED," which stands for Attention mechanism enabled post-model Explanation.
arXiv Detail & Related papers (2021-12-22T04:25:23Z)
- Balancing Robustness and Sensitivity using Feature Contrastive Learning [95.86909855412601]
Methods that promote robustness can hurt the model's sensitivity to rare or underrepresented patterns.
We propose Feature Contrastive Learning (FCL) that encourages a model to be more sensitive to the features that have higher contextual utility.
arXiv Detail & Related papers (2021-05-19T20:53:02Z)
- On the benefits of robust models in modulation recognition [53.391095789289736]
Deep Neural Networks (DNNs) using convolutional layers are state-of-the-art in many tasks in communications.
In other domains, like image classification, DNNs have been shown to be vulnerable to adversarial perturbations.
We propose a novel framework to test the robustness of current state-of-the-art models.
arXiv Detail & Related papers (2021-03-27T19:58:06Z)
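The main paper and the J-SIGR entry above both penalize how strongly a network's outputs or activations react to small input changes. For reference only, here is a minimal sketch of plain input-gradient regularization, i.e., penalizing the L2 norm of the loss gradient with respect to the input; it is not the J-SIGR method itself (whose Jacobian-norm and selective-gradient terms are defined in that paper), and the function name, `lam`, and the assumption that `model(x)` returns logits are all illustrative.

```python
# Generic input-gradient regularization (reference sketch, not J-SIGR itself).
import torch
import torch.nn.functional as F

def loss_with_input_gradient_penalty(model, x, y, lam=0.05):
    """Cross-entropy loss plus lam * mean per-sample ||d(loss)/d(input)||_2."""
    x = x.clone().requires_grad_(True)
    task_loss = F.cross_entropy(model(x), y)
    # create_graph=True keeps the penalty differentiable w.r.t. the weights.
    input_grad = torch.autograd.grad(task_loss, x, create_graph=True)[0]
    penalty = input_grad.flatten(1).norm(dim=1).mean()
    return task_loss + lam * penalty
```

The returned loss is then backpropagated and stepped as usual, e.g. `loss.backward()` followed by `optimizer.step()`.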