Rethinking Robustness: A New Approach to Evaluating Feature Attribution Methods
- URL: http://arxiv.org/abs/2512.06665v1
- Date: Sun, 07 Dec 2025 05:29:38 GMT
- Title: Rethinking Robustness: A New Approach to Evaluating Feature Attribution Methods
- Authors: Panagiota Kiourti, Anu Singh, Preeti Duraipandian, Weichao Zhou, Wenchao Li
- Abstract summary: This paper challenges the notion of attributional robustness that largely ignores the difference in the model's outputs. We propose a new definition of similar inputs, a new robustness metric, and a novel method based on generative adversarial networks.
- Score: 9.184082996211517
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the robustness of feature attribution methods for deep neural networks. It challenges the current notion of attributional robustness, which largely ignores differences in the model's outputs, and introduces a new way of evaluating the robustness of attribution methods. Specifically, we propose a new definition of similar inputs, a new robustness metric, and a novel method based on generative adversarial networks to generate these inputs. In addition, we present a comprehensive evaluation with existing metrics and state-of-the-art attribution methods. Our findings highlight the need for a more objective metric that reveals the weaknesses of an attribution method rather than those of the neural network, thus providing a more accurate evaluation of the robustness of attribution methods.
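As a concrete illustration of the output-aware idea in the abstract, the sketch below compares gradient attributions only for input pairs whose model outputs are also close. It is a minimal reading of the proposal, not the authors' method: the paper's definition of similar inputs, its metric, and its GAN-based input generator are all more involved, and the softmax-distance threshold `tau`, saliency attributions, and cosine distance here are illustrative choices.

```python
import torch

def saliency(model, x, target):
    """Plain gradient attribution for a single (batched) input."""
    x = x.clone().detach().requires_grad_(True)
    model(x)[0, target].backward()
    return x.grad.detach()

def output_aware_attribution_distance(model, x, x_prime, target, tau=0.05):
    """Compare attributions only when the model's outputs are also close.

    Returns None when the softmax outputs differ by more than `tau`:
    in that case a large attribution change reflects a change in the
    model's behavior, not a weakness of the attribution method.
    """
    with torch.no_grad():
        gap = torch.norm(torch.softmax(model(x), dim=1)
                         - torch.softmax(model(x_prime), dim=1))
    if gap > tau:
        return None  # pair rejected: not "similar" in the output-aware sense
    a = saliency(model, x, target).flatten()
    b = saliency(model, x_prime, target).flatten()
    # cosine distance between maps: 0 = identical, 2 = maximally dissimilar
    return 1 - torch.nn.functional.cosine_similarity(a, b, dim=0).item()
```

Filtering on `gap` is the point of the exercise: earlier attributional-robustness evaluations perturb the input while largely ignoring how much the output moved.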
Related papers
- A Meaningful Perturbation Metric for Evaluating Explainability Methods [55.09730499143998]
We introduce a novel approach, which harnesses image generation models to perform targeted perturbation. Specifically, we focus on inpainting only the high-relevance pixels of an input image to modify the model's predictions while preserving image fidelity. This is in contrast to existing approaches, which often produce out-of-distribution modifications, leading to unreliable results.
arXiv Detail & Related papers (2025-04-09T11:46:41Z)
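A minimal sketch of the perturbation test described above, under stated simplifications: mean-filling stands in for the paper's generative inpainting, so the masked region here is not in-distribution, and the masking fraction `k` and the use of the top prediction are illustrative.

```python
import torch

def relevance_perturbation_drop(model, x, attribution, k=0.1):
    """Drop in predicted-class probability after removing the top-k
    fraction of pixels ranked by |attribution|.

    Assumes x and attribution share shape (1, C, H, W). Mean-filling
    is a crude stand-in for inpainting with a generative model.
    """
    with torch.no_grad():
        p0 = torch.softmax(model(x), dim=1)
        c = int(p0.argmax(dim=1))
        idx = attribution.abs().flatten().topk(int(k * x.numel())).indices
        x_pert = x.flatten().clone()
        x_pert[idx] = x.mean()            # stand-in for generative inpainting
        p1 = torch.softmax(model(x_pert.view_as(x)), dim=1)
    return (p0[0, c] - p1[0, c]).item()   # large drop = faithful attribution
```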
- Robustness quantification: a new method for assessing the reliability of the predictions of a classifier [0.14732811715354452]
Based on existing ideas in the field of imprecise probabilities, we present a new approach for assessing the reliability of the individual predictions of a generative probabilistic classifier. We call this approach robustness quantification, compare it to uncertainty quantification, and demonstrate that it continues to work well even for classifiers that are learned from small training sets that are sampled from a shifted distribution.
arXiv Detail & Related papers (2025-03-28T13:32:10Z)
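A toy version of the idea above for a Gaussian naive Bayes classifier: instead of proper credal sets from imprecise probabilities, it samples models from an interval around the estimated parameters and reports how often the nominal prediction survives. The classifier, the jitter model, and `eps` are all assumptions made for illustration.

```python
import numpy as np

def prediction_robustness(means, stds, priors, x, eps=0.1, n_models=200, seed=0):
    """Fraction of parameter-perturbed models agreeing with the nominal
    prediction for input x (1.0 = robust, low = unreliable).

    means, stds: (n_classes, n_features) Gaussian naive Bayes parameters;
    priors: (n_classes,). Jittering the means by up to eps standard
    deviations is a crude stand-in for a credal set of models.
    """
    rng = np.random.default_rng(seed)

    def predict(m):
        log_lik = -0.5 * (((x - m) / stds) ** 2
                          + np.log(2 * np.pi * stds ** 2)).sum(axis=1)
        return np.argmax(np.log(priors) + log_lik)

    nominal = predict(means)
    agree = sum(predict(means + rng.uniform(-eps, eps, means.shape) * stds)
                == nominal for _ in range(n_models))
    return agree / n_models
```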
- Advancing Attribution-Based Neural Network Explainability through Relative Absolute Magnitude Layer-Wise Relevance Propagation and Multi-Component Evaluation [0.0]
We introduce a novel method for determining the relevance of input neurons through layer-wise relevance propagation. Our results clearly demonstrate the advantage of our proposed method. We propose a new evaluation metric that combines the notions of faithfulness, robustness and contrastiveness.
arXiv Detail & Related papers (2024-12-12T14:25:56Z)
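For orientation, the snippet below implements the standard epsilon-LRP rule for a small ReLU MLP in plain NumPy; the Relative Absolute Magnitude variant in the paper above modifies how relevance is normalized, which is not reproduced here.

```python
import numpy as np

def lrp_epsilon(weights, biases, x, eps=1e-6):
    """Epsilon-LRP for a ReLU MLP; weights[i] has shape (n_in, n_out).

    Relevance starts at the winning output neuron and is redistributed
    backwards in proportion to each neuron's contribution a_j * w_jk.
    """
    acts = [x]
    for W, b in zip(weights, biases):          # forward pass, cache activations
        x = np.maximum(0.0, x @ W + b)
        acts.append(x)
    R = np.zeros_like(acts[-1])
    R[np.argmax(acts[-1])] = acts[-1].max()    # explain the predicted class only
    for W, b, a in zip(reversed(weights), reversed(biases), reversed(acts[:-1])):
        z = a @ W + b + eps                    # stabilized pre-activations
        R = a * ((R / z) @ W.T)                # redistribute relevance downward
    return R                                   # per-input-feature relevance
```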
- Exploring Cross-model Neuronal Correlations in the Context of Predicting Model Performance and Generalizability [2.6708879445664584]
This paper introduces a novel approach for assessing a newly trained model's performance based on another known model. The proposed method evaluates correlations by determining if, for each neuron in one network, there exists a neuron in the other network that produces similar output.
arXiv Detail & Related papers (2024-08-15T22:57:39Z)
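The correlation test above admits a compact sketch: record activations of both networks on the same probe inputs and ask, for every neuron in network A, how well its best counterpart in network B tracks it. Pearson correlation and the max-over-counterparts reduction are assumptions; the paper's procedure may differ in detail.

```python
import numpy as np

def best_counterpart_correlations(acts_a, acts_b):
    """For each A-neuron, the highest |Pearson correlation| with any
    B-neuron, measured over a shared probe set.

    acts_a: (n_inputs, n_neurons_a); acts_b: (n_inputs, n_neurons_b).
    Uniformly high values suggest the two networks compute similar
    features, which the paper links to similar performance.
    """
    A = (acts_a - acts_a.mean(0)) / (acts_a.std(0) + 1e-8)
    B = (acts_b - acts_b.mean(0)) / (acts_b.std(0) + 1e-8)
    corr = (A.T @ B) / len(A)          # all pairwise correlations at once
    return np.abs(corr).max(axis=1)    # best match in B for each A-neuron
```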
- Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete.
We propose a novel probabilistic approach that provides tight estimates of robustness with strong guarantees.
arXiv Detail & Related papers (2024-07-10T09:13:11Z)
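The hardness result above concerns exact computation; the brute-force baseline it improves on is easy to state. The sketch below estimates counterfactual validity under sampled model shifts: the `predict` callable, the Gaussian shift model, and `sigma` are assumptions, and unlike the paper's method this gives no formal guarantee.

```python
import numpy as np

def counterfactual_validity_rate(predict, params, x_cf, target,
                                 n_shifts=500, sigma=0.01, seed=0):
    """Fraction of sampled model shifts under which the counterfactual
    x_cf still receives its intended target label.

    predict(params, x) -> class label; shifts are modeled as Gaussian
    jitter on the flattened parameter vector.
    """
    rng = np.random.default_rng(seed)
    hits = sum(predict(params + rng.normal(0.0, sigma, params.shape), x_cf)
               == target for _ in range(n_shifts))
    return hits / n_shifts   # 1.0 = counterfactual robust to these shifts
```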
- Towards Evaluating Transfer-based Attacks Systematically, Practically, and Fairly [79.07074710460012]
The adversarial vulnerability of deep neural networks (DNNs) has drawn great attention.
An increasing number of transfer-based methods have been developed to fool black-box DNN models.
We establish a transfer-based attack benchmark (TA-Bench) which implements 30+ methods.
arXiv Detail & Related papers (2023-11-02T15:35:58Z)
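The benchmarking protocol above reduces to a simple loop: craft an attack on a surrogate model, then score it on the target. The sketch uses one-step FGSM, the simplest of the transfer methods such a benchmark covers; the epsilon budget and [0, 1] input range are assumptions.

```python
import torch

def transfer_success_rate(surrogate, target, x, y, eps=8 / 255):
    """Craft FGSM examples on `surrogate`, report the fraction that
    fool `target` -- the basic transfer-based evaluation protocol.
    Assumes inputs live in [0, 1] and y holds the true labels.
    """
    x = x.clone().detach().requires_grad_(True)
    torch.nn.functional.cross_entropy(surrogate(x), y).backward()
    x_adv = (x + eps * x.grad.sign()).clamp(0, 1).detach()
    with torch.no_grad():
        fooled = (target(x_adv).argmax(dim=1) != y).float().mean()
    return fooled.item()
```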
- Boosting Adversarial Robustness using Feature Level Stochastic Smoothing [46.86097477465267]
Adversarial defenses have led to a significant improvement in the robustness of Deep Neural Networks.
In this work, we propose a generic method for introducing stochasticity in the network predictions.
We also utilize this for smoothing decisions by rejecting low-confidence predictions.
arXiv Detail & Related papers (2023-06-10T15:11:24Z)
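A minimal sketch of feature-level smoothing with rejection, assuming the network is split into an `encoder` and a classification `head` at some intermediate layer: Gaussian noise is injected into the features, softmax outputs are averaged, and low-confidence inputs are abstained on. The split point, `sigma`, and the threshold are illustrative choices, not the paper's.

```python
import torch

def smoothed_predict(encoder, head, x, n_samples=32, sigma=0.1, reject_below=0.7):
    """Average class probabilities over Gaussian noise injected at the
    feature level; return -1 (abstain) where smoothed confidence is low.
    """
    with torch.no_grad():
        feats = encoder(x)
        probs = torch.stack([
            torch.softmax(head(feats + sigma * torch.randn_like(feats)), dim=1)
            for _ in range(n_samples)
        ]).mean(dim=0)
    conf, pred = probs.max(dim=1)
    pred[conf < reject_below] = -1   # reject low-confidence predictions
    return pred
```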
- A Systematic Evaluation of Node Embedding Robustness [77.29026280120277]
We assess the empirical robustness of node embedding models to random and adversarial poisoning attacks.
We compare edge addition, deletion and rewiring strategies computed using network properties as well as node labels.
We find that node classification suffers higher performance degradation than network reconstruction.
arXiv Detail & Related papers (2022-09-16T17:20:23Z)
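The random-poisoning baseline from the comparison above fits in a few lines: flip edges at random, re-embed, and measure the accuracy drop of a downstream node classifier. Spectral embedding and 1-NN classification are stand-ins for the embedding models and tasks the paper actually benchmarks.

```python
import numpy as np

def spectral_embedding(adj, dim=8):
    """Embed nodes using the top-|eigenvalue| eigenvectors of the adjacency."""
    vals, vecs = np.linalg.eigh(adj)
    return vecs[:, np.argsort(-np.abs(vals))[:dim]]

def random_poisoning_drop(adj, labels, n_flips=50, dim=8, seed=0):
    """Accuracy drop of 1-NN node classification (in embedding space)
    after flipping n_flips random edges (addition or deletion)."""
    rng = np.random.default_rng(seed)

    def knn_accuracy(emb):
        d = np.linalg.norm(emb[:, None] - emb[None, :], axis=2)
        np.fill_diagonal(d, np.inf)                 # ignore self-matches
        return (labels[d.argmin(axis=1)] == labels).mean()

    clean = knn_accuracy(spectral_embedding(adj, dim))
    poisoned = adj.astype(float).copy()
    for _ in range(n_flips):
        i, j = rng.integers(len(adj)), rng.integers(len(adj))
        if i != j:                                  # flip: add or delete edge
            poisoned[i, j] = poisoned[j, i] = 1.0 - poisoned[i, j]
    return clean - knn_accuracy(spectral_embedding(poisoned, dim))
```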
- Residual Error: a New Performance Measure for Adversarial Robustness [85.0371352689919]
A major challenge that limits the widespread adoption of deep neural networks has been their fragility to adversarial attacks.
This study presents the concept of residual error, a new performance measure for assessing the adversarial robustness of a deep neural network.
Experimental results for image classification demonstrate the effectiveness of the proposed residual error metric.
arXiv Detail & Related papers (2021-06-18T16:34:23Z)
- $\beta$-Variational Classifiers Under Attack [6.574517227976925]
It is possible to synthesise small adversarial perturbations that imperceptibly modify a correctly classified input, making the network confidently misclassify it.
This has led to a plethora of different methods that try to improve robustness or detect the presence of these perturbations.
We study their robustness and detection capabilities, together with some novel insights on the generative part of the model.
arXiv Detail & Related papers (2020-08-20T14:57:22Z)
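One common detection scheme consistent with the "generative part of the model" mentioned above is reconstruction-based: inputs the generative model reconstructs poorly are flagged as suspect. The encoder/decoder split and the fixed threshold below are assumptions, not necessarily the paper's exact test.

```python
import torch

def reconstruction_detector(encoder, decoder, x, threshold):
    """Flag inputs whose reconstruction error under the model's
    generative part exceeds threshold (True = suspected adversarial).
    """
    with torch.no_grad():
        x_hat = decoder(encoder(x))
        err = ((x - x_hat) ** 2).flatten(1).mean(dim=1)   # per-input MSE
    return err > threshold
```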
- A general framework for defining and optimizing robustness [74.67016173858497]
We propose a rigorous and flexible framework for defining different types of robustness properties for classifiers.
Our concept is based on the postulate that the robustness of a classifier should be considered a property independent of its accuracy.
We develop a very general robustness framework that is applicable to any type of classification model.
arXiv Detail & Related papers (2020-06-19T13:24:20Z)
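In the spirit of treating robustness as a property separate from accuracy, the sketch below scores prediction stability without ever consulting ground-truth labels. Random L-infinity perturbations and the [0, 1] input range are illustrative choices; the framework in the paper above is far more general than this single sampled property.

```python
import torch

def prediction_stability(model, x, eps=0.03, n_samples=64):
    """Per-input fraction of random L-inf perturbations (magnitude <= eps)
    that leave the predicted label unchanged. No ground-truth labels are
    used, so the score measures robustness independently of accuracy.
    """
    with torch.no_grad():
        base = model(x).argmax(dim=1)
        stable = torch.zeros_like(base, dtype=torch.float)
        for _ in range(n_samples):
            delta = (torch.rand_like(x) * 2 - 1) * eps
            stable += (model((x + delta).clamp(0, 1)).argmax(dim=1) == base).float()
    return stable / n_samples
```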
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.