On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box
- URL: http://arxiv.org/abs/2308.09381v3
- Date: Tue, 14 May 2024 07:46:16 GMT
- Title: On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box
- Authors: Yi Cai, Gerhard Wunder
- Abstract summary: This paper presents GEEX (gradient-estimation-based explanation), an approach that produces gradient-like explanations through only query-level access.
The proposed approach holds a set of fundamental properties for attribution methods, which are mathematically rigorously proved, ensuring the quality of its explanations.
In addition to the theoretical analysis, with a focus on image data, the experimental results empirically demonstrate the superiority of the proposed method over state-of-the-art black-box methods and its competitive performance compared to methods with full access.
- Score: 9.368325306722321
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attribution methods shed light on the explainability of data-driven approaches such as deep learning models by uncovering the most influential features in a to-be-explained decision. While determining feature attributions via gradients delivers promising results, the internal access required for acquiring gradients can be impractical under safety concerns, thus limiting the applicability of gradient-based approaches. In response to such limited flexibility, this paper presents GEEX (gradient-estimation-based explanation), an approach that produces gradient-like explanations through only query-level access. The proposed approach holds a set of fundamental properties for attribution methods, which are mathematically rigorously proved, ensuring the quality of its explanations. In addition to the theoretical analysis, with a focus on image data, the experimental results empirically demonstrate the superiority of the proposed method over state-of-the-art black-box methods and its competitive performance compared to methods with full access.
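To make the idea of "gradient-like explanations through only query-level access" concrete, here is a minimal sketch of a generic zeroth-order (Gaussian-smoothing) gradient estimator. It is not the estimator defined in the paper, whose details are not given on this page. The function names `estimate_gradient` and `black_box`, the sample count, and the smoothing scale `sigma` are all assumptions chosen for illustration.

```python
import random


def estimate_gradient(query, x, num_samples=4000, sigma=0.01, seed=0):
    """Estimate the gradient of `query` at `x` using only function queries.

    `query` is a black-box function mapping a list of floats to a float score.
    This is a generic Monte Carlo estimator, not the paper's exact method.
    """
    rng = random.Random(seed)
    grad = [0.0] * len(x)
    for _ in range(num_samples):
        # Draw a random Gaussian perturbation direction.
        u = [rng.gauss(0.0, 1.0) for _ in x]
        x_plus = [xi + sigma * ui for xi, ui in zip(x, u)]
        x_minus = [xi - sigma * ui for xi, ui in zip(x, u)]
        # Two-sided (antithetic) difference along u; costs two queries.
        delta = (query(x_plus) - query(x_minus)) / (2.0 * sigma)
        # Average delta * u over samples; its expectation approximates the gradient.
        for i, ui in enumerate(u):
            grad[i] += delta * ui / num_samples
    return grad


def black_box(x):
    # Hypothetical model used only for this demo: a linear score whose
    # true gradient is [3.0, -2.0, 0.5] everywhere.
    return 3.0 * x[0] - 2.0 * x[1] + 0.5 * x[2]


estimate = estimate_gradient(black_box, [0.2, -0.1, 0.4])
print(estimate)
```

The estimate is noisy and converges to the true gradient only as the number of queries grows, which is why query efficiency is a recurring concern for black-box attribution methods.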
Related papers
- Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI [59.96044730204345]
We introduce Derivative-Free Diffusion Manifold-Constrained Gradients (FreeMCG).
FreeMCG serves as an improved basis for explainability of a given neural network.
We show that our method yields state-of-the-art results while preserving the essential properties expected of XAI tools.
arXiv Detail & Related papers (2024-11-22T11:15:14Z)
- Forward Learning for Gradient-based Black-box Saliency Map Generation [25.636185607767988]
We introduce a novel framework for estimating gradients in black-box settings and generating saliency maps to interpret model decisions.
We employ the likelihood ratio method to estimate output-to-input gradients and utilize them for saliency map generation.
Experiments in black-box settings validate the effectiveness of our method, demonstrating accurate gradient estimation and explainability of generated saliency maps.
arXiv Detail & Related papers (2024-03-22T20:11:19Z)
- Saliency strikes back: How filtering out high frequencies improves white-box explanations [15.328499301244708]
"White-box" methods rely on a gradient signal that is often contaminated by high-frequency artifacts.
We introduce a new approach called "FORGrad" to overcome this limitation.
Our findings show that FORGrad consistently enhances the performance of already existing white-box methods.
arXiv Detail & Related papers (2023-07-18T19:56:20Z)
- Scalable Bayesian Meta-Learning through Generalized Implicit Gradients [64.21628447579772]
The implicit Bayesian meta-learning (iBaML) method not only broadens the scope of learnable priors but also quantifies the associated uncertainty.
Analytical error bounds are established to demonstrate the precision and efficiency of the generalized implicit gradient over the explicit one.
arXiv Detail & Related papers (2023-03-31T02:10:30Z)
- Geometrically Guided Integrated Gradients [0.3867363075280543]
We introduce an interpretability method called "geometrically-guided integrated gradients".
Our method explores the model's dynamic behavior from multiple scaled versions of the input and captures the best possible attribution for each input.
We also propose a "model perturbation" sanity check to complement the traditionally used "model randomization" test.
arXiv Detail & Related papers (2022-06-13T05:05:43Z)
- Query-Efficient Black-box Adversarial Attacks Guided by a Transfer-based Prior [50.393092185611536]
We consider the black-box adversarial setting, where the adversary needs to craft adversarial examples without access to the gradients of a target model.
Previous methods attempted to approximate the true gradient either by using the transfer gradient of a surrogate white-box model or based on the feedback of model queries.
We propose two prior-guided random gradient-free (PRGF) algorithms based on biased sampling and gradient averaging.
arXiv Detail & Related papers (2022-03-13T04:06:27Z)
- Visualization of Supervised and Self-Supervised Neural Networks via Attribution Guided Factorization [87.96102461221415]
We develop an algorithm that provides per-class explainability.
In an extensive battery of experiments, we demonstrate the ability of our methods to produce class-specific visualizations.
arXiv Detail & Related papers (2020-12-03T18:48:39Z)
- Rethinking Positive Aggregation and Propagation of Gradients in Gradient-based Saliency Methods [47.999621481852266]
Saliency methods interpret the prediction of a neural network by showing the importance of input elements for that prediction.
We empirically show that two approaches for handling the gradient information, namely positive aggregation and positive propagation, break these methods.
arXiv Detail & Related papers (2020-12-01T09:38:54Z)
- There and Back Again: Revisiting Backpropagation Saliency Methods [87.40330595283969]
Saliency methods seek to explain the predictions of a model by producing an importance map across each input sample.
A popular class of such methods is based on backpropagating a signal and analyzing the resulting gradient.
We propose a single framework under which several such methods can be unified.
arXiv Detail & Related papers (2020-04-06T17:58:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.