Saliency strikes back: How filtering out high frequencies improves white-box explanations
- URL: http://arxiv.org/abs/2307.09591v4
- Date: Fri, 7 Jun 2024 18:39:49 GMT
- Title: Saliency strikes back: How filtering out high frequencies improves white-box explanations
- Authors: Sabine Muzellec, Thomas Fel, Victor Boutin, Léo Andéol, Rufin VanRullen, Thomas Serre
- Abstract summary: "White-box" methods rely on a gradient signal that is often contaminated by high-frequency artifacts.
We introduce a new approach called "FORGrad" to overcome this limitation.
Our findings show that FORGrad consistently enhances the performance of existing white-box methods.
- Score: 15.328499301244708
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Attribution methods correspond to a class of explainability (XAI) methods that aim to assess how individual inputs contribute to a model's decision-making process. We have identified a significant limitation in one class of attribution methods, known as "white-box" methods. Although highly efficient, as we will show, these methods rely on a gradient signal that is often contaminated by high-frequency artifacts. To overcome this limitation, we introduce a new approach called "FORGrad". This simple method effectively filters out these high-frequency artifacts using optimal cut-off frequencies tailored to the unique characteristics of each model architecture. Our findings show that FORGrad consistently enhances the performance of existing white-box methods, enabling them to compete effectively with more accurate yet computationally demanding "black-box" methods. We anticipate that our research will foster broader adoption of simpler and more efficient white-box methods for explainability, offering a better balance between faithfulness and computational efficiency.
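The paper's implementation is not reproduced here; the NumPy sketch below only illustrates the core idea the abstract describes: move a gradient-based saliency map into the Fourier domain, discard everything above a cut-off frequency, and transform back. The `cutoff` value and the toy maps are illustrative assumptions; the paper's contribution lies precisely in choosing that cut-off per architecture.

```python
import numpy as np

def low_pass_filter(saliency: np.ndarray, cutoff: float) -> np.ndarray:
    """Remove high-frequency content from a 2-D saliency map.

    `cutoff` is a radius in normalized frequency units (0..0.5) beyond
    which Fourier coefficients are zeroed. The paper selects this value
    per architecture; here it is just a free parameter.
    """
    h, w = saliency.shape
    fy = np.fft.fftfreq(h)[:, None]          # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]          # horizontal frequencies
    mask = np.sqrt(fx**2 + fy**2) <= cutoff  # keep low frequencies only
    spectrum = np.fft.fft2(saliency)
    return np.real(np.fft.ifft2(spectrum * mask))

# Toy usage: a smooth "gradient map" polluted by high-frequency noise.
rng = np.random.default_rng(0)
clean = np.outer(np.hanning(64), np.hanning(64))     # smooth structure
noisy = clean + 0.3 * rng.standard_normal((64, 64))  # high-freq artifacts
filtered = low_pass_filter(noisy, cutoff=0.1)
print(np.abs(filtered - clean).mean() < np.abs(noisy - clean).mean())  # True
```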
Related papers
- MAFT: Efficient Model-Agnostic Fairness Testing for Deep Neural Networks via Zero-Order Gradient Search [20.48306648223519]
We propose a novel black-box individual fairness testing method called Model-Agnostic Fairness Testing (MAFT).
We demonstrate that MAFT achieves the same effectiveness as state-of-the-art white-box methods whilst improving the applicability to large-scale networks.
arXiv Detail & Related papers (2024-12-28T09:07:06Z)
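MAFT's full search procedure is more elaborate than the summary conveys; the sketch below, under invented assumptions (toy model, protected attribute encoded as feature 1, arbitrary threshold and step size), only illustrates its zero-order ingredient: estimating gradients from model queries alone and using them to hunt for individual-fairness violations.

```python
import numpy as np

def zo_gradient(f, x, eps=1e-3):
    """Central finite-difference gradient estimate: queries only."""
    g = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        g[j] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def fairness_search(model, x, protected, steps=50, lr=1.0, threshold=0.5):
    """Hill-climb on non-protected features to maximise the prediction
    gap between two protected-attribute variants of the same input."""
    def gap(z):
        a, b = z.copy(), z.copy()
        a[protected], b[protected] = 0.0, 1.0
        return abs(model(a) - model(b))
    for _ in range(steps):
        g = zo_gradient(gap, x)
        g[protected] = 0.0          # never move the protected feature
        x = x + lr * g
        if gap(x) > threshold:
            return x                # candidate individual-fairness violation
    return None

# Toy model whose output leaks the protected bit z[1] when z[0] is large.
model = lambda z: 1.0 / (1.0 + np.exp(-(z[0] * z[1] - 1.0)))
print(fairness_search(model, x=np.array([0.5, 0.0]), protected=1))
```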
- On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box [9.368325306722321]
This paper presents a gradient-estimation-based explanation approach that produces gradient-like explanations through query-level access alone.
The proposed approach satisfies a set of fundamental properties for attribution methods, which are proved with mathematical rigor, ensuring the quality of its explanations.
Beyond the theoretical analysis, experimental results on image data demonstrate the superiority of the proposed method over state-of-the-art black-box methods and its competitive performance against methods with full access.
arXiv Detail & Related papers (2023-08-18T08:24:57Z)
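As a hedged illustration of what "gradient-like explanations through only query-level access" can mean, the sketch below implements the generic Monte-Carlo smoothed-gradient estimator such methods build on; the paper's own estimator and its proved properties are not reproduced, and all hyper-parameters are arbitrary.

```python
import numpy as np

def query_saliency(f, x, n_samples=4096, sigma=0.1, seed=0):
    """Gradient-like attribution from queries alone: a Monte-Carlo
    estimate of the Gaussian-smoothed gradient via antithetic pairs."""
    rng = np.random.default_rng(seed)
    g = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        g += (f(x + sigma * u) - f(x - sigma * u)) * u
    return g / (2.0 * sigma * n_samples)

# Sanity check against the known gradient of a toy function.
f = lambda z: np.sin(z).sum()
x = np.linspace(0.0, 1.0, 8)
print(np.allclose(query_saliency(f, x), np.cos(x), atol=0.1))  # True
```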
- Sampling-based Fast Gradient Rescaling Method for Highly Transferable Adversarial Attacks [18.05924632169541]
We propose a Sampling-based Fast Gradient Rescaling Method (S-FGRM).
Specifically, we use data rescaling to substitute the sign function without extra computational cost.
Our method can significantly boost the transferability of gradient-based attacks and outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2023-07-06T07:52:42Z)
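A minimal sketch of the two ingredients the summary names, with invented details: gradients are averaged over sampled neighbours, and the sign step of iterative FGSM is replaced by a rescaled gradient (a simple mean-absolute-value normalization stands in here for the paper's rescaling).

```python
import numpy as np

def rescale(g):
    """Stand-in for the paper's data rescaling: drop sign()'s hard +/-1
    quantisation but normalise the overall scale of the gradient."""
    return g / (np.abs(g).mean() + 1e-12)

def ifgsm_rescaled(grad_fn, x, eps=0.3, steps=10, n_samples=8, sigma=0.05,
                   seed=0):
    """Iterative FGSM where sampling stabilises the gradient and a
    rescaled gradient replaces the usual sign step."""
    rng = np.random.default_rng(seed)
    alpha = eps / steps
    x_adv = x.copy()
    for _ in range(steps):
        # Average the gradient over a few sampled neighbours.
        g = np.mean([grad_fn(x_adv + sigma * rng.standard_normal(x.shape))
                     for _ in range(n_samples)], axis=0)
        x_adv = np.clip(x_adv + alpha * rescale(g), x - eps, x + eps)
    return x_adv

# Toy objective to "attack": loss(z) = ||z - t||^2 with known gradient.
t = np.array([1.0, -1.0, 0.5])
grad_fn = lambda z: 2.0 * (z - t)
print(ifgsm_rescaled(grad_fn, np.zeros(3)))  # drifts away from t, inside the eps-ball
```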
- Distributed Dynamic Safe Screening Algorithms for Sparse Regularization [73.85961005970222]
We propose a new distributed dynamic safe screening (DDSS) method for sparsity-regularized models and apply it to shared-memory and distributed-memory architectures, respectively.
We prove that the proposed method achieves a linear convergence rate with lower overall complexity and, almost surely, eliminates almost all inactive features in a finite number of iterations.
arXiv Detail & Related papers (2022-04-23T02:45:55Z)
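As a single-machine illustration of what "safe screening" means, here is the classic gap-safe sphere test for the lasso; this is a stand-in assumption, not the DDSS rule itself, which is dynamic and distributed. A feature passing the test provably has a zero coefficient at the optimum and can be dropped.

```python
import numpy as np

def gap_safe_screen(X, y, beta, lam):
    """Gap-safe sphere test for  min_b 0.5*||y - X b||^2 + lam*||b||_1.
    Returns a boolean mask of features that are provably inactive."""
    r = y - X @ beta                                 # residual at beta
    theta = r / max(lam, np.abs(X.T @ r).max())      # dual-feasible point
    primal = 0.5 * r @ r + lam * np.abs(beta).sum()
    dual = 0.5 * y @ y - 0.5 * lam**2 * ((theta - y / lam) ** 2).sum()
    radius = np.sqrt(2.0 * max(primal - dual, 0.0)) / lam
    scores = np.abs(X.T @ theta) + radius * np.linalg.norm(X, axis=0)
    return scores < 1.0                              # True => safe to drop

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
beta_true = np.zeros(20)
beta_true[:3] = 1.0
y = X @ beta_true
lam = 0.1 * np.abs(X.T @ y).max()

# A few ISTA steps to get near the solution, then screen.
eta = 1.0 / np.linalg.norm(X, 2) ** 2
beta = np.zeros(20)
for _ in range(500):
    beta = beta - eta * X.T @ (X @ beta - y)
    beta = np.sign(beta) * np.maximum(np.abs(beta) - eta * lam, 0.0)

drop = gap_safe_screen(X, y, beta, lam)
print(f"{drop.sum()} of {drop.size} features provably inactive")
```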
- Query-Efficient Black-box Adversarial Attacks Guided by a Transfer-based Prior [50.393092185611536]
We consider the black-box adversarial setting, where the adversary needs to craft adversarial examples without access to the gradients of a target model.
Previous methods attempted to approximate the true gradient either from the transfer gradient of a surrogate white-box model or from the feedback of model queries.
We propose two prior-guided random gradient-free (PRGF) algorithms based on biased sampling and gradient averaging.
arXiv Detail & Related papers (2022-03-13T04:06:27Z)
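A deliberately simplified sketch of the prior-guided idea: blend a finite-difference derivative taken along a surrogate-gradient prior with plain random gradient-free probes. The fixed blending weight `lam` is an assumption of this sketch; the paper derives the optimal weighting and sampling scheme.

```python
import numpy as np

def prgf_estimate(f, x, prior, n=20, mu=1e-3, lam=0.5, seed=0):
    """Simplified prior-guided random gradient-free estimate: blend the
    derivative along the surrogate prior with random-direction probes."""
    rng = np.random.default_rng(seed)
    v = prior / np.linalg.norm(prior)
    d_v = (f(x + mu * v) - f(x)) / mu      # derivative along the prior
    g = lam * d_v * v
    for _ in range(n):
        u = rng.standard_normal(x.shape)
        u -= (u @ v) * v                   # probe orthogonally to the prior
        u /= np.linalg.norm(u)
        g += (1.0 - lam) * ((f(x + mu * u) - f(x)) / mu) * u / n
    return g

# Toy check: the prior is a noisy copy of the true gradient.
true_grad = np.array([3.0, -2.0, 1.0, 0.5])
f = lambda z: z @ true_grad
prior = true_grad + 0.5 * np.random.default_rng(1).standard_normal(4)
print(prgf_estimate(f, np.zeros(4), prior))  # correlated with true_grad
```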
- Time to Focus: A Comprehensive Benchmark Using Time Series Attribution Methods [4.9449660544238085]
The paper focuses on time series analysis and benchmarks several state-of-the-art attribution methods.
The presented experiments involve gradient-based and perturbation-based attribution methods.
The findings underscore that the choice of the best-suited attribution method depends strongly on the intended use case.
arXiv Detail & Related papers (2022-02-08T10:06:13Z)
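Not from the benchmark itself, but to make the two families it compares concrete: a toy time-series "model" attributed once with a gradient-based method (finite differences here, for self-containedness) and once with a perturbation-based method (single-step occlusion).

```python
import numpy as np

# Toy "model" scoring a univariate series: it responds to the value at
# step 10 and the local slope around step 30.
def model(s):
    return 2.0 * s[10] + (s[31] - s[29])

series = np.sin(np.linspace(0, 4 * np.pi, 64))

# Gradient-based attribution via central finite differences.
eps = 1e-4
grad_attr = np.array([
    (model(series + eps * np.eye(64)[t]) -
     model(series - eps * np.eye(64)[t])) / (2 * eps)
    for t in range(64)
])

# Perturbation-based attribution: occlude (zero out) one step at a time.
base = model(series)
occl_attr = np.array([
    base - model(np.where(np.arange(64) == t, 0.0, series))
    for t in range(64)
])

# Both families rank steps 10, 29 and 31 at the top, at different scales.
print(np.argsort(-np.abs(grad_attr))[:3], np.argsort(-np.abs(occl_attr))[:3])
```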
- Staircase Sign Method for Boosting Adversarial Attacks [123.19227129979943]
Crafting adversarial examples for transfer-based attacks is challenging and remains a research hotspot.
We propose a novel Staircase Sign Method (S²M) to alleviate this issue, thus boosting transfer-based attacks.
Our method can be generally integrated into any transfer-based attacks, and the computational overhead is negligible.
arXiv Detail & Related papers (2021-04-20T02:31:55Z)
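The paper's exact staircase weights are not reproduced here; the sketch below is one plausible instantiation of the idea: bucket gradient magnitudes into K percentile bands and scale the sign in graded levels, so the update keeps the sign pattern but regains some magnitude information.

```python
import numpy as np

def staircase_sign(g, k=4):
    """K-level staircase in place of sign(g): coordinates are bucketed
    by the percentile of |g| and scaled in k-th fractions."""
    edges = np.percentile(np.abs(g), np.linspace(0, 100, k + 1)[1:-1])
    levels = np.digitize(np.abs(g), edges) + 1   # bucket index 1..k
    return np.sign(g) * levels / k

g = np.array([0.03, -1.2, 0.4, -0.05, 2.2, 0.7])
print(np.sign(g))           # [ 1. -1.  1. -1.  1.  1.]
print(staircase_sign(g))    # [ 0.25 -1.    0.5  -0.25  1.    0.75]
```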
- Visualization of Supervised and Self-Supervised Neural Networks via Attribution Guided Factorization [87.96102461221415]
We develop an algorithm that provides per-class explainability.
In an extensive battery of experiments, we demonstrate the ability of our method to produce class-specific visualizations.
arXiv Detail & Related papers (2020-12-03T18:48:39Z)
- Gaussian MRF Covariance Modeling for Efficient Black-Box Adversarial Attacks [86.88061841975482]
We study the problem of generating adversarial examples in a black-box setting, where we only have access to a zeroth order oracle.
We use this setting to find fast one-step adversarial attacks, akin to a black-box version of the Fast Gradient Sign Method (FGSM).
We show that the method uses fewer queries and achieves higher attack success rates than the current state of the art.
arXiv Detail & Related papers (2020-10-08T18:36:51Z)
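A hedged sketch of the setting only: a one-step FGSM-style attack where the gradient comes from a zeroth-order oracle via random-direction finite differences. The paper's actual contribution, modelling gradient correlations with a Gaussian MRF to cut query counts, is deliberately omitted here.

```python
import numpy as np

def black_box_fgsm(loss, x, eps=0.1, n_queries=100, mu=1e-3, seed=0):
    """One-step FGSM with a zeroth-order gradient estimate built from
    loss queries along random directions (no Gaussian MRF modelling)."""
    rng = np.random.default_rng(seed)
    g = np.zeros_like(x)
    for _ in range(n_queries):
        u = rng.standard_normal(x.shape)
        g += (loss(x + mu * u) - loss(x - mu * u)) / (2.0 * mu) * u
    return x + eps * np.sign(g)            # the single FGSM-style step

# Toy loss with a known gradient direction.
w = np.array([1.0, -2.0, 0.0, 3.0])
loss = lambda z: z @ w
print(black_box_fgsm(loss, np.zeros(4)))   # ~ eps * sign(w) where w != 0
```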
- There and Back Again: Revisiting Backpropagation Saliency Methods [87.40330595283969]
Saliency methods seek to explain the predictions of a model by producing an importance map across each input sample.
A popular class of such methods is based on backpropagating a signal and analyzing the resulting gradient.
We propose a single framework under which several such methods can be unified.
arXiv Detail & Related papers (2020-04-06T17:58:08Z)
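The common core that the paper unifies can be made concrete in a few lines of PyTorch: backpropagate a class score to the input and read off an importance map. The specific methods in the framework differ in how they modify this backward signal; the model and shapes below are arbitrary.

```python
import torch
import torch.nn as nn

# Arbitrary small model and input, just to run the backward pass.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(8 * 16 * 16, 10))
x = torch.randn(1, 3, 16, 16, requires_grad=True)

score = model(x)[0, 5]          # the class whose prediction we explain
score.backward()                # backpropagate the class score to the input

saliency = x.grad.abs().max(dim=1).values    # plain gradient saliency
grad_x_input = (x.grad * x).sum(dim=1)       # gradient * input variant
print(saliency.shape, grad_x_input.shape)    # torch.Size([1, 16, 16]) x2
```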