On the Complexity-Faithfulness Trade-off of Gradient-Based Explanations
- URL: http://arxiv.org/abs/2508.10490v1
- Date: Thu, 14 Aug 2025 09:49:07 GMT
- Title: On the Complexity-Faithfulness Trade-off of Gradient-Based Explanations
- Authors: Amir Mehrpanah, Matteo Gamba, Kevin Smith, Hossein Azizpour
- Abstract summary: ReLU networks have sharp transitions, sometimes relying on individual pixels for predictions. Existing methods, such as GradCAM, smooth these explanations by producing surrogate models at the cost of faithfulness. We introduce a unifying spectral framework to systematically analyze and quantify smoothness, faithfulness, and their trade-off in explanations.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: ReLU networks, while prevalent for visual data, have sharp transitions, sometimes relying on individual pixels for predictions, making vanilla gradient-based explanations noisy and difficult to interpret. Existing methods, such as GradCAM, smooth these explanations by producing surrogate models at the cost of faithfulness. We introduce a unifying spectral framework to systematically analyze and quantify smoothness, faithfulness, and their trade-off in explanations. Using this framework, we quantify and regularize the contribution of ReLU networks to high-frequency information, providing a principled approach to identifying this trade-off. Our analysis characterizes how surrogate-based smoothing distorts explanations, leading to an ``explanation gap'' that we formally define and measure for different post-hoc methods. Finally, we validate our theoretical findings across different design choices, datasets, and ablations.
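As a concrete illustration of the spectral perspective described in the abstract (not the paper's actual implementation), the sketch below builds a toy two-layer ReLU network, computes the input gradient of its scalar output by manual backpropagation, and measures what fraction of the saliency map's energy lies above a radial frequency cutoff. The network, the image size, and the cutoff value are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer ReLU network on a flattened 16x16 "image" (hypothetical sizes).
H, W = 16, 16
W1 = rng.normal(0, 0.1, size=(64, H * W))
W2 = rng.normal(0, 0.1, size=(1, 64))

def score_and_grad(x_flat):
    """Scalar output and its input gradient via manual backprop."""
    z = W1 @ x_flat
    a = np.maximum(z, 0.0)           # ReLU: piecewise linear, sharp transitions
    y = (W2 @ a).item()
    mask = (z > 0).astype(float)     # ReLU derivative
    grad = W1.T @ (W2.flatten() * mask)
    return y, grad

x = rng.normal(size=H * W)
_, g = score_and_grad(x)
saliency = g.reshape(H, W)

# Spectral view: fraction of saliency energy above a radial frequency cutoff.
F = np.fft.fftshift(np.fft.fft2(saliency))
power = np.abs(F) ** 2
fy, fx = np.meshgrid(np.fft.fftshift(np.fft.fftfreq(H)),
                     np.fft.fftshift(np.fft.fftfreq(W)), indexing="ij")
radius = np.sqrt(fx ** 2 + fy ** 2)
cutoff = 0.25                        # assumed threshold, not from the paper
hf_fraction = power[radius > cutoff].sum() / power.sum()
print(f"high-frequency energy fraction: {hf_fraction:.3f}")
```

A smoother surrogate explanation (e.g. a GradCAM-style upsampled map) would concentrate more of this energy below the cutoff; the paper's framework formalizes what is lost in that exchange.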
Related papers
- Disentangled representations via score-based variational autoencoders [21.955536401578616]
We present the Score-based Autoencoder for Multiscale Inference (SAMI). SAMI formulates a principled objective that learns representations through score-based guidance of the underlying diffusion process. It can extract useful representations from pre-trained diffusion models with minimal additional training.
arXiv Detail & Related papers (2025-12-18T23:42:10Z) - On Spectral Properties of Gradient-based Explanation Methods [6.181300669254824]
We adopt novel probabilistic and spectral perspectives to analyze explanation methods. Our study reveals a pervasive spectral bias stemming from the use of gradients, and sheds light on some common design choices. We propose two remedies based on our proposed formalism: (i) a mechanism to determine a standard perturbation scale, and (ii) an aggregation method which we call SpectralLens.
arXiv Detail & Related papers (2025-08-14T12:37:22Z) - Uncertainty Quantification for Gradient-based Explanations in Neural Networks [6.9060054915724]
We propose a pipeline to ascertain the explanation uncertainty of neural networks. We use this pipeline to produce explanation distributions for the CIFAR-10, FER+, and California Housing datasets. We compute modified pixel insertion/deletion metrics to evaluate the quality of the generated explanations.
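For readers unfamiliar with insertion/deletion metrics, the following is a minimal sketch of a standard pixel-deletion curve (not the modified variant from this paper): pixels are removed in decreasing order of attributed importance, and a faithful explanation should make the model's score drop quickly. The "model" here is a stand-in linear scorer, and all sizes and the zero baseline are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical "model": a fixed linear scorer over an 8x8 image, standing in
# for a trained classifier's class score.
H, W = 8, 8
weights = rng.normal(size=(H, W))

def model_score(img):
    return float((weights * img).sum())

def deletion_curve(img, saliency, baseline=0.0):
    """Remove pixels in decreasing saliency order, recording the score."""
    order = np.argsort(saliency.ravel())[::-1]   # most salient first
    perturbed = img.copy().ravel()
    scores = [model_score(perturbed.reshape(H, W))]
    for idx in order:
        perturbed[idx] = baseline
        scores.append(model_score(perturbed.reshape(H, W)))
    return np.array(scores)

img = rng.normal(size=(H, W))
saliency = np.abs(weights * img)   # gradient*input attribution: exact for a linear model
curve = deletion_curve(img, saliency)
auc = curve.mean()                 # lower area under the curve = more faithful
print(f"deletion AUC: {auc:.3f}")
```

The uncertainty pipeline above would repeat this over a distribution of explanations rather than a single saliency map.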
arXiv Detail & Related papers (2024-03-25T21:56:02Z) - Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement [58.9768112704998]
Disentangled representation learning strives to extract the intrinsic factors within observed data.
We introduce a new perspective and framework, demonstrating that diffusion models with cross-attention can serve as a powerful inductive bias.
This is the first work to reveal the potent disentanglement capability of diffusion models with cross-attention, requiring no complex designs.
arXiv Detail & Related papers (2024-02-15T05:07:54Z) - Disentangled Representation Learning with Transmitted Information Bottleneck [57.22757813140418]
We present DisTIB (Transmitted Information Bottleneck for Disentangled representation learning), a novel objective that navigates the balance between information compression and preservation.
arXiv Detail & Related papers (2023-11-03T03:18:40Z) - Counterfactuals of Counterfactuals: a back-translation-inspired approach to analyse counterfactual editors [3.4253416336476246]
We focus on the analysis of counterfactual, contrastive explanations.
We propose a new back translation-inspired evaluation methodology.
We show that by iteratively feeding the counterfactual to the explainer we can obtain valuable insights into the behaviour of both the predictor and the explainer models.
arXiv Detail & Related papers (2023-05-26T16:04:28Z) - Deterministic Decoupling of Global Features and its Application to Data Analysis [0.0]
We propose a new formalism that is based on defining transformations on submanifolds.
Through these transformations we define a normalization that, we demonstrate, allows for decoupling differentiable features.
We apply this method in the original data domain and at the output of a filter bank to regression and classification problems based on global descriptors.
arXiv Detail & Related papers (2022-07-05T15:54:39Z) - Explainability in Process Outcome Prediction: Guidelines to Obtain Interpretable and Faithful Models [77.34726150561087]
We define explainability through the interpretability of the explanations and the faithfulness of the explainability model in the field of process outcome prediction.
This paper contributes a set of guidelines named X-MOP which allows selecting the appropriate model based on the event log specifications.
arXiv Detail & Related papers (2022-03-30T05:59:50Z) - Explainers in the Wild: Making Surrogate Explainers Robust to Distortions through Perception [77.34726150561087]
We propose a methodology to evaluate the effect of distortions in explanations by embedding perceptual distances.
We generate explanations for images in the ImageNet-C dataset and demonstrate how using perceptual distances in the surrogate explainer creates more coherent explanations for the distorted and reference images.
arXiv Detail & Related papers (2021-02-22T12:38:53Z) - Learning explanations that are hard to vary [75.30552491694066]
We show that averaging across examples can favor memorization and 'patchwork' solutions that sew together different strategies.
We then propose and experimentally validate a simple alternative algorithm based on a logical AND.
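The "logical AND" idea above can be sketched concretely: keep only the gradient components whose sign agrees across all training environments, and zero out the rest. The two-environment gradient matrix below is a toy example chosen for illustration, not data from the paper.

```python
import numpy as np

def and_mask(grads):
    """Zero out gradient components whose sign disagrees across environments.

    grads: array of shape (n_envs, n_params), per-environment gradients.
    Returns the mean gradient with non-unanimous components masked to zero.
    """
    signs = np.sign(grads)
    agree = np.abs(signs.sum(axis=0)) == grads.shape[0]  # unanimous sign
    return grads.mean(axis=0) * agree

# Two environments, three parameters: only components 0 and 2 agree in sign.
g = np.array([[0.5,  1.0, -0.2],
              [0.3, -0.8, -0.4]])
masked = and_mask(g)
print(masked)
```

Masking before the update step forces the optimizer toward explanations shared by all environments rather than environment-specific shortcuts.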
arXiv Detail & Related papers (2020-09-01T10:17:48Z) - When Relation Networks meet GANs: Relation GANs with Triplet Loss [110.7572918636599]
Training stability is still a lingering concern of generative adversarial networks (GANs).
In this paper, we explore a relation network architecture for the discriminator and design a triplet loss which performs better generalization and stability.
Experiments on benchmark datasets show that the proposed relation discriminator and new loss can provide significant improvement on various vision tasks.
arXiv Detail & Related papers (2020-02-24T11:35:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.