Respect the model: Fine-grained and Robust Explanation with Sharing Ratio Decomposition
- URL: http://arxiv.org/abs/2402.03348v1
- Date: Thu, 25 Jan 2024 07:20:23 GMT
- Title: Respect the model: Fine-grained and Robust Explanation with Sharing Ratio Decomposition
- Authors: Sangyu Han, Yearim Kim, Nojun Kwak
- Abstract summary: We propose a novel eXplainable AI (XAI) method called SRD (Sharing Ratio Decomposition), which sincerely reflects the model's inference process.
We also introduce an interesting observation termed Activation-Pattern-Only Prediction (APOP), which lets us emphasize the importance of inactive neurons.
- Score: 29.491712784788188
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The truthfulness of existing explanation methods in authentically elucidating
the underlying model's decision-making process has been questioned. Existing
methods have deviated from faithfully representing the model, making them susceptible
to adversarial attacks. To address this, we propose a novel eXplainable AI
(XAI) method called SRD (Sharing Ratio Decomposition), which sincerely reflects
the model's inference process, resulting in significantly enhanced robustness
in our explanations. Different from the conventional emphasis on the neuronal
level, we adopt a vector perspective to consider the intricate nonlinear
interactions between filters. We also introduce an interesting observation
termed Activation-Pattern-Only Prediction (APOP), which lets us emphasize the
importance of inactive neurons and redefine relevance to encapsulate all
relevant information from both active and inactive neurons. Our method,
SRD, allows for the recursive decomposition of a Pointwise Feature Vector
(PFV), providing a high-resolution Effective Receptive Field (ERF) at any
layer.
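
Below is a minimal NumPy sketch of the sharing-ratio idea as we read it from the abstract: the output Pointwise Feature Vector (PFV) is a sum of vector contributions from lower-layer positions, each contribution's sharing ratio is its projection onto the output PFV normalized by the output's squared norm (so the ratios sum to one), and relevance is then split recursively by those ratios. The toy 1x1-convolution setup and the exact ratio formula are illustrative assumptions, not the paper's definition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: one output Pointwise Feature Vector (PFV) formed as the sum of
# vector contributions from S lower-layer spatial positions (a 1x1-conv-like
# mixing; purely illustrative).
C_in, C_out, S = 4, 6, 3
W = rng.normal(size=(S, C_out, C_in))     # per-position mixing weights
x = rng.normal(size=(S, C_in))            # lower-layer PFVs

contribs = np.einsum('soc,sc->so', W, x)  # contribution vector of each position
y = contribs.sum(axis=0)                  # the output PFV

# Hypothetical sharing ratio: projection of each contribution onto the output
# PFV, normalized by the output's squared norm, so the ratios sum to 1.
ratios = contribs @ y / (y @ y)
assert np.isclose(ratios.sum(), 1.0)

# Relevance at the output PFV is split among lower-layer positions by these
# ratios; applying the same split recursively propagates relevance downward.
relevance_out = 1.0
relevance_in = relevance_out * ratios

print("sharing ratios      :", np.round(ratios, 3))
print("relevance per input :", np.round(relevance_in, 3))
```

Repeating such a split layer by layer down to the input would, in this reading, yield the per-pixel relevance map, i.e. the high-resolution ERF the abstract describes.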
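
To make the APOP observation concrete, here is a minimal NumPy sketch of one way to read it: freeze the binary activation pattern (which ReLUs fire and which stay inactive) induced by an input, then run a forward pass that uses only that pattern while discarding the input itself. The tiny random network, the zero reference input, and the exact masking scheme are illustrative assumptions, not the paper's protocol; the paper reports this effect on trained networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# A tiny random two-layer ReLU network stands in for a trained model; this
# only illustrates the mechanics of a pattern-only forward pass.
W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)
W2, b2 = rng.normal(size=(3, 16)), rng.normal(size=3)

def forward(x, pattern=None):
    """Standard forward pass, or one whose ReLU gating is replaced by a fixed
    binary activation pattern (1 = active neuron, 0 = inactive neuron)."""
    pre = W1 @ x + b1
    h = relu(pre) if pattern is None else pattern * pre
    return W2 @ h + b2

x = rng.normal(size=8)
pattern = (W1 @ x + b1 > 0).astype(float)   # x's activation pattern

# Pattern-only prediction: discard x, feed an uninformative reference input,
# but keep x's on/off pattern fixed.
x_ref = np.zeros(8)
print("prediction from x           :", forward(x).argmax())
print("prediction from pattern only:", forward(x_ref, pattern=pattern).argmax())
```

With trained weights, the pattern-only prediction frequently agrees with the original one, which is why the abstract stresses that inactive neurons also carry relevant information.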
Related papers
- Total Uncertainty Quantification in Inverse PDE Solutions Obtained with Reduced-Order Deep Learning Surrogate Models [50.90868087591973]
We propose an approximate Bayesian method for quantifying the total uncertainty in inverse PDE solutions obtained with machine learning surrogate models.
We test the proposed framework by comparing it with the iterative ensemble smoother and deep ensembling methods for a non-linear diffusion equation.
arXiv Detail & Related papers (2024-08-20T19:06:02Z)
- Reactive Model Correction: Mitigating Harm to Task-Relevant Features via Conditional Bias Suppression [12.44857030152608]
Deep Neural Networks are prone to learning and relying on spurious correlations in the training data, which, for high-risk applications, can have fatal consequences.
Various approaches to suppress model reliance on harmful features have been proposed that can be applied post-hoc without additional training.
We propose a reactive approach conditioned on model-derived knowledge and eXplainable Artificial Intelligence (XAI) insights.
arXiv Detail & Related papers (2024-04-15T09:16:49Z)
- Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
- Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models [63.1637853118899]
We propose the first unsupervised and learning-based method to identify interpretable directions in h-space of pre-trained diffusion models.
We employ a shift control module that works on h-space of pre-trained diffusion models to manipulate a sample into a shifted version of itself.
By jointly optimizing them, the model will spontaneously discover disentangled and interpretable directions.
arXiv Detail & Related papers (2023-10-15T18:44:30Z)
- ODE-based Recurrent Model-free Reinforcement Learning for POMDPs [15.030970899252601]
We present a novel ODE-based recurrent model combined with a model-free reinforcement learning framework to solve POMDPs.
We experimentally demonstrate the efficacy of our methods across various PO continuous control and meta-RL tasks.
Our experiments illustrate that our method is robust against irregular observations, owing to the ability of ODEs to model irregularly-sampled time series.
arXiv Detail & Related papers (2023-09-25T12:13:56Z)
- Convex Latent-Optimized Adversarial Regularizers for Imaging Inverse Problems [8.33626757808923]
We introduce Convex Latent-Optimized Adversarial Regularizers (CLEAR), a novel and interpretable data-driven paradigm.
CLEAR represents a fusion of deep learning (DL) and variational regularization.
Our method consistently outperforms conventional data-driven techniques and traditional regularization approaches.
arXiv Detail & Related papers (2023-09-17T12:06:04Z)
- Toward Certified Robustness Against Real-World Distribution Shifts [65.66374339500025]
We train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model.
A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations.
We propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement.
arXiv Detail & Related papers (2022-06-08T04:09:13Z)
- Robust lEarned Shrinkage-Thresholding (REST): Robust unrolling for sparse recover [87.28082715343896]
We consider deep neural networks for solving inverse problems that are robust to forward model mis-specifications.
We design a new robust deep neural network architecture by applying algorithm unfolding techniques to a robust version of the underlying recovery problem.
The proposed REST network is shown to outperform state-of-the-art model-based and data-driven algorithms in both compressive sensing and radar imaging problems.
arXiv Detail & Related papers (2021-10-20T06:15:45Z)
- Interpreting Deep Neural Networks with Relative Sectional Propagation by Analyzing Comparative Gradients and Hostile Activations [37.11665902583138]
We propose a new attribution method, Relative Sectional Propagation (RSP), for decomposing the output predictions of Deep Neural Networks (DNNs).
We define hostile factor as an element that interferes with finding the attributions of the target and propagates it in a distinguishable way to overcome the non-suppressed nature of activated neurons.
Our method makes it possible to decompose the predictions of DNNs with clearer class-discriminativeness and detailed elucidations of activation neurons compared to the conventional attribution methods.
arXiv Detail & Related papers (2020-12-07T03:11:07Z)
- Proactive Pseudo-Intervention: Causally Informed Contrastive Learning For Interpretable Vision Models [103.64435911083432]
We present a novel contrastive learning strategy called Proactive Pseudo-Intervention (PPI).
PPI leverages proactive interventions to guard against image features with no causal relevance.
We also devise a novel causally informed salience mapping module to identify key image pixels to intervene, and show it greatly facilitates model interpretability.
arXiv Detail & Related papers (2020-12-06T20:30:26Z)