Evaluating the Robustness of Interpretability Methods through
Explanation Invariance and Equivariance
- URL: http://arxiv.org/abs/2304.06715v3
- Date: Thu, 5 Oct 2023 15:29:01 GMT
- Title: Evaluating the Robustness of Interpretability Methods through
Explanation Invariance and Equivariance
- Authors: Jonathan Crabbé, Mihaela van der Schaar
- Abstract summary: Interpretability methods are valuable only if their explanations faithfully describe the explained model.
We consider neural networks whose predictions are invariant under a specific symmetry group.
- Score: 72.50214227616728
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interpretability methods are valuable only if their explanations faithfully
describe the explained model. In this work, we consider neural networks whose
predictions are invariant under a specific symmetry group. This includes
popular architectures, ranging from convolutional to graph neural networks. Any
explanation that faithfully explains this type of model needs to be in
agreement with this invariance property. We formalize this intuition through
the notion of explanation invariance and equivariance by leveraging the
formalism from geometric deep learning. Through this rigorous formalism, we
derive (1) two metrics to measure the robustness of any interpretability method
with respect to the model symmetry group; (2) theoretical robustness guarantees
for some popular interpretability methods; and (3) a systematic approach to
increase the invariance of any interpretability method with respect to a
symmetry group. By empirically measuring our metrics for explanations of models
associated with various modalities and symmetry groups, we derive a set of 5
guidelines to allow users and developers of interpretability methods to produce
robust explanations.
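As a concrete illustration of the metrics in (1) and the symmetrization idea in (3), here is a minimal PyTorch sketch. It is not the authors' code: it assumes a plain gradient-saliency explainer, the group of 90-degree image rotations as the symmetry group, cosine similarity as the score, and a toy model, all of which are illustrative choices rather than the paper's experimental setup.

import torch
import torch.nn as nn
import torch.nn.functional as F

def saliency(model, x):
    # Gradient-based feature attribution w.r.t. the top predicted logit
    # (one of many possible explainers, used here purely for illustration).
    x = x.clone().requires_grad_(True)
    model(x).max(dim=1).values.sum().backward()
    return x.grad.detach()

def rotations(x):
    # Enumerate the C4 group (rotations by multiples of 90 degrees)
    # acting on image tensors of shape (N, C, H, W).
    return [torch.rot90(x, k, dims=(-2, -1)) for k in range(4)]

def similarity(a, b):
    # Cosine similarity between flattened explanations, averaged over the batch.
    return F.cosine_similarity(a.flatten(1), b.flatten(1), dim=1).mean()

def explanation_invariance(model, x):
    # Metric sketch: average similarity between e(g.x) and e(x) over the group.
    e_x = saliency(model, x)
    return torch.stack(
        [similarity(saliency(model, gx), e_x) for gx in rotations(x)]
    ).mean()

def explanation_equivariance(model, x):
    # Metric sketch: average similarity between e(g.x) and g.e(x) over the group.
    e_x = saliency(model, x)
    scores = [
        similarity(saliency(model, gx), torch.rot90(e_x, k, dims=(-2, -1)))
        for k, gx in enumerate(rotations(x))
    ]
    return torch.stack(scores).mean()

def group_averaged_saliency(model, x):
    # Sketch of point (3): averaging g^-1 . e(g.x) over the group yields an
    # explanation that is equivariant by construction.
    maps = [
        torch.rot90(saliency(model, gx), -k, dims=(-2, -1))
        for k, gx in enumerate(rotations(x))
    ]
    return torch.stack(maps).mean(dim=0)

if __name__ == "__main__":
    # Toy, approximately rotation-invariant model and random images, for demonstration only.
    model = nn.Sequential(
        nn.Conv2d(1, 4, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(4, 2),
    )
    x = torch.randn(8, 1, 16, 16)
    print("explanation invariance :", explanation_invariance(model, x).item())
    print("explanation equivariance:", explanation_equivariance(model, x).item())

In this sketch, invariance compares the explanation of a rotated image with the explanation of the original, equivariance compares it with the rotated original explanation, and group averaging produces an explanation that is equivariant by construction.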
Related papers
- Counterfactual explainability of black-box prediction models [4.14360329494344]
We propose a new notion called counterfactual explainability for black-box prediction models.
Counterfactual explainability has three key advantages.
arXiv Detail & Related papers (2024-11-03T16:29:09Z)
- Hard to Explain: On the Computational Hardness of In-Distribution Model Interpretation [0.9558392439655016]
The ability to interpret Machine Learning (ML) models is becoming increasingly essential.
Recent work has demonstrated that it is possible to formally assess interpretability by studying the computational complexity of explaining the decisions of various models.
arXiv Detail & Related papers (2024-08-07T17:20:52Z)
- An Axiomatic Approach to Model-Agnostic Concept Explanations [67.84000759813435]
We propose an approach to concept explanations that satisfy three natural axioms: linearity, recursivity, and similarity.
We then establish connections with previous concept explanation methods, offering insight into their varying semantic meanings.
arXiv Detail & Related papers (2024-01-12T20:53:35Z)
- Nonparametric Partial Disentanglement via Mechanism Sparsity: Sparse Actions, Interventions and Sparse Temporal Dependencies [58.179981892921056]
This work introduces a novel principle for disentanglement we call mechanism sparsity regularization.
We propose a representation learning method that induces disentanglement by simultaneously learning the latent factors.
We show that the latent factors can be recovered by regularizing the learned causal graph to be sparse.
arXiv Detail & Related papers (2024-01-10T02:38:21Z)
- Symmetry Breaking and Equivariant Neural Networks [17.740760773905986]
We introduce a novel notion of 'relaxed equivariance'.
We show how to incorporate this relaxation into equivariant multilayer perceptrons (E-MLPs).
The relevance of symmetry breaking is then discussed in various application domains.
arXiv Detail & Related papers (2023-12-14T15:06:48Z)
- Enriching Disentanglement: From Logical Definitions to Quantitative Metrics [59.12308034729482]
Disentangling the explanatory factors in complex data is a promising approach for data-efficient representation learning.
We establish relationships between logical definitions and quantitative metrics to derive theoretically grounded disentanglement metrics.
We empirically demonstrate the effectiveness of the proposed metrics by isolating different aspects of disentangled representations.
arXiv Detail & Related papers (2023-05-19T08:22:23Z)
- In What Ways Are Deep Neural Networks Invariant and How Should We Measure This? [5.757836174655293]
We introduce a family of invariance and equivariance metrics that allows us to quantify these properties in a way that disentangles them from other metrics such as loss or accuracy.
We draw a range of conclusions about invariance and equivariance in deep learning models, ranging from whether initializing a model with pretrained weights has an effect on a trained model's invariance, to the extent to which invariance learned via training can generalize to out-of-distribution data.
arXiv Detail & Related papers (2022-10-07T18:43:21Z)
- Equivariant Representation Learning via Class-Pose Decomposition [17.032782230538388]
We introduce a general method for learning representations that are equivariant to symmetries of data.
The components semantically correspond to intrinsic data classes and poses respectively.
Results show that our representations capture the geometry of data and outperform other equivariant representation learning frameworks.
arXiv Detail & Related papers (2022-07-07T06:55:52Z)
- Learning Disentangled Representations with Latent Variation Predictability [102.4163768995288]
This paper defines the variation predictability of latent disentangled representations.
Within an adversarial generation process, we encourage variation predictability by maximizing the mutual information between latent variations and corresponding image pairs.
We develop an evaluation metric that does not rely on the ground-truth generative factors to measure the disentanglement of latent representations.
arXiv Detail & Related papers (2020-07-25T08:54:26Z)
- Evaluating the Disentanglement of Deep Generative Models through Manifold Topology [66.06153115971732]
We present a method for quantifying disentanglement that only uses the generative model.
We empirically evaluate several state-of-the-art models across multiple datasets.
arXiv Detail & Related papers (2020-06-05T20:54:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.