Topological Representations of Local Explanations
- URL: http://arxiv.org/abs/2201.02155v1
- Date: Thu, 6 Jan 2022 17:46:45 GMT
- Title: Topological Representations of Local Explanations
- Authors: Peter Xenopoulos, Gromit Chan, Harish Doraiswamy, Luis Gustavo Nonato,
Brian Barr, Claudio Silva
- Abstract summary: We propose a topology-based framework to extract a simplified representation from a set of local explanations.
We demonstrate that our framework can not only reliably identify differences between explainability techniques but also provide stable representations.
- Score: 8.559625821116454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Local explainability methods -- those which seek to generate an explanation
for each prediction -- are becoming increasingly prevalent due to the need for
practitioners to rationalize their model outputs. However, comparing local
explainability methods is difficult since they each generate outputs in various
scales and dimensions. Furthermore, due to the stochastic nature of some
explainability methods, it is possible for different runs of a method to
produce contradictory explanations for a given observation. In this paper, we
propose a topology-based framework to extract a simplified representation from
a set of local explanations. We do so by first modeling the relationship
between the explanation space and the model predictions as a scalar function.
Then, we compute the topological skeleton of this function. This topological
skeleton acts as a signature for such functions, which we use to compare
different explanation methods. We demonstrate that our framework can not only
reliably identify differences between explainability techniques but also
provide stable representations. Then, we show how our framework can be used to
identify appropriate parameters for local explainability methods. Our framework
is simple, does not require complex optimizations, and can be broadly applied
to most local explanation methods. We believe the practicality and versatility
of our approach will help promote topology-based approaches as a tool for
understanding and comparing explanation methods.
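To make the pipeline concrete, the sketch below builds a Mapper-style topological skeleton from a matrix of local explanations (e.g., SHAP or LIME attribution vectors), using the model's predictions as the scalar lens function. This is a minimal illustration under assumed inputs (`explanations`, `predictions`) and stand-in choices (DBSCAN clustering, fixed cover parameters), not the authors' exact implementation.

```python
# Minimal sketch (assumed inputs, not the authors' exact implementation):
# build a Mapper-style topological skeleton of the explanation space,
# using the model's predictions as the scalar (lens) function.
import numpy as np
import networkx as nx
from sklearn.cluster import DBSCAN


def topological_skeleton(explanations, predictions, n_intervals=10, overlap=0.3):
    """explanations: (n_samples, n_features) local attribution vectors.
    predictions: (n_samples,) model outputs used as the scalar function."""
    lo, hi = predictions.min(), predictions.max()
    step = (hi - lo) / n_intervals          # spacing between interval starts
    width = step * (1.0 + overlap)          # consecutive intervals overlap

    graph, members = nx.Graph(), {}
    node_id = 0
    for i in range(n_intervals):
        start = lo + i * step
        idx = np.where((predictions >= start) & (predictions <= start + width))[0]
        if len(idx) == 0:
            continue
        # Cluster the explanation vectors that fall inside this interval.
        labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(explanations[idx])
        for lab in set(labels) - {-1}:      # skip DBSCAN noise points
            members[node_id] = set(idx[labels == lab])
            graph.add_node(
                node_id,
                size=len(members[node_id]),
                avg_prediction=float(predictions[list(members[node_id])].mean()),
            )
            node_id += 1

    # Connect clusters from overlapping intervals that share observations.
    for a in members:
        for b in members:
            if a < b and members[a] & members[b]:
                graph.add_edge(a, b)
    return graph
```

The resulting graph serves as a compact signature of an explanation method: building one skeleton per explainer (or per hyperparameter setting) and comparing them, for instance via simple graph statistics or a graph distance, mirrors the comparisons described in the abstract.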
Related papers
- Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
- Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance [72.50214227616728]
Interpretability methods are valuable only if their explanations faithfully describe the explained model.
We consider neural networks whose predictions are invariant under a specific symmetry group.
arXiv Detail & Related papers (2023-04-13T17:59:03Z)
- Understanding Post-hoc Explainers: The Case of Anchors [6.681943980068051]
We present a theoretical analysis of a rule-based interpretability method that highlights a small set of words to explain a text classifier's decision.
After formalizing its algorithm and providing useful insights, we demonstrate mathematically that Anchors produces meaningful results.
arXiv Detail & Related papers (2023-03-15T17:56:34Z)
- The Shape of Explanations: A Topological Account of Rule-Based Explanations in Machine Learning [0.0]
We introduce a framework for rule-based explanation methods and provide a characterization of explainability.
We argue that the preferred scheme depends on how much the user knows about the domain and the probability measure over the feature space.
arXiv Detail & Related papers (2023-01-22T02:58:00Z)
- Object Representations as Fixed Points: Training Iterative Refinement Algorithms with Implicit Differentiation [88.14365009076907]
Iterative refinement is a useful paradigm for representation learning.
We develop an implicit differentiation approach that improves the stability and tractability of training.
arXiv Detail & Related papers (2022-07-02T10:00:35Z)
- Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post hoc Explanations [16.678003262147346]
We show that popular explanation methods are instances of the local function approximation (LFA) framework.
We set forth a guiding principle based on the function approximation perspective, considering a method to be effective if it recovers the underlying model.
We empirically validate our theoretical results using various real world datasets, model classes, and prediction tasks.
arXiv Detail & Related papers (2022-06-02T19:09:30Z)
- Don't Explain Noise: Robust Counterfactuals for Randomized Ensembles [50.81061839052459]
We formalize the generation of robust counterfactual explanations as a probabilistic problem.
We show the link between the robustness of ensemble models and the robustness of base learners.
Our method achieves high robustness with only a small increase in the distance from counterfactual explanations to their initial observations.
arXiv Detail & Related papers (2022-05-27T17:28:54Z)
- Locally Invariant Explanations: Towards Stable and Unidirectional Explanations through Local Invariant Learning [15.886405745163234]
We propose a model agnostic local explanation method inspired by the invariant risk minimization principle.
Our algorithm is simple and efficient to train, and can ascertain stable input features for local decisions of a black-box without access to side information.
arXiv Detail & Related papers (2022-01-28T14:29:25Z)
- Explaining by Removing: A Unified Framework for Model Explanation [14.50261153230204]
Removal-based explanations are based on the principle of simulating feature removal to quantify each feature's influence.
We develop a framework that characterizes each method along three dimensions: 1) how the method removes features, 2) what model behavior the method explains, and 3) how the method summarizes each feature's influence.
This newly understood class of explanation methods has rich connections that we examine using tools that have been largely overlooked by the explainability literature.
arXiv Detail & Related papers (2020-11-21T00:47:48Z)
- Towards Interpretable Natural Language Understanding with Explanations as Latent Variables [146.83882632854485]
We develop a framework for interpretable natural language understanding that requires only a small set of human annotated explanations for training.
Our framework treats natural language explanations as latent variables that model the underlying reasoning process of a neural model.
arXiv Detail & Related papers (2020-10-24T02:05:56Z)
- Learning explanations that are hard to vary [75.30552491694066]
We show that averaging across examples can favor memorization and 'patchwork' solutions that sew together different strategies.
We then propose and experimentally validate a simple alternative algorithm based on a logical AND.
arXiv Detail & Related papers (2020-09-01T10:17:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.