Topological Representations of Local Explanations
- URL: http://arxiv.org/abs/2201.02155v1
- Date: Thu, 6 Jan 2022 17:46:45 GMT
- Title: Topological Representations of Local Explanations
- Authors: Peter Xenopoulos, Gromit Chan, Harish Doraiswamy, Luis Gustavo Nonato,
Brian Barr, Claudio Silva
- Abstract summary: We propose a topology-based framework to extract a simplified representation from a set of local explanations.
We demonstrate that our framework can not only reliably identify differences between explainability techniques but also provide stable representations.
- Score: 8.559625821116454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Local explainability methods -- those which seek to generate an explanation
for each prediction -- are becoming increasingly prevalent due to the need for
practitioners to rationalize their model outputs. However, comparing local
explainability methods is difficult since they each generate outputs in various
scales and dimensions. Furthermore, due to the stochastic nature of some
explainability methods, it is possible for different runs of a method to
produce contradictory explanations for a given observation. In this paper, we
propose a topology-based framework to extract a simplified representation from
a set of local explanations. We do so by first modeling the relationship
between the explanation space and the model predictions as a scalar function.
Then, we compute the topological skeleton of this function. This topological
skeleton acts as a signature for such functions, which we use to compare
different explanation methods. We demonstrate that our framework can not only
reliably identify differences between explainability techniques but also
provide stable representations. Then, we show how our framework can be used to
identify appropriate parameters for local explainability methods. Our framework
is simple, does not require complex optimizations, and can be broadly applied
to most local explanation methods. We believe the practicality and versatility
of our approach will help promote topology-based approaches as a tool for
understanding and comparing explanation methods.
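To make the pipeline concrete, the sketch below builds a Mapper-style topological skeleton from a matrix of local explanations (e.g., SHAP or LIME attribution vectors), using the model's predictions as the scalar lens function. This is a minimal illustration under assumed inputs (`explanations`, `predictions`) and stand-in choices (DBSCAN clustering, fixed cover parameters), not the authors' exact implementation.

```python
# Minimal sketch (assumed inputs, not the authors' exact implementation):
# build a Mapper-style topological skeleton of the explanation space,
# using the model's predictions as the scalar (lens) function.
import numpy as np
import networkx as nx
from sklearn.cluster import DBSCAN


def topological_skeleton(explanations, predictions, n_intervals=10, overlap=0.3):
    """explanations: (n_samples, n_features) local attribution vectors.
    predictions: (n_samples,) model outputs used as the scalar function."""
    lo, hi = predictions.min(), predictions.max()
    step = (hi - lo) / n_intervals          # spacing between interval starts
    width = step * (1.0 + overlap)          # consecutive intervals overlap

    graph, members = nx.Graph(), {}
    node_id = 0
    for i in range(n_intervals):
        start = lo + i * step
        idx = np.where((predictions >= start) & (predictions <= start + width))[0]
        if len(idx) == 0:
            continue
        # Cluster the explanation vectors that fall inside this interval.
        labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(explanations[idx])
        for lab in set(labels) - {-1}:      # skip DBSCAN noise points
            members[node_id] = set(idx[labels == lab])
            graph.add_node(
                node_id,
                size=len(members[node_id]),
                avg_prediction=float(predictions[list(members[node_id])].mean()),
            )
            node_id += 1

    # Connect clusters from overlapping intervals that share observations.
    for a in members:
        for b in members:
            if a < b and members[a] & members[b]:
                graph.add_edge(a, b)
    return graph
```

The resulting graph serves as a compact signature of an explanation method: building one skeleton per explainer (or per hyperparameter setting) and comparing them, for instance via simple graph statistics or a graph distance, mirrors the comparisons described in the abstract.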
Related papers
- Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
- Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance [72.50214227616728]
Interpretability methods are valuable only if their explanations faithfully describe the explained model.
We consider neural networks whose predictions are invariant under a specific symmetry group.
arXiv Detail & Related papers (2023-04-13T17:59:03Z)
- Understanding Post-hoc Explainers: The Case of Anchors [6.681943980068051]
We present a theoretical analysis of a rule-based interpretability method that highlights a small set of words to explain a text classifier's decision.
After formalizing its algorithm and providing useful insights, we demonstrate mathematically that Anchors produces meaningful results.
arXiv Detail & Related papers (2023-03-15T17:56:34Z)
- The Shape of Explanations: A Topological Account of Rule-Based Explanations in Machine Learning [0.0]
We introduce a framework for rule-based explanation methods and provide a characterization of explainability.
We argue that the preferred scheme depends on how much the user knows about the domain and the probability measure over the feature space.
arXiv Detail & Related papers (2023-01-22T02:58:00Z)
- Object Representations as Fixed Points: Training Iterative Refinement Algorithms with Implicit Differentiation [88.14365009076907]
Iterative refinement is a useful paradigm for representation learning.
We develop an implicit differentiation approach that improves the stability and tractability of training.
arXiv Detail & Related papers (2022-07-02T10:00:35Z)
- Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post hoc Explanations [16.678003262147346]
We show that popular explanation methods are instances of the local function approximation (LFA) framework.
We set forth a guiding principle based on the function approximation perspective, considering a method to be effective if it recovers the underlying model.
We empirically validate our theoretical results using various real world datasets, model classes, and prediction tasks.
arXiv Detail & Related papers (2022-06-02T19:09:30Z)
- Don't Explain Noise: Robust Counterfactuals for Randomized Ensembles [50.81061839052459]
We formalize the generation of robust counterfactual explanations as a probabilistic problem.
We show the link between the robustness of ensemble models and the robustness of base learners.
Our method achieves high robustness with only a small increase in the distance from counterfactual explanations to their initial observations.
arXiv Detail & Related papers (2022-05-27T17:28:54Z)
- Locally Invariant Explanations: Towards Stable and Unidirectional Explanations through Local Invariant Learning [15.886405745163234]
We propose a model agnostic local explanation method inspired by the invariant risk minimization principle.
Our algorithm is simple and efficient to train, and can ascertain stable input features for local decisions of a black-box without access to side information.
arXiv Detail & Related papers (2022-01-28T14:29:25Z)
- Explaining by Removing: A Unified Framework for Model Explanation [14.50261153230204]
Removal-based explanations are based on the principle of simulating feature removal to quantify each feature's influence.
We develop a framework that characterizes each method along three dimensions: 1) how the method removes features, 2) what model behavior the method explains, and 3) how the method summarizes each feature's influence.
This newly understood class of explanation methods has rich connections that we examine using tools that have been largely overlooked by the explainability literature.
arXiv Detail & Related papers (2020-11-21T00:47:48Z)
- Towards Interpretable Natural Language Understanding with Explanations as Latent Variables [146.83882632854485]
We develop a framework for interpretable natural language understanding that requires only a small set of human annotated explanations for training.
Our framework treats natural language explanations as latent variables that model the underlying reasoning process of a neural model.
arXiv Detail & Related papers (2020-10-24T02:05:56Z)
- Learning explanations that are hard to vary [75.30552491694066]
We show that averaging across examples can favor memorization and 'patchwork' solutions that sew together different strategies.
We then propose and experimentally validate a simple alternative algorithm based on a logical AND.
arXiv Detail & Related papers (2020-09-01T10:17:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.