Understanding and Unifying Fourteen Attribution Methods with Taylor
Interactions
- URL: http://arxiv.org/abs/2303.01506v2
- Date: Mon, 6 Mar 2023 03:41:17 GMT
- Title: Understanding and Unifying Fourteen Attribution Methods with Taylor
Interactions
- Authors: Huiqi Deng, Na Zou, Mengnan Du, Weifu Chen, Guocan Feng, Ziwei Yang,
Zheyang Li, and Quanshi Zhang
- Abstract summary: Various attribution methods have been developed to explain deep neural networks (DNNs) by inferring the attribution/importance/contribution score of each input variable to the final output.
There remains a lack of a unified theoretical understanding of why these methods are effective and how they are related.
We prove that the attribution scores estimated by fourteen attribution methods can all be reformulated as a weighted sum of two types of effects.
- Score: 34.94946455284657
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Various attribution methods have been developed to explain deep neural
networks (DNNs) by inferring the attribution/importance/contribution score of
each input variable to the final output. However, existing attribution methods
are often built upon different heuristics. There remains a lack of a unified
theoretical understanding of why these methods are effective and how they are
related. To this end, for the first time, we formulate the core mechanisms of
fourteen attribution methods, which were designed on the basis of different
heuristics, within the same mathematical system, i.e., the system of Taylor
interactions. Specifically, we prove that the attribution scores estimated by
the fourteen attribution methods can all be reformulated as a weighted sum of
two types of effects, i.e., independent effects of each individual input
variable and interaction effects between input variables. The essential
difference among the fourteen attribution methods lies mainly in the weights
used to allocate these effects. Based on these findings, we propose three
principles for a fair allocation of effects, which we use to evaluate the
faithfulness of the fourteen attribution methods.
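As an informal illustration of this weighted-sum view (a minimal sketch using our own toy model, baseline, and variable names, not the paper's formulation), consider a two-variable model whose Taylor expansion around a zero baseline has two single-variable terms and one cross term. The sketch below computes exact Shapley values by enumerating coalitions and checks that each attribution equals that variable's independent effect plus half of the interaction effect, i.e., one particular choice of weights over the two effect types.

```python
import itertools
import math

# Hypothetical toy model with an explicit Taylor decomposition around x = (0, 0):
#   f(x1, x2) = a*x1 + b*x2 + c*x1*x2
# Independent effects: a*x1 and b*x2; interaction effect: c*x1*x2.
a, b, c = 2.0, -1.0, 3.0

def f(x1, x2):
    return a * x1 + b * x2 + c * x1 * x2

def shapley_values(x, baseline=(0.0, 0.0)):
    """Exact Shapley values, computed by enumerating all coalitions."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            for subset in itertools.combinations(others, r):
                # Shapley weight of a coalition of size r.
                w = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += w * (f(*with_i) - f(*without_i))
    return phi

x = (1.5, 0.5)
phi1, phi2 = shapley_values(x)
indep1, indep2, inter = a * x[0], b * x[1], c * x[0] * x[1]

# On this toy model, each Shapley attribution equals the variable's
# independent effect plus 1/2 of the interaction effect.
print(phi1, indep1 + 0.5 * inter)  # 4.125 4.125
print(phi2, indep2 + 0.5 * inter)  # 0.625 0.625
```

Other methods covered by the paper would correspond to different weights on the same two kinds of effects; the 1/2 split here is specific to Shapley values (and, on this particular toy model, it coincides with Integrated Gradients along a straight-line path from the zero baseline).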
Related papers
- Unifying Causal Representation Learning with the Invariance Principle [21.375611599649716]
Causal representation learning aims at recovering latent causal variables from high-dimensional observations.
Our main contribution is to show that many existing causal representation learning approaches methodologically align the representation to known data symmetries.
arXiv Detail & Related papers (2024-09-04T14:51:36Z)
- A Causal Framework for Decomposing Spurious Variations [68.12191782657437]
We develop tools for decomposing spurious variations in Markovian and Semi-Markovian models.
We prove the first results that allow a non-parametric decomposition of spurious effects.
The described approach has several applications, ranging from explainable and fair AI to questions in epidemiology and medicine.
arXiv Detail & Related papers (2023-06-08T09:40:28Z)
- The Weighted Möbius Score: A Unified Framework for Feature Attribution [17.358276581599643]
Feature attribution aims to explain the reasoning behind a black-box model's prediction by identifying the impact of each feature on the prediction.
The lack of a unified framework has led to a proliferation of methods that are often not directly comparable.
This paper introduces a parameterized attribution framework, the Weighted Möbius Score, and shows that many different attribution methods for both individual features and feature interactions are special cases of it (the standard Möbius transform underlying such frameworks is sketched after this entry).
arXiv Detail & Related papers (2023-05-16T06:27:27Z)
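For context only: the Möbius transform named in the title above is, in its standard form, the inversion of a set function into its Harsanyi dividends, and unified frameworks of this kind express attributions as weighted sums of those dividends. The sketch below uses generic notation and does not reproduce the paper's specific parameterization.

```latex
% Standard Moebius transform (Harsanyi dividends) of a set function v:
% m_v(S) is the part of v(S) not already explained by proper subsets of S.
\[
  m_v(S) \;=\; \sum_{T \subseteq S} (-1)^{|S| - |T|}\, v(T),
  \qquad
  v(S) \;=\; \sum_{T \subseteq S} m_v(T).
\]
% A generic weighted attribution for feature i then takes the form
\[
  \phi_i \;=\; \sum_{S \,\ni\, i} w_{i,S}\, m_v(S),
\]
% where the choice of weights w_{i,S} distinguishes different attribution
% methods (the paper's exact weighting scheme is not reproduced here).
```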
- Disentangled Representation for Causal Mediation Analysis [25.114619307838602]
Causal mediation analysis is a method that is often used to reveal direct and indirect effects.
Deep learning shows promise in mediation analysis, but current methods assume only latent confounders that affect the treatment, mediator, and outcome simultaneously.
We propose the Disentangled Mediation Analysis Variational AutoEncoder (DMAVAE), which disentangles the representations of latent confounders into three types to accurately estimate the natural direct effect, natural indirect effect and total effect.
arXiv Detail & Related papers (2023-02-19T23:37:17Z)
- Breaking Down Out-of-Distribution Detection: Many Methods Based on OOD Training Data Estimate a Combination of the Same Core Quantities [104.02531442035483]
The goal of this paper is to recognize common objectives as well as to identify the implicit scoring functions of different OOD detection methods.
We show that binary discrimination between in- and (different) out-distributions is equivalent to several distinct formulations of the OOD detection problem.
We also show that the confidence loss which is used by Outlier Exposure has an implicit scoring function which differs in a non-trivial fashion from the theoretically optimal scoring function.
arXiv Detail & Related papers (2022-06-20T16:32:49Z)
- A General Taylor Framework for Unifying and Revisiting Attribution Methods [36.34893316038053]
We propose a Taylor attribution framework, which models the attribution problem as deciding individual payoffs in a coalition (a generic expansion of this kind is sketched after this entry).
We establish three principles for a good attribution in the Taylor attribution framework.
arXiv Detail & Related papers (2021-05-28T13:57:16Z)
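As a rough sketch of what a Taylor-based unification looks like (our own generic multi-index notation, not the exact formulation of this related paper or of the main paper above), a model output can be expanded around a baseline so that every term involves either a single variable or several variables:

```latex
% Generic multivariate Taylor expansion of f around a baseline \tilde{x},
% using multi-index notation (illustration only):
\[
  f(x) - f(\tilde{x})
  \;=\; \sum_{|\kappa| \ge 1} \frac{1}{\kappa!}\,
        \partial^{\kappa} f(\tilde{x})\, (x - \tilde{x})^{\kappa} .
\]
% Terms whose multi-index \kappa involves a single variable are that
% variable's "independent effects"; terms involving two or more variables
% are "interaction effects". A Taylor attribution framework then allocates
% each term to individual variables with method-specific weights, which is
% the coalition-payoff view referenced in the summary above.
```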
- Learning Causal Semantic Representation for Out-of-Distribution Prediction [125.38836464226092]
We propose a Causal Semantic Generative model (CSG) based on causal reasoning, so that the two factors are modeled separately.
We show that CSG can identify the semantic factor by fitting training data, and this semantic-identification guarantees the boundedness of OOD generalization error.
arXiv Detail & Related papers (2020-11-03T13:16:05Z)
- A Unified Taylor Framework for Revisiting Attribution Methods [49.03783992773811]
We propose a Taylor attribution framework and reformulate seven mainstream attribution methods within this framework.
We establish three principles for a good attribution in the Taylor attribution framework.
arXiv Detail & Related papers (2020-08-21T22:07:06Z)
- Self-Attention Attribution: Interpreting Information Interactions Inside Transformer [89.21584915290319]
We propose a self-attention attribution method to interpret the information interactions inside the Transformer.
We show that the attribution results can be used as adversarial patterns to implement non-targeted attacks against BERT.
arXiv Detail & Related papers (2020-04-23T14:58:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.