A Unified Taylor Framework for Revisiting Attribution Methods
- URL: http://arxiv.org/abs/2008.09695v3
- Date: Tue, 13 Apr 2021 09:00:51 GMT
- Title: A Unified Taylor Framework for Revisiting Attribution Methods
- Authors: Huiqi Deng, Na Zou, Mengnan Du, Weifu Chen, Guocan Feng, and Xia Hu
- Abstract summary: We propose a Taylor attribution framework and reformulate seven mainstream attribution methods into the framework.
We establish three principles for a good attribution in the Taylor attribution framework.
- Score: 49.03783992773811
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attribution methods have been developed to understand the decision-making
process of machine learning models, especially deep neural networks, by
assigning importance scores to individual features. Existing attribution
methods are often built upon empirical intuitions and heuristics. There is still
no general theoretical framework that can not only unify these attribution
methods, but also theoretically reveal their rationales, fidelity, and
limitations. To bridge the gap, in this paper, we propose a Taylor attribution
framework and reformulate seven mainstream attribution methods into the
framework. Based on reformulations, we analyze the attribution methods in terms
of rationale, fidelity, and limitation. Moreover, we establish three principles
for a good attribution in the Taylor attribution framework, i.e., low
approximation error, correct contribution assignment, and unbiased baseline
selection. Finally, we empirically validate the Taylor reformulations and
reveal a positive correlation between the attribution performance and the
number of principles followed by the attribution method via benchmarking on
real-world datasets.
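The core idea of the framework, expanding a model's output around a baseline and distributing the Taylor terms as feature importance scores, can be sketched concretely. The snippet below is a minimal illustration of first-order Taylor attribution for a hypothetical logistic model; the model, weights, and baseline choice are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical model: f(x) = sigmoid(w . x + b). These weights are
# illustrative only, not from the paper.
w = np.array([0.8, -1.2, 0.5])
b = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def f(x):
    return sigmoid(w @ x + b)

def grad_f(x):
    # Analytic gradient of sigmoid(w . x + b) with respect to x.
    s = f(x)
    return s * (1.0 - s) * w

def taylor_attribution(x, baseline):
    """First-order Taylor attribution:
    f(x) - f(baseline) ~= sum_i g_i * (x_i - baseline_i),
    where g is the gradient at the baseline. Each summand is the
    importance score assigned to feature i."""
    g = grad_f(baseline)
    return g * (x - baseline)

x = np.array([1.0, 0.5, -0.3])
x0 = np.zeros(3)  # baseline; its choice matters (the paper's third principle)
scores = taylor_attribution(x, x0)
print("scores:", scores)
print("sum of scores:", scores.sum())
print("true output change:", f(x) - f(x0))
```

The gap between the sum of scores and the true output change is the first-order approximation error, which is exactly the quantity the paper's first principle (low approximation error) asks an attribution method to keep small.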
Related papers
- Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
We study how well large language models (LLMs) explain their generations through rationales.
We show that prompting-based methods are less "faithful" than attribution-based explanations.
arXiv Detail & Related papers (2024-06-28T20:06:30Z)
- Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods [49.62131719441252]
Attribution methods compute importance scores for input features to explain the output predictions of deep models.
In this work, we first identify a set of fidelity criteria that reliable benchmarks for attribution methods are expected to fulfill.
We then introduce a Backdoor-based eXplainable AI benchmark (BackX) that adheres to the desired fidelity criteria.
arXiv Detail & Related papers (2024-05-02T13:48:37Z)
- The Weighted Möbius Score: A Unified Framework for Feature Attribution [17.358276581599643]
Feature attribution aims to explain the reasoning behind a black-box model's prediction by identifying the impact of each feature on the prediction.
The lack of a unified framework has led to a proliferation of methods that are often not directly comparable.
This paper introduces a parameterized attribution framework -- the Weighted Möbius Score -- and shows that many different attribution methods, for both individual features and feature interactions, are special cases.
arXiv Detail & Related papers (2023-05-16T06:27:27Z)
- The Open-World Lottery Ticket Hypothesis for OOD Intent Classification [68.93357975024773]
We shed light on the fundamental cause of model overconfidence on OOD.
We also extend the Lottery Ticket Hypothesis to open-world scenarios.
arXiv Detail & Related papers (2022-10-13T14:58:35Z)
- A General Taylor Framework for Unifying and Revisiting Attribution Methods [36.34893316038053]
We propose a Taylor attribution framework, which models the attribution problem as deciding individual payoffs in a coalition.
We establish three principles for a good attribution in the Taylor attribution framework.
arXiv Detail & Related papers (2021-05-28T13:57:16Z)
- Do Feature Attribution Methods Correctly Attribute Features? [5.58592454173439]
Feature attribution methods are exceedingly popular in interpretable machine learning.
There is no consensus on the definition of "attribution".
We evaluate three methods: saliency maps, rationales, and attention.
arXiv Detail & Related papers (2021-04-27T20:35:30Z)
- Learning Causal Semantic Representation for Out-of-Distribution Prediction [125.38836464226092]
We propose a Causal Semantic Generative model (CSG) based on causal reasoning, so that the two factors are modeled separately.
We show that CSG can identify the semantic factor by fitting training data, and this semantic-identification guarantees the boundedness of OOD generalization error.
arXiv Detail & Related papers (2020-11-03T13:16:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.