A General Taylor Framework for Unifying and Revisiting Attribution
Methods
- URL: http://arxiv.org/abs/2105.13841v1
- Date: Fri, 28 May 2021 13:57:16 GMT
- Title: A General Taylor Framework for Unifying and Revisiting Attribution
Methods
- Authors: Huiqi Deng, Na Zou, Mengnan Du, Weifu Chen, Guocan Feng, Xia Hu
- Abstract summary: We propose a Taylor attribution framework, which models the attribution problem as deciding individual payoffs in a coalition.
We establish three principles for a good attribution in the Taylor attribution framework.
- Score: 36.34893316038053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attribution methods provide an insight into the decision-making process of
machine learning models, especially deep neural networks, by assigning
contribution scores to each individual feature. However, the attribution
problem has not been well defined, and there is no unified guideline for the
contribution assignment process. Furthermore, existing attribution methods are
often built upon various empirical intuitions and heuristics. There is still no
general theoretical framework that both offers a good description of the
attribution problem and can be applied to unify and revisit existing
attribution methods. To bridge this gap, in this paper, we propose a
Taylor attribution framework, which models the attribution problem as deciding
individual payoffs in a coalition. Then, we reformulate fourteen
mainstream attribution methods into the Taylor framework and analyze these
attribution methods in terms of rationale, fidelity, and limitation in the
framework. Moreover, we establish three principles for a good attribution in
the Taylor attribution framework, i.e., low approximation error, correct Taylor
contribution assignment, and unbiased baseline selection. Finally, we
empirically validate the Taylor reformulations and reveal a positive
correlation between the attribution performance and the number of principles
followed by the attribution method via benchmarking on real-world datasets.
Related papers
- On the Evaluation Consistency of Attribution-based Explanations [42.1421504321572]
We introduce Meta-Rank, an open platform for benchmarking attribution methods in the image domain.
Our benchmark reveals three insights in attribution evaluation endeavors: 1) evaluating attribution methods under disparate settings can yield divergent performance rankings; 2) although inconsistent across numerous cases, the performance rankings exhibit remarkable consistency across distinct checkpoints along the same training trajectory; and 3) prior attempts at consistent evaluation fare no better than baselines when extended to more heterogeneous models and datasets.
arXiv Detail & Related papers (2024-07-28T11:49:06Z) - Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
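The baseline-corrected, unbiased off-policy estimation described above can be sketched as inverse-propensity scoring with a control-variate baseline: subtracting a constant baseline and adding it back leaves the estimate unbiased while reducing variance. The two-action setup below is a hypothetical illustration, not the paper's formulation or its variance-optimal solution.

```python
import numpy as np

def ips_with_baseline(rewards, logging_probs, target_probs, baseline=0.0):
    """IPS estimate of the target policy's reward with a baseline correction.

    Unbiasedness: E[w * (r - b)] + b = E_pi[r], since E[w] = 1.
    A well-chosen b lowers the variance of the weighted term.
    """
    w = target_probs / logging_probs          # importance weights
    return np.mean(w * (rewards - baseline)) + baseline

rng = np.random.default_rng(0)
n = 100_000
# Two actions; uniform logging policy, target policy prefers action 1.
actions = rng.integers(0, 2, n)
logging_p = np.full(n, 0.5)
target_p = np.where(actions == 1, 0.8, 0.2)
rewards = (actions == 1).astype(float) + rng.normal(0, 0.1, n)

# True target value is 0.8 * 1 + 0.2 * 0 = 0.8.
est = ips_with_baseline(rewards, logging_p, target_p,
                        baseline=rewards.mean())
print(round(est, 3))
```

Here the empirical mean reward serves as a crude baseline; the paper's contribution is characterizing the baseline that minimizes variance in closed form.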
arXiv Detail & Related papers (2024-05-09T12:52:22Z) - Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods [49.62131719441252]
Attribution methods compute importance scores for input features to explain the output predictions of deep models.
In this work, we first identify a set of fidelity criteria that reliable benchmarks for attribution methods are expected to fulfill.
We then introduce a Backdoor-based eXplainable AI benchmark (BackX) that adheres to the desired fidelity criteria.
arXiv Detail & Related papers (2024-05-02T13:48:37Z) - Understanding and Unifying Fourteen Attribution Methods with Taylor
Interactions [34.94946455284657]
Various attribution methods have been developed to explain deep neural networks (DNNs) by inferring the attribution/importance/contribution score of each input variable to the final output.
There remains a lack of a unified theoretical understanding of why these methods are effective and how they are related.
We prove that attribution scores estimated by fourteen attribution methods can all be reformulated as the weighted sum of two types of effects.
arXiv Detail & Related papers (2023-03-02T04:50:05Z) - Contrastive Learning for Fair Representations [50.95604482330149]
Trained classification models can unintentionally lead to biased representations and predictions.
Existing debiasing methods for classification models, such as adversarial training, are often expensive to train and difficult to optimise.
We propose a method for mitigating bias by incorporating contrastive learning, in which instances sharing the same class label are encouraged to have similar representations.
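The idea of encouraging same-label instances toward similar representations can be sketched with a minimal supervised contrastive objective. This is a generic illustration of that objective, not the paper's exact loss or training setup.

```python
import numpy as np

def supervised_contrastive_loss(z, labels, temperature=0.5):
    """Minimal supervised contrastive loss: for each anchor, same-label
    embeddings are pulled together relative to all other embeddings.
    Illustrative sketch only.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize
    sim = z @ z.T / temperature                        # pairwise similarities
    n = len(labels)
    total, count = 0.0, 0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue                                   # anchor has no positives
        others = [j for j in range(n) if j != i]
        log_denom = np.log(np.exp(sim[i][others]).sum())
        total += -np.mean(sim[i][pos] - log_denom)     # mean log-softmax over positives
        count += 1
    return total / count

labels = np.array([0, 0, 1, 1])
# Embeddings clustered by class score a lower loss than mixed ones.
z_aligned = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
z_mixed = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
print(supervised_contrastive_loss(z_aligned, labels) <
      supervised_contrastive_loss(z_mixed, labels))
```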
arXiv Detail & Related papers (2021-09-22T10:47:51Z) - A Unified Taylor Framework for Revisiting Attribution Methods [49.03783992773811]
We propose a Taylor attribution framework and reformulate seven mainstream attribution methods into the framework.
We establish three principles for a good attribution in the Taylor attribution framework.
arXiv Detail & Related papers (2020-08-21T22:07:06Z) - Beyond $\mathcal{H}$-Divergence: Domain Adaptation Theory With
Jensen-Shannon Divergence [21.295136514836788]
We reveal the incoherence between the widely-adopted empirical domain adversarial training and its generally-assumed theoretical counterpart based on $\mathcal{H}$-divergence.
We establish a new theoretical framework by directly proving the upper and lower target risk bounds based on joint distributional Jensen-Shannon divergence.
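For reference, the Jensen-Shannon divergence between two discrete distributions is the symmetrized, bounded relative of KL divergence. The sketch below shows the standard marginal form; the paper's bounds are stated on joint distributions, which this simple example does not capture.

```python
import numpy as np

def js_divergence(p, q):
    """JS(p, q) = 0.5*KL(p || m) + 0.5*KL(q || m), with m = (p + q)/2.

    Symmetric and bounded in [0, log 2], unlike KL divergence.
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0                       # 0 * log(0) treated as 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

print(js_divergence([1, 0], [0, 1]))           # log 2 for disjoint supports
print(js_divergence([0.5, 0.5], [0.5, 0.5]))   # 0 for identical distributions
```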
arXiv Detail & Related papers (2020-07-30T16:19:59Z) - GenDICE: Generalized Offline Estimation of Stationary Values [108.17309783125398]
We show that effective estimation can still be achieved in important applications.
Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions.
The resulting algorithm, GenDICE, is straightforward and effective.
arXiv Detail & Related papers (2020-02-21T00:27:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.