The Weighted Möbius Score: A Unified Framework for Feature Attribution
- URL: http://arxiv.org/abs/2305.09204v1
- Date: Tue, 16 May 2023 06:27:27 GMT
- Title: The Weighted Möbius Score: A Unified Framework for Feature Attribution
- Authors: Yifan Jiang, Shane Steinert-Threlkeld
- Abstract summary: Feature attribution aims to explain the reasoning behind a black-box model's prediction by identifying the impact of each feature on the prediction.
The lack of a unified framework has led to a proliferation of methods that are often not directly comparable.
This paper introduces a parameterized attribution framework -- the Weighted Möbius Score -- and shows that many different attribution methods for both individual features and feature interactions are special cases.
- Score: 17.358276581599643
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature attribution aims to explain the reasoning behind a black-box model's
prediction by identifying the impact of each feature on the prediction. Recent
work has extended feature attribution to interactions between multiple
features. However, the lack of a unified framework has led to a proliferation
of methods that are often not directly comparable. This paper introduces a
parameterized attribution framework -- the Weighted Möbius Score -- and (i)
shows that many different attribution methods for both individual features and
feature interactions are special cases and (ii) identifies some new methods. By
studying the vector space of attribution methods, our framework utilizes
standard linear algebra tools and provides interpretations in various fields,
including cooperative game theory and causal mediation analysis. We empirically
demonstrate the framework's versatility and effectiveness by applying these
attribution methods to feature interactions in sentiment analysis and
chain-of-thought prompting.
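As a rough illustration of the idea in the abstract, and not the paper's own code: the Möbius transform of a set function v assigns each feature subset S the coefficient m(S) = sum over T ⊆ S of (-1)^(|S|-|T|) v(T). One well-known attribution that can be written as a weighted sum of these coefficients is the Shapley value, phi_i = sum over S containing i of m(S)/|S|. The sketch below checks that identity on a toy value function. The names `mobius_transform`, `shapley_from_mobius`, `shapley_direct`, and `toy_value` are hypothetical, and the paper's actual weighting scheme may be parameterized differently.

```python
from itertools import combinations
from math import factorial


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = tuple(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def mobius_transform(v, players):
    """Moebius coefficients m(S) = sum_{T subset of S} (-1)^(|S|-|T|) v(T)."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
        for S in subsets(players)
    }


def shapley_from_mobius(m, players):
    """Shapley value as a weighted Moebius sum: each m(S) is split evenly over S."""
    return {
        i: sum(val / len(S) for S, val in m.items() if i in S)
        for i in players
    }


def shapley_direct(v, players):
    """Textbook Shapley value via weighted marginal contributions."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for S in subsets(others):
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += weight * (v(S | {i}) - v(S))
        phi[i] = total
    return phi


def toy_value(S):
    """Hypothetical model output: additive effects plus one a-b interaction."""
    score = 0.0
    if "a" in S:
        score += 1.0
    if "b" in S:
        score += 2.0
    if "c" in S:
        score += 0.5
    if "a" in S and "b" in S:
        score += 3.0  # pairwise interaction term
    return score


players = ["a", "b", "c"]
coeffs = mobius_transform(toy_value, players)
print(shapley_from_mobius(coeffs, players))
print(shapley_direct(toy_value, players))
```

In this toy setting the interaction coefficient for {a, b} is shared equally between the two features. Other ways of distributing the coefficients would give other attribution methods, which is the sense in which a weighted Möbius sum can cover several methods at once.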
Related papers
- Top-K Pairwise Ranking: Bridging the Gap Among Ranking-Based Measures for Multi-Label Classification [120.37051160567277]
This paper proposes a novel measure named Top-K Pairwise Ranking (TKPR).
A series of analyses show that TKPR is compatible with existing ranking-based measures.
On the other hand, we establish a sharp generalization bound for the proposed framework based on a novel technique named data-dependent contraction.
arXiv Detail & Related papers (2024-07-09T09:36:37Z) - Out-of-Distribution Detection via Deep Multi-Comprehension Ensemble [11.542472900306745]
Multi-Comprehension (MC) Ensemble is proposed as a strategy to augment the Out-of-Distribution (OOD) feature representation field.
Our experimental results demonstrate the superior performance of the MC Ensemble strategy in OOD detection.
This underscores the effectiveness of our proposed approach in enhancing the model's capability to detect instances outside its training distribution.
arXiv Detail & Related papers (2024-03-24T18:43:04Z) - On the Joint Interaction of Models, Data, and Features [82.60073661644435]
We introduce a new tool, the interaction tensor, for empirically analyzing the interaction between data and model through features.
Based on these observations, we propose a conceptual framework for feature learning.
Under this framework, the expected accuracy for a single hypothesis and agreement for a pair of hypotheses can both be derived in closed-form.
arXiv Detail & Related papers (2023-06-07T21:35:26Z) - Relational Local Explanations [11.679389861042]
We develop a novel model-agnostic and permutation-based feature attribution algorithm based on relational analysis between input variables.
We are able to gain a broader insight into machine learning model decisions and data.
arXiv Detail & Related papers (2022-12-23T14:46:23Z) - An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features, which follow either locally additive or instance-wise approaches.
This work exploits the strengths of both methods and proposes a global framework for learning local explanations simultaneously for multiple target classes.
arXiv Detail & Related papers (2022-07-07T06:50:27Z) - Concurrent Discrimination and Alignment for Self-Supervised Feature Learning [52.213140525321165]
Existing self-supervised learning methods learn by means of pretext tasks that are either (1) discriminating, explicitly specifying which features should be separated, or (2) aligning, precisely indicating which features should be brought close together.
In this work, we combine the positive aspects of the discriminating and aligning methods, and design a hybrid method that addresses the above issue.
Our method explicitly specifies the repulsion and attraction mechanisms, respectively, through a discriminative predictive task and by concurrently maximizing mutual information between paired views.
Our experiments on nine established benchmarks show that the proposed model consistently outperforms the existing state-of-the-art results of self-supervised and transfer
arXiv Detail & Related papers (2021-08-19T09:07:41Z) - Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z) - Towards Unifying Feature Attribution and Counterfactual Explanations: Different Means to the Same End [17.226134854746267]
We present a method to generate feature attribution explanations from a set of counterfactual examples.
We show how counterfactual examples can be used to evaluate the goodness of an attribution-based explanation in terms of its necessity and sufficiency.
arXiv Detail & Related papers (2020-11-10T05:41:43Z) - A Unified Taylor Framework for Revisiting Attribution Methods [49.03783992773811]
We propose a Taylor attribution framework and reformulate seven mainstream attribution methods into the framework.
We establish three principles for a good attribution in the Taylor attribution framework.
arXiv Detail & Related papers (2020-08-21T22:07:06Z) - How does this interaction affect me? Interpretable attribution for feature interactions [19.979889568380464]
We propose an interaction attribution and detection framework called Archipelago.
Our experiments on standard annotation labels indicate our approach provides significantly more interpretable explanations than comparable methods.
We also provide accompanying visualizations of our approach that give new insights into deep neural networks.
arXiv Detail & Related papers (2020-06-19T05:14:24Z)