Feature Attribution from First Principles
- URL: http://arxiv.org/abs/2505.24729v1
- Date: Fri, 30 May 2025 15:53:11 GMT
- Title: Feature Attribution from First Principles
- Authors: Magamed Taimeskhanov, Damien Garreau
- Abstract summary: We argue that the axioms imposed by existing axiomatic frameworks for feature attribution are often too restrictive. Rather than imposing axioms, we start by defining attributions for the simplest possible models. We derive closed-form expressions for the attributions of deep ReLU networks, and take a step toward the optimization of evaluation metrics.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Feature attribution methods are a popular approach to explaining the behavior of machine learning models. They assign importance scores to each input feature, quantifying its influence on the model's prediction. However, evaluating these methods empirically remains a significant challenge. To bypass this difficulty, several prior works have proposed axiomatic frameworks that any feature attribution method should satisfy. In this work, we argue that such axioms are often too restrictive, and propose in response a new feature attribution framework, built from the ground up. Rather than imposing axioms, we start by defining attributions for the simplest possible models, i.e., indicator functions, and use these as building blocks for more complex models. We then show that one recovers several existing attribution methods, depending on the choice of atomic attribution. Subsequently, we derive closed-form expressions for the attributions of deep ReLU networks, and take a step toward the optimization of evaluation metrics with respect to feature attributions.
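To make the construction concrete, here is a minimal, hypothetical sketch of the bottom-up recipe the abstract describes: an atomic attribution for indicator functions, extension by linearity, and the piecewise-linear structure of ReLU networks that makes closed-form attributions possible. The even-split rule and all function names are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def atomic_attribution(x, low, high, value):
    # Attribution for the atom x -> value * 1[low <= x <= high]: the
    # indicator's value is split evenly across coordinates when x lies
    # inside the box (an illustrative atomic rule, not the paper's).
    if np.all((x >= low) & (x <= high)):
        return np.full_like(x, value / x.size)
    return np.zeros_like(x)

def attribute_sum_of_indicators(x, boxes):
    # Extend the atomic rule by linearity to f = sum_k c_k * 1[R_k].
    return sum(atomic_attribution(x, lo, hi, c) for lo, hi, c in boxes)

def relu_local_attribution(x, W1, b1, w2, b2):
    # One-hidden-layer ReLU net f(x) = w2 . relu(W1 x + b1) + b2.
    # ReLU nets are piecewise linear: on the region containing x,
    # f(x) = w_x . x + const, so w_x * x decomposes the linear part
    # feature by feature -- one possible closed form.
    active = (W1 @ x + b1 > 0).astype(float)  # which hidden units fire at x
    w_x = (w2 * active) @ W1                  # local linear weights
    return w_x * x

# Sum of indicators: only the first box is active at x, giving [0.5, 0.5].
boxes = [(np.zeros(2), np.ones(2), 1.0), (np.full(2, 0.5), np.ones(2), 2.0)]
print(attribute_sum_of_indicators(np.array([0.2, 0.7]), boxes))

# ReLU net: attributions sum exactly to the linear part of the output.
rng = np.random.default_rng(0)
x = rng.standard_normal(3)
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
w2, b2 = rng.standard_normal(4), 0.5
phi = relu_local_attribution(x, W1, b1, w2, b2)
active = (W1 @ x + b1 > 0).astype(float)
f_x = w2 @ np.maximum(W1 @ x + b1, 0.0) + b2
print(np.isclose(phi.sum(), f_x - ((w2 * active) @ b1 + b2)))  # True
```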
Related papers
- Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors [61.92704516732144]
We show that the most robust features for correctness prediction are those that play a distinctive causal role in the model's behavior. We propose two methods that leverage causal mechanisms to predict the correctness of model outputs.
arXiv Detail & Related papers (2025-05-17T00:31:39Z)
- The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think [81.38614558541772]
We introduce the CoT Encyclopedia, a framework for analyzing and steering model reasoning. Our method automatically extracts diverse reasoning criteria from model-generated CoTs. We show that this framework produces more interpretable and comprehensive analyses than existing methods.
arXiv Detail & Related papers (2025-05-15T11:31:02Z)
- Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation [54.50526986788175]
Recent advances in efficient sequence modeling have led to attention-free layers, such as Mamba, RWKV, and various gated RNNs.
We present a unified view of these models, formulating such layers as implicit causal self-attention layers.
Our framework compares the underlying mechanisms on similar grounds for different layers and provides a direct means for applying explainability methods.
arXiv Detail & Related papers (2024-05-26T09:57:45Z)
- When factorization meets argumentation: towards argumentative explanations [0.0]
We propose a novel model that combines factorization-based methods with argumentation frameworks (AFs).
Our framework seamlessly incorporates side information, such as user contexts, leading to more accurate predictions.
arXiv Detail & Related papers (2024-05-13T19:16:28Z)
- Context-aware feature attribution through argumentation [0.0]
We define a novel feature attribution framework called Context-Aware Feature Attribution Through Argumentation (CA-FATA).
Our framework harnesses the power of argumentation by treating each feature as an argument that can either support, attack or neutralize a prediction.
arXiv Detail & Related papers (2023-10-24T20:02:02Z)
- On the Robustness of Removal-Based Feature Attributions [17.679374058425346]
We theoretically characterize the robustness properties of removal-based feature attributions.
Specifically, we provide a unified analysis of such methods and derive upper bounds for the difference between intact and perturbed attributions.
Experiments on synthetic and real-world data validate our theoretical results and demonstrate their practical implications.
arXiv Detail & Related papers (2023-06-12T23:33:13Z)
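To ground the removal-based setting above, the following is a minimal sketch, assuming occlusion with a zero baseline as the removal mechanism and a toy smooth model; it empirically checks how far the attributions move under a small input perturbation, the gap that such upper bounds control. Nothing here is the paper's code.

```python
import numpy as np

def occlusion_attribution(f, x, baseline=0.0):
    # Removal-based attribution: phi_i = f(x) - f(x with feature i
    # replaced by the baseline value).
    phi = np.zeros_like(x)
    for i in range(x.size):
        x_removed = x.copy()
        x_removed[i] = baseline
        phi[i] = f(x) - f(x_removed)
    return phi

f = lambda z: np.tanh(z @ np.array([1.0, -2.0, 0.5]))       # toy smooth model
x = np.array([0.3, 0.1, -0.4])
delta = 1e-3 * np.random.default_rng(0).standard_normal(3)  # small perturbation

gap = np.abs(occlusion_attribution(f, x) - occlusion_attribution(f, x + delta)).max()
print(gap)  # small for a smooth f, consistent with Lipschitz-style upper bounds
```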
- Impossibility Theorems for Feature Attribution [21.88229793890961]
We show that for moderately rich model classes, any feature attribution method can provably fail to improve on random guessing for inferring model behaviour.
Our results apply to common end-tasks such as characterizing local model behaviour, identifying spurious features, and providing algorithmic recourse.
arXiv Detail & Related papers (2022-12-22T17:03:57Z)
- MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework for Model-Agnostic Counterfactual Explanation (MACE).
Our approach combines an RL-based method for finding good counterfactual examples with a gradient-less descent method for improving proximity.
Experiments on public datasets validate its effectiveness, with better validity, sparsity, and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
- Towards Unifying Feature Attribution and Counterfactual Explanations: Different Means to the Same End [17.226134854746267]
We present a method to generate feature attribution explanations from a set of counterfactual examples.
We show how counterfactual examples can be used to evaluate the goodness of an attribution-based explanation in terms of its necessity and sufficiency.
arXiv Detail & Related papers (2020-11-10T05:41:43Z)
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z)
- A Unified Taylor Framework for Revisiting Attribution Methods [49.03783992773811]
We propose a Taylor attribution framework and reformulate seven mainstream attribution methods within it.
We establish three principles for a good attribution under this framework.
arXiv Detail & Related papers (2020-08-21T22:07:06Z)
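As one concrete instance of such a Taylor framework, the sketch below (hypothetical, not the paper's code) implements the first-order case: expanding f around a baseline x0 credits each feature with its term of the linear part, recovering the classic gradient-times-input-difference attribution; the baseline and finite-difference gradient are illustrative choices.

```python
import numpy as np

def taylor_attribution(f, x, x0, eps=1e-5):
    # First-order Taylor attribution: f(x) ~ f(x0) + grad f(x0) . (x - x0),
    # with each feature credited its share grad_i * (x_i - x0_i).
    grad = np.zeros_like(x0)
    for i in range(x0.size):
        e = np.zeros_like(x0)
        e[i] = eps
        # central finite difference for df/dx_i at the baseline
        grad[i] = (f(x0 + e) - f(x0 - e)) / (2.0 * eps)
    return grad * (x - x0)

f = lambda z: np.sin(z).sum()                  # toy differentiable model
x, x0 = np.array([0.5, -1.0]), np.zeros(2)
phi = taylor_attribution(f, x, x0)
print(phi)                       # per-feature first-order credit
print(phi.sum(), f(x) - f(x0))   # equal up to higher-order terms
```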
- Evaluations and Methods for Explanation through Robustness Analysis [117.7235152610957]
We establish a novel set of evaluation criteria for such feature-based explanations via robustness analysis.
We obtain new explanations that are loosely necessary and sufficient for a prediction.
We extend the explanation to extract the set of features that would move the current prediction to a target class.
arXiv Detail & Related papers (2020-05-31T05:52:05Z)