OrdShap: Feature Position Importance for Sequential Black-Box Models
- URL: http://arxiv.org/abs/2507.11855v1
- Date: Wed, 16 Jul 2025 02:40:01 GMT
- Title: OrdShap: Feature Position Importance for Sequential Black-Box Models
- Authors: Davin Hill, Brian L. Hill, Aria Masoomi, Vijay S. Nori, Robert E. Tillman, Jennifer Dy
- Abstract summary: We introduce OrdShap, a novel attribution method that disentangles feature value and feature position effects by quantifying how a model's predictions change in response to permuting feature position. Empirical results from health, natural language, and synthetic datasets highlight OrdShap's effectiveness in capturing feature value and feature position attributions.
- Score: 3.4057190746821586
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sequential deep learning models excel in domains with temporal or sequential dependencies, but their complexity necessitates post-hoc feature attribution methods for understanding their predictions. While existing techniques quantify feature importance, they inherently assume a fixed feature ordering, conflating the effects of (1) feature values and (2) their positions within input sequences. To address this gap, we introduce OrdShap, a novel attribution method that disentangles these effects by quantifying how a model's predictions change in response to permuting feature position. We establish a game-theoretic connection between OrdShap and Sánchez-Bergantiños values, providing a theoretically grounded approach to position-sensitive attribution. Empirical results from health, natural language, and synthetic datasets highlight OrdShap's effectiveness in capturing feature value and feature position attributions, and provide deeper insight into model behavior.
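To make the positional effect concrete, here is a minimal Monte Carlo sketch (all names are hypothetical, and this naive average is only an illustration, not the paper's Sánchez-Bergantiños-based estimator): hold one feature's value fixed, move it to random positions in the sequence, and average the resulting change in the model's prediction.

```python
import numpy as np

def position_sensitivity(model, x, feat_idx, n_perm=100, rng=None):
    """Average change in model output when the feature at feat_idx is
    moved to a random position, with its value held fixed."""
    rng = rng or np.random.default_rng(0)
    base = model(x)
    deltas = []
    for _ in range(n_perm):
        rest = np.delete(x, feat_idx)        # remove the feature ...
        pos = rng.integers(len(rest) + 1)    # ... choose a new slot ...
        deltas.append(model(np.insert(rest, pos, x[feat_idx])) - base)
    return float(np.mean(deltas))            # > 0: moving it raises the output

# toy model that only cares about *where* the largest value sits
toy = lambda s: float(np.argmax(s))
print(position_sensitivity(toy, np.array([0.1, 0.9, 0.3, 0.2]), feat_idx=1))
```

A feature whose sensitivity is near zero matters (if at all) only through its value; a large magnitude signals that the model is also reading its position.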
Related papers
- Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors [61.92704516732144]
We show that the most robust features for correctness prediction are those that play a distinctive causal role in the model's behavior. We propose two methods that leverage causal mechanisms to predict the correctness of model outputs.
arXiv Detail & Related papers (2025-05-17T00:31:39Z)
- Improving Neural Additive Models with Bayesian Principles [54.29602161803093]
Neural additive models (NAMs) enhance the transparency of deep neural networks by handling calibrated input features in separate additive sub-networks.
We develop Laplace-approximated NAMs (LA-NAMs), which show improved empirical performance on tabular datasets and challenging real-world medical tasks.
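For context, the NAM architecture that LA-NAMs build on can be sketched in a few lines of PyTorch (sizes here are arbitrary, and the Laplace approximation applied per sub-network is omitted):

```python
import torch
import torch.nn as nn

class TinyNAM(nn.Module):
    """Minimal neural additive model: one small MLP per input feature,
    with per-feature outputs summed so each contribution stays inspectable."""
    def __init__(self, n_features, hidden=16):
        super().__init__()
        self.subnets = nn.ModuleList(
            nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_features)
        )

    def forward(self, x):  # x: (batch, n_features)
        # each sub-network sees only its own feature column
        contribs = [net(x[:, i:i + 1]) for i, net in enumerate(self.subnets)]
        return torch.stack(contribs, dim=-1).sum(dim=-1)  # (batch, 1)
```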
arXiv Detail & Related papers (2023-05-26T13:19:15Z)
- Exploring the cloud of feature interaction scores in a Rashomon set [17.775145325515993]
We introduce the feature interaction score (FIS) in the context of a Rashomon set.
We demonstrate the properties of the FIS via synthetic data and draw connections to other areas of statistics.
Our results suggest that the proposed FIS can provide valuable insights into the nature of feature interactions in machine learning models.
arXiv Detail & Related papers (2023-05-17T13:05:26Z)
- Asymmetric feature interaction for interpreting model predictions [13.934784414106087]
In natural language processing, deep neural networks (DNNs) can model complex interactions between contexts.
We propose an asymmetric feature interaction attribution model that aims to explore asymmetric higher-order feature interactions.
Experimental results on two sentiment classification datasets show the superiority of our model against the state-of-the-art feature interaction attribution methods.
arXiv Detail & Related papers (2023-05-12T03:31:24Z)
- Flexible Networks for Learning Physical Dynamics of Deformable Objects [2.567499374977917]
We propose a model named time-wise PointNet (TP-Net) to infer the future state of a deformable object with particle-based representation.
TP-Net consists of a shared feature extractor that extracts global features from each input point set in parallel and a prediction network that aggregates and reasons on these features for future prediction.
Experiments demonstrate that our model achieves state-of-the-art performance on both synthetic and real-world datasets, with real-time prediction speed.
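A rough PyTorch sketch of the two-stage structure described above; the pooling and aggregation choices below are assumptions for illustration, not details from the paper:

```python
import torch
import torch.nn as nn

class TinyTPNet(nn.Module):
    """Shared extractor applied to each time step's point set in parallel,
    followed by a head that aggregates the per-step global features."""
    def __init__(self, point_dim=3, feat_dim=64):
        super().__init__()
        self.extractor = nn.Sequential(nn.Linear(point_dim, feat_dim), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                                  nn.Linear(feat_dim, point_dim))

    def forward(self, seq):  # seq: (T, n_points, point_dim)
        feats = self.extractor(seq).max(dim=1).values  # global feature per step
        return self.head(feats.mean(dim=0))            # aggregate, then predict
```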
arXiv Detail & Related papers (2021-12-07T14:34:52Z)
- Counterfactual Shapley Additive Explanations [6.916452769334367]
We propose a variant of SHAP, CoSHAP, that uses counterfactual generation techniques to produce a background dataset.
We motivate the need within the actionable recourse setting for careful consideration of background datasets when using Shapley values for feature attributions.
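The idea can be sketched with the `shap` package; `generate_counterfactuals` below is a hypothetical helper standing in for whatever counterfactual generator is chosen:

```python
import shap  # pip install shap

def coshap_values(model_fn, x, generate_counterfactuals):
    """Shapley values for instance x, computed against a background of
    counterfactual examples rather than, e.g., the training-set mean."""
    background = generate_counterfactuals(x)  # hypothetical, shape (k, n_features)
    explainer = shap.KernelExplainer(model_fn, background)
    return explainer.shap_values(x.reshape(1, -1))
```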
arXiv Detail & Related papers (2021-10-27T08:44:53Z)
- You Mostly Walk Alone: Analyzing Feature Attribution in Trajectory Prediction [52.442129609979794]
Recent deep learning approaches for trajectory prediction show promising performance.
It remains unclear which features such black-box models actually learn to use for making predictions.
This paper proposes a procedure that quantifies the contributions of different cues to model performance.
arXiv Detail & Related papers (2021-10-11T14:24:15Z)
- Joint Shapley values: a measure of joint feature importance [6.169364905804678]
We introduce joint Shapley values, which directly extend the Shapley axioms.
Joint Shapley values measure a set of features' average effect on a model's prediction.
Results for games show that joint Shapley values present different insights from existing interaction indices.
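For intuition about what is being extended, the classical Shapley value of a small cooperative game can be computed exactly by enumerating coalitions (a self-contained toy, not the paper's estimator):

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values for a game with value function v(coalition)."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for r in range(n):
            for S in combinations(others, r):
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += weight * (v(frozenset(S) | {i}) - v(frozenset(S)))
        phi[i] = total
    return phi

# toy game: value 1 only when players 'a' and 'b' cooperate
v = lambda S: 1.0 if {'a', 'b'} <= S else 0.0
print(shapley_values(['a', 'b', 'c'], v))  # a and b get 0.5 each, c gets 0.0
```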
arXiv Detail & Related papers (2021-07-23T17:22:37Z)
- Generative Counterfactuals for Neural Networks via Attribute-Informed Perturbation [51.29486247405601]
We design a framework to generate counterfactuals for raw data instances with the proposed Attribute-Informed Perturbation (AIP).
By utilizing generative models conditioned with different attributes, counterfactuals with desired labels can be obtained effectively and efficiently.
Experimental results on real-world texts and images demonstrate the effectiveness, sample quality, and efficiency of our designed framework.
arXiv Detail & Related papers (2021-01-18T08:37:13Z)
- Explaining and Improving Model Behavior with k Nearest Neighbor Representations [107.24850861390196]
We propose using k nearest neighbor representations to identify training examples responsible for a model's predictions.
We show that kNN representations are effective at uncovering learned spurious associations.
Our results indicate that the kNN approach makes the finetuned model more robust to adversarial inputs.
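The retrieval step is simple once hidden representations have been extracted (the extraction itself is model-specific and omitted); a minimal sketch with scikit-learn:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def nearest_training_examples(train_reprs, test_repr, k=5):
    """Indices of the k training examples whose hidden representations
    lie closest to the test example's representation."""
    nn = NearestNeighbors(n_neighbors=k).fit(train_reprs)
    dists, idx = nn.kneighbors(test_repr.reshape(1, -1))
    return idx[0], dists[0]

# toy usage with random 8-dimensional "representations"
rng = np.random.default_rng(0)
train = rng.normal(size=(100, 8))
idx, dists = nearest_training_examples(train, train[3], k=3)
print(idx)  # index 3 itself comes back first, at distance 0
```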
arXiv Detail & Related papers (2020-10-18T16:55:25Z)
- Understanding Neural Abstractive Summarization Models via Uncertainty [54.37665950633147]
Seq2seq abstractive summarization models generate text in a free-form manner.
We study the entropy, or uncertainty, of the model's token-level predictions.
We show that uncertainty is a useful perspective for analyzing summarization and text generation models more broadly.
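The uncertainty in question is the Shannon entropy of each decoding step's output distribution; a minimal NumPy version:

```python
import numpy as np

def token_entropies(probs, eps=1e-12):
    """Shannon entropy (in nats) of each step's token distribution.

    probs : array of shape (seq_len, vocab_size); rows sum to 1.
    """
    p = np.clip(probs, eps, 1.0)
    return -(p * np.log(p)).sum(axis=1)

# toy: a confident step versus a maximally uncertain (uniform) step
probs = np.array([[0.97, 0.01, 0.01, 0.01],
                  [0.25, 0.25, 0.25, 0.25]])
print(token_entropies(probs))  # low entropy first, log(4) ~ 1.386 second
```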
arXiv Detail & Related papers (2020-10-15T16:57:27Z)
- Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions [55.660255727031725]
Influence functions explain the decisions of a model by identifying influential training examples.
We conduct a comparison between influence functions and common word-saliency methods on representative tasks.
We develop a new measure based on influence functions that can reveal artifacts in training data.
arXiv Detail & Related papers (2020-05-14T00:45:23Z)
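Influence functions approximate the effect of removing a single training point without refitting; for a small ridge-regression model, the quantity they target can be computed exactly by brute force, as in this sketch (not the gradient- and Hessian-based estimator such papers use):

```python
import numpy as np

def loo_influence(X, y, x_test, y_test, lam=1e-3):
    """Change in a test point's squared error when each training example is
    left out of a ridge regression, one at a time."""
    def fit(Xs, ys):
        d = Xs.shape[1]
        return np.linalg.solve(Xs.T @ Xs + lam * np.eye(d), Xs.T @ ys)

    w_full = fit(X, y)
    base_loss = (x_test @ w_full - y_test) ** 2
    influence = np.empty(len(X))
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        w_i = fit(X[mask], y[mask])
        influence[i] = (x_test @ w_i - y_test) ** 2 - base_loss
    return influence  # large magnitude marks an influential training example
```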