Beyond Additivity: Sparse Isotonic Shapley Regression toward Nonlinear Explainability
- URL: http://arxiv.org/abs/2512.03112v1
- Date: Tue, 02 Dec 2025 08:34:43 GMT
- Title: Beyond Additivity: Sparse Isotonic Shapley Regression toward Nonlinear Explainability
- Authors: Jialai She
- Abstract summary: We introduce Sparse Isotonic Shapley Regression (SISR), a unified nonlinear explanation framework. SISR learns a monotonic transformation to restore additivity--obviating the need for a closed-form specification--and enforces an L0 sparsity constraint on the Shapley vector. SISR stabilizes attributions across payoff schemes and correctly filters irrelevant features, whereas standard Shapley values suffer severe rank and sign distortions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Shapley values, a gold standard for feature attribution in Explainable AI, face two primary challenges. First, the canonical Shapley framework assumes that the worth function is additive, yet real-world payoff constructions--driven by non-Gaussian distributions, heavy tails, feature dependence, or domain-specific loss scales--often violate this assumption, leading to distorted attributions. Second, achieving sparse explanations in high dimensions by computing dense Shapley values and then applying ad hoc thresholding is prohibitively costly and risks inconsistency. We introduce Sparse Isotonic Shapley Regression (SISR), a unified nonlinear explanation framework. SISR simultaneously learns a monotonic transformation to restore additivity--obviating the need for a closed-form specification--and enforces an L0 sparsity constraint on the Shapley vector, enhancing computational efficiency in large feature spaces. Its optimization algorithm leverages Pool-Adjacent-Violators for efficient isotonic regression and normalized hard-thresholding for support selection, yielding implementation ease and global convergence guarantees. Analysis shows that SISR recovers the true transformation in a wide range of scenarios and achieves strong support recovery even under high noise. Moreover, we are the first to demonstrate that irrelevant features and inter-feature dependencies can induce a true payoff transformation that deviates substantially from linearity. Experiments in regression, logistic regression, and tree ensembles demonstrate that SISR stabilizes attributions across payoff schemes and correctly filters irrelevant features, whereas standard Shapley values suffer severe rank and sign distortions. By unifying nonlinear transformation estimation with sparsity pursuit, SISR advances the frontier of nonlinear explainability, providing a theoretically grounded and practical attribution framework.
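The abstract describes an alternating optimization: Pool-Adjacent-Violators (PAV) isotonic regression learns the monotone transformation that restores additivity, while normalized hard-thresholding enforces the L0 constraint on the Shapley vector. Below is a minimal, hypothetical Python sketch of that alternation, not the authors' implementation: the coalition design matrix, the least-squares Shapley update, the debiasing refit, the function name `sisr_sketch`, and the use of scikit-learn's `IsotonicRegression` as a stand-in for PAV are all illustrative assumptions.

```python
# Hypothetical sketch of the SISR-style alternation described in the abstract.
# Assumptions (not from the paper): Shapley coefficients are updated by ordinary
# least squares on a 0/1 coalition design matrix, and sklearn's IsotonicRegression
# (which uses Pool-Adjacent-Violators internally) stands in for the PAV step.
import numpy as np
from sklearn.isotonic import IsotonicRegression


def sisr_sketch(Z, v, s, n_iters=50):
    """Z: (n_coalitions, n_features) 0/1 coalition indicators,
    v: (n_coalitions,) observed payoffs,
    s: number of features to keep (L0 budget)."""
    n, d = Z.shape
    # Dense initial attribution vector via least squares.
    phi = np.linalg.lstsq(Z, v, rcond=None)[0]
    g = None
    for _ in range(n_iters):
        # Step 1: learn a monotone transform g so that g(v) ~ Z @ phi
        # (isotonic regression, i.e. a PAV fit of the additive scores on v).
        fitted = Z @ phi
        g = IsotonicRegression(out_of_bounds="clip").fit(v, fitted)
        t = g.predict(v)  # transformed payoffs on the (approximately) additive scale
        # Step 2: refit the attribution vector on the transformed payoffs.
        phi = np.linalg.lstsq(Z, t, rcond=None)[0]
        # Step 3: hard-thresholding -- keep the s largest-magnitude entries.
        keep = np.argsort(np.abs(phi))[::-1][:s]
        mask = np.zeros(d, dtype=bool)
        mask[keep] = True
        phi = np.where(mask, phi, 0.0)
        # Optional debiasing: refit on the selected support only.
        phi[mask] = np.linalg.lstsq(Z[:, mask], t, rcond=None)[0]
    return phi, g
```

As a usage note, `phi` would then be read as the sparse attribution vector and `g` as the learned monotone payoff transformation; the real method additionally carries normalization and global convergence guarantees that this sketch does not attempt to reproduce.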
Related papers
- Renet: Principled and Efficient Relaxation for the Elastic Net via Dynamic Objective Selection [0.0]
We introduce Renet, a principled generalization of the Relaxed Lasso to the Elastic Net family of estimators. We show that Renet consistently outperforms the standard Elastic Net in high-dimensional, low signal-to-noise ratio, and high-multicollinearity regimes.
arXiv Detail & Related papers (2026-02-11T18:22:59Z) - MirrorLA: Reflecting Feature Map for Vision Linear Attention [49.41670925034762]
Linear attention significantly reduces the computational complexity of Transformers from quadratic to linear, yet it consistently lags behind softmax-based attention in performance. We propose MirrorLA, a geometric framework that substitutes passive truncation with active reorientation. MirrorLA achieves state-of-the-art performance across standard benchmarks, demonstrating that strictly linear efficiency can be achieved without compromising representational fidelity.
arXiv Detail & Related papers (2026-02-04T09:14:09Z) - On the Computational Complexity of Performative Prediction [59.584063949469645]
We show that computing an $\epsilon$-performatively stable point is PPAD-complete. One of our key technical contributions is to extend this PPAD-hardness result to general convex domains.
arXiv Detail & Related papers (2026-01-28T02:26:35Z) - Semi-parametric Functional Classification via Path Signatures Logistic Regression [1.210026603224224]
We propose Path Signatures Logistic Regression, a semi-parametric framework for classifying vector-valued functional data. Our results highlight the practical and theoretical benefits of integrating rough path theory into modern functional data analysis.
arXiv Detail & Related papers (2025-07-09T08:06:50Z) - An Accelerated Alternating Partial Bregman Algorithm for ReLU-based Matrix Decomposition [0.0]
In this paper, we investigate the sparse low-rank characteristics of rectified non-negative matrices. We propose a novel regularization term incorporating structures useful for clustering and compression tasks. We derive corresponding closed-form solutions while ensuring that the $L$-smooth property holds for any $L \ge 1$.
arXiv Detail & Related papers (2025-03-04T08:20:34Z) - Sample Complexity of Linear Quadratic Regulator Without Initial Stability [7.245261469258501]
Inspired by REINFORCE, we introduce a novel receding-horizon algorithm for the Linear Quadratic Regulator (LQR) problem with unknown dynamics. Unlike prior methods, our algorithm avoids reliance on two-point gradient estimates while maintaining the same order of sample complexity.
arXiv Detail & Related papers (2025-02-20T02:44:25Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Score-based Causal Representation Learning with Interventions [54.735484409244386]
This paper studies the causal representation learning problem when latent causal variables are observed indirectly.
The objectives are: (i) recovering the unknown linear transformation (up to scaling) and (ii) determining the directed acyclic graph (DAG) underlying the latent variables.
arXiv Detail & Related papers (2023-01-19T18:39:48Z) - Towards Understanding Generalization via Decomposing Excess Risk Dynamics [13.4379473119565]
We analyze the generalization dynamics to derive algorithm-dependent bounds, e.g., stability-based bounds.
Inspired by the observation that neural networks show a slow convergence rate when fitting noise, we propose decomposing the excess risk dynamics.
Under the decomposition framework, the new bound accords better with the theoretical and empirical evidence compared to the stability-based bound and uniform convergence bound.
arXiv Detail & Related papers (2021-06-11T03:42:45Z) - LQF: Linear Quadratic Fine-Tuning [114.3840147070712]
We present the first method for linearizing a pre-trained model that achieves comparable performance to non-linear fine-tuning.
LQF consists of simple modifications to the architecture, loss function and optimization typically used for classification.
arXiv Detail & Related papers (2020-12-21T06:40:20Z) - On the Stability Properties and the Optimization Landscape of Training Problems with Squared Loss for Neural Networks and General Nonlinear Conic Approximation Schemes [0.0]
We study the optimization landscape and the stability properties of training problems with squared loss for neural networks and general nonlinear conic approximation schemes.
We prove that the same effects that are responsible for these instability properties are also the reason for the emergence of saddle points and spurious local minima.
arXiv Detail & Related papers (2020-11-06T11:34:59Z) - Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates.
This yields generalization bounds depending on the behavior of the best model, and leads to the first-ever-known fast bounds in the low-noise setting.
To the best of our knowledge, this gives the first-ever-known stability and generalization bounds for SGD even with non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.