Feature-Specific Coefficients of Determination in Tree Ensembles
- URL: http://arxiv.org/abs/2407.03515v1
- Date: Wed, 3 Jul 2024 21:27:29 GMT
- Title: Feature-Specific Coefficients of Determination in Tree Ensembles
- Authors: Zhongli Jiang, Dabao Zhang, Min Zhang
- Abstract summary: Tree ensemble methods provide promising predictions, but the resulting models are difficult to interpret.
The recent introduction of Shapley values for individualized feature contributions, accompanied by several fast computing algorithms for predicted values, has shown intriguing results.
We propose an efficient algorithm, Q-SHAP, that reduces the computational complexity to polynomial time when calculating Shapley values related to quadratic losses.
- Score: 10.968795392216606
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tree ensemble methods provide promising predictions, but the resulting models are difficult to interpret. The recent introduction of Shapley values for individualized feature contributions, accompanied by several fast computing algorithms for predicted values, has shown intriguing results. However, individualizing coefficients of determination, aka $R^2$, for each feature is challenged by the underlying quadratic losses, although these coefficients would allow us to comparatively assess a single feature's contribution to tree ensembles. Here we propose an efficient algorithm, Q-SHAP, that reduces the computational complexity to polynomial time when calculating Shapley values related to quadratic losses. Our extensive simulation studies demonstrate that this approach not only enhances computational efficiency but also improves the estimation accuracy of feature-specific coefficients of determination.
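Q-SHAP's polynomial-time recursion is not reproduced here; for orientation only, the quantity it computes (a Shapley decomposition of $R^2$ under quadratic loss) can be sketched with a brute-force Monte Carlo baseline. The retraining-based value function and all names below are illustrative assumptions, not the authors' implementation.
```python
# Illustrative Monte Carlo estimator of feature-specific R^2 via Shapley
# values of the quadratic loss. A brute-force baseline for orientation,
# NOT the authors' polynomial-time Q-SHAP algorithm; names are hypothetical.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

def subset_r2(X, y, features):
    """Value function v(S): R^2 of a tree ensemble trained on feature set S."""
    if len(features) == 0:
        return 0.0  # intercept-only model explains no variance
    model = GradientBoostingRegressor(random_state=0)
    model.fit(X[:, features], y)
    return r2_score(y, model.predict(X[:, features]))

def shapley_r2(X, y, n_perm=50, seed=0):
    """Permutation-sampling Shapley estimate of each feature's share of R^2."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    phi = np.zeros(p)
    for _ in range(n_perm):
        order = rng.permutation(p)
        prev, chosen = 0.0, []
        for j in order:
            chosen.append(j)
            cur = subset_r2(X, y, chosen)
            phi[j] += cur - prev  # marginal contribution of feature j
            prev = cur
    return phi / n_perm  # entries sum (approximately) to the full-model R^2

# Usage: phi = shapley_r2(X_train, y_train); phi[j] is feature j's R^2 share.
```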
Related papers
- Optimal Decision Tree Pruning Revisited: Algorithms and Complexity [22.57063332430197]
We focus on the fundamental pruning operations of subtree replacement and raising. While optimal pruning can be performed in polynomial time for subtree replacement, the problem is NP-complete for subtree raising. For example, while subtree raising is hard for small domain size, it can be solved in $D^{2d} \cdot |I|^{O(1)}$ time, where $|I|$ is the input size.
arXiv Detail & Related papers (2025-03-05T15:02:46Z) - When is the Computation of a Feature Attribution Method Tractable? [0.0]
We study the computational complexity of power indices beyond SHAP.
We show that power indices can be simplified to a constant number of expected value evaluations.
We also explore interaction indices that quantify the importance of feature subsets.
arXiv Detail & Related papers (2025-01-04T18:37:02Z) - Semiparametric conformal prediction [79.6147286161434]
We construct a conformal prediction set accounting for the joint correlation structure of the vector-valued non-conformity scores.
We flexibly estimate the joint cumulative distribution function (CDF) of the scores.
Our method yields desired coverage and competitive efficiency on a range of real-world regression problems.
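The paper's semiparametric multivariate construction is not reproduced here; as a point of reference, the scalar split-conformal interval that such methods generalize is sketched below (standard recipe, not the paper's method).
```python
# Standard scalar split-conformal prediction interval -- the univariate
# baseline that multivariate/semiparametric variants generalize.
# Included for orientation only; NOT the paper's method.
import numpy as np

def split_conformal_interval(model, X_cal, y_cal, X_test, alpha=0.1):
    """Marginal (1 - alpha) prediction intervals from calibration residuals."""
    scores = np.abs(y_cal - model.predict(X_cal))  # non-conformity scores
    n = len(scores)
    # finite-sample-corrected empirical quantile of the calibration scores
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    pred = model.predict(X_test)
    return pred - q, pred + q
```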
arXiv Detail & Related papers (2024-11-04T14:29:02Z) - Improving the Weighting Strategy in KernelSHAP [0.8057006406834466]
In Explainable AI (XAI), Shapley values are a popular framework for explaining predictions made by complex machine learning models. We propose a novel modification of KernelSHAP which replaces the deterministic weights with stochastic ones to reduce the variance of the resulting Shapley value approximations. Our methods can reduce the required number of contribution function evaluations by 5% to 50% while preserving the same accuracy of the approximated Shapley values.
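For context, the deterministic weights in question are the Shapley kernel weights of standard KernelSHAP; a minimal sketch of that standard formula (not this paper's modification) follows.
```python
# Deterministic Shapley-kernel weight that standard KernelSHAP assigns to a
# coalition of size s out of M features -- the weights the paper proposes
# to randomize to lower the variance of the approximation.
from math import comb

def shapley_kernel_weight(M: int, s: int) -> float:
    """pi(s) = (M - 1) / (C(M, s) * s * (M - s)), for 0 < s < M."""
    if s == 0 or s == M:
        return float("inf")  # full/empty coalitions are enforced as constraints
    return (M - 1) / (comb(M, s) * s * (M - s))
```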
arXiv Detail & Related papers (2024-10-07T10:02:31Z) - Statistical Advantages of Oblique Randomized Decision Trees and Forests [0.0]
Generalization error and convergence rates are obtained for the flexible dimension reduction model class of ridge functions.
A lower bound on the risk of axis-aligned Mondrian trees is obtained, proving that these estimators are suboptimal for such linear dimension reduction models.
arXiv Detail & Related papers (2024-07-02T17:35:22Z) - Energy-Based Model for Accurate Estimation of Shapley Values in Feature Attribution [7.378438977893025]
EmSHAP (Energy-based model for Shapley value estimation) is proposed to estimate the expectation of the Shapley contribution function. A GRU (Gated Recurrent Unit)-coupled partition function estimation method is introduced.
arXiv Detail & Related papers (2024-04-01T12:19:33Z) - Fast Shapley Value Estimation: A Unified Approach [71.92014859992263]
We propose a straightforward and efficient Shapley estimator, SimSHAP, by eliminating redundant techniques.
In our analysis of existing approaches, we observe that estimators can be unified as a linear transformation of randomly summed values from feature subsets.
Our experiments validate the effectiveness of our SimSHAP, which significantly accelerates the computation of accurate Shapley values.
arXiv Detail & Related papers (2023-11-02T06:09:24Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3 K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z) - Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z) - PDD-SHAP: Fast Approximations for Shapley Values using Functional Decomposition [2.0559497209595823]
We propose PDD-SHAP, an algorithm that uses an ANOVA-based functional decomposition model to approximate the black-box model being explained.
This allows us to calculate Shapley values orders of magnitude faster than existing methods for large datasets, significantly reducing the amortized cost of computing Shapley values.
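The identity such decomposition-based methods build on can be stated directly; assuming a centered ANOVA-style functional decomposition, the Shapley values follow in closed form from the components (standard dividend-sharing identity, not quoted from the paper):
```latex
% Functional ANOVA decomposition of the model and the closed-form Shapley
% values it induces (standard identity; not copied from the paper itself).
f(x) = f_\emptyset + \sum_{\emptyset \neq S \subseteq \{1,\dots,p\}} f_S(x_S),
\qquad
\phi_i(x) = \sum_{S \ni i} \frac{f_S(x_S)}{|S|}.
```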
arXiv Detail & Related papers (2022-08-26T11:49:54Z) - Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search [96.20505710087392]
We propose a Shapley value based method to evaluate operation contribution (Shapley-NAS) for neural architecture search.
We show that our method outperforms the state-of-the-art methods by a considerable margin with light search cost.
arXiv Detail & Related papers (2022-06-20T14:41:49Z) - Accelerating Shapley Explanation via Contributive Cooperator Selection [42.11059072201565]
We propose a novel method SHEAR to significantly accelerate the Shapley explanation for DNN models.
The selection of the feature coalitions follows our proposed Shapley chain rule to minimize the absolute error from the ground-truth Shapley values.
SHEAR consistently outperforms state-of-the-art baseline methods across different evaluation metrics.
arXiv Detail & Related papers (2022-06-17T03:24:45Z) - Matching Pursuit Based Scheduling for Over-the-Air Federated Learning [67.59503935237676]
This paper develops a class of low-complexity device scheduling algorithms for over-the-air federated learning via the method of matching pursuit.
Compared to the state-of-the-art scheme, the proposed scheme has a drastically lower computational complexity.
The efficiency of the proposed scheme is confirmed via experiments on the CIFAR dataset.
arXiv Detail & Related papers (2022-06-14T08:14:14Z) - Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
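For reference, the quantity being estimated has a simple dense-case characterization; below is a minimal sketch of exact leverage scores via a thin QR factorization (a full-column-rank baseline, not the paper's randomized method).
```python
# Exact statistical leverage scores via a thin QR factorization: the squared
# row norms of Q give the scores. A dense, full-rank reference point for the
# randomized estimators the paper develops; NOT the paper's method.
import numpy as np

def leverage_scores(A: np.ndarray) -> np.ndarray:
    """l_i = ||Q_i||^2 where A = QR, for A with full column rank."""
    Q, _ = np.linalg.qr(A, mode="reduced")
    return np.sum(Q**2, axis=1)
```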
arXiv Detail & Related papers (2021-05-23T19:21:55Z) - Expectation propagation on the diluted Bayesian classifier [0.0]
We introduce a statistical mechanics inspired strategy that addresses the problem of sparse feature selection in the context of binary classification.
A computational scheme known as expectation propagation (EP) is used to train a continuous-weights perceptron learning a classification rule.
EP is a robust and competitive algorithm in terms of variable selection properties, estimation accuracy and computational complexity.
arXiv Detail & Related papers (2020-09-20T23:59:44Z) - Efficient Computation of Expectations under Spanning Tree Distributions [67.71280539312536]
We propose unified algorithms for the important cases of first-order expectations and second-order expectations in edge-factored, non-projective spanning-tree models.
Our algorithms exploit a fundamental connection between gradients and expectations, which allows us to derive efficient algorithms.
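That connection can be made concrete: for an edge-factored distribution $p(T) \propto \prod_{e \in T} w_e$, first-order expectations reduce to derivatives of the log-partition function (a standard exponential-family identity, stated here for orientation):
```latex
% Expectations over spanning trees as gradients of log Z
% (standard identity; Z itself is computable via the Matrix-Tree theorem).
\mathbb{E}_{T \sim p}\Big[\sum_{e \in T} f(e)\Big]
  = \sum_{e} f(e)\, p(e \in T),
\qquad
p(e \in T) = \frac{\partial \log Z}{\partial \log w_e},
\qquad
Z = \sum_{T} \prod_{e \in T} w_e .
```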
arXiv Detail & Related papers (2020-08-29T14:58:26Z) - Coded Distributed Computing with Partial Recovery [56.08535873173518]
We introduce a novel coded matrix-vector multiplication scheme, called coded computation with partial recovery (CCPR).
CCPR reduces both the computation time and the decoding complexity by allowing a trade-off between the accuracy and the speed of computation.
We then extend this approach to distributed implementation of more general computation tasks by proposing a coded communication scheme with partial recovery.
arXiv Detail & Related papers (2020-07-04T21:34:49Z) - Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables.
The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
arXiv Detail & Related papers (2020-06-24T17:53:53Z) - Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z) - Multivariate Boosted Trees and Applications to Forecasting and Control [0.0]
Gradient boosted trees are non-parametric regressors that exploit sequential model fitting and gradient descent to minimize a specific loss function.
In this paper, we present a computationally efficient algorithm for fitting multivariate boosted trees.
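For orientation, the scalar recursion being extended is the textbook gradient-boosting update, where each stage fits a tree to the negative gradient of the loss; in the multivariate case $y_i$ and $F(x_i)$ take values in $\mathbb{R}^d$ (standard formulation, not the paper's specific algorithm):
```latex
% Textbook gradient-boosting recursion: stage m fits h_m to the negative
% gradient of the loss at the current fit, then takes a step of size eta.
F_m(x) = F_{m-1}(x) + \eta\, h_m(x),
\qquad
h_m \approx \arg\min_{h} \sum_{i=1}^{n}
  \Big\| -\nabla_{F} L\big(y_i, F_{m-1}(x_i)\big) - h(x_i) \Big\|^2 .
```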
arXiv Detail & Related papers (2020-03-08T19:26:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.