TreeGrad-Ranker: Feature Ranking via $O(L)$-Time Gradients for Decision Trees
- URL: http://arxiv.org/abs/2602.11623v1
- Date: Thu, 12 Feb 2026 06:17:12 GMT
- Title: TreeGrad-Ranker: Feature Ranking via $O(L)$-Time Gradients for Decision Trees
- Authors: Weida Li, Yaoliang Yu, Bryan Kian Hsiang Low
- Abstract summary: Probabilistic values are used to rank features for explaining local predicted values of decision trees. TreeGrad computes the gradients of the multilinear extension of the joint objective in $O(L)$ time for decision trees with $L$ leaves. TreeGrad-Ranker aggregates the gradients while optimizing the joint objective to produce feature rankings. TreeGrad-Shap is a numerically stable algorithm for computing Beta Shapley values with integral parameters.
- Score: 73.0940890296463
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We revisit the use of probabilistic values, which include the well-known Shapley and Banzhaf values, to rank features for explaining the local predicted values of decision trees. The quality of feature rankings is typically assessed with the insertion and deletion metrics. Empirically, we observe that co-optimizing these two metrics is closely related to a joint optimization that selects a subset of features to maximize the local predicted value while minimizing it for the complement. However, we theoretically show that probabilistic values are generally unreliable for solving this joint optimization. Therefore, we explore deriving feature rankings by directly optimizing the joint objective. As the backbone, we propose TreeGrad, which computes the gradients of the multilinear extension of the joint objective in $O(L)$ time for decision trees with $L$ leaves; these gradients include weighted Banzhaf values. Building upon TreeGrad, we introduce TreeGrad-Ranker, which aggregates the gradients while optimizing the joint objective to produce feature rankings, and TreeGrad-Shap, a numerically stable algorithm for computing Beta Shapley values with integral parameters. In particular, the feature scores computed by TreeGrad-Ranker satisfy all the axioms uniquely characterizing probabilistic values, except for linearity, which itself leads to the established unreliability. Empirically, we demonstrate that the numerical error of Linear TreeShap can be up to $10^{15}$ times larger than that of TreeGrad-Shap when computing the Shapley value. As a by-product, we also develop TreeProb, which generalizes Linear TreeShap to support all probabilistic values. In our experiments, TreeGrad-Ranker performs significantly better on both insertion and deletion metrics. Our code is available at https://github.com/watml/TreeGrad.
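The abstract states that the gradients of the multilinear extension, evaluated at a constant point, include weighted Banzhaf values. As a rough illustration of that connection (our own brute-force sketch, exponential in the number of features, not the paper's $O(L)$ TreeGrad algorithm; the toy value function is hypothetical):

```python
from itertools import combinations

def multilinear_gradient(v, n, p):
    """Gradient of the multilinear extension of set function v at x = (p, ..., p).

    For each feature i this equals the p-weighted Banzhaf value
        sum_{S subset of N\{i}} p^|S| (1-p)^(n-1-|S|) * (v(S u {i}) - v(S)),
    which reduces to the ordinary Banzhaf value at p = 1/2.
    Brute force, O(2^n); for intuition only.
    """
    grads = []
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        g = 0.0
        for k in range(len(rest) + 1):
            for S in combinations(rest, k):
                w = p ** k * (1 - p) ** (n - 1 - k)
                g += w * (v(set(S) | {i}) - v(set(S)))
        grads.append(g)
    return grads

# Toy additive game: v(S) is a sum of per-feature contributions.
contrib = [2.0, -1.0, 0.5]
def v(S):
    return sum(contrib[i] for i in S)

print(multilinear_gradient(v, 3, 0.5))  # [2.0, -1.0, 0.5] for this additive game
```

For an additive game the marginal contribution of feature i is constant and the subset weights sum to one, so each gradient recovers the feature's own contribution; for non-additive games the gradients interpolate between weighted marginal contributions, which is what TreeGrad-Ranker aggregates while optimizing the joint objective.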
Related papers
- Experiments with Optimal Model Trees [1.3750624267664155]
We show that globally optimal model trees can achieve competitive accuracy with very small trees. We also compare to classic optimal and greedily grown decision trees, random forests, and support vector machines.
arXiv Detail & Related papers (2025-03-17T08:03:47Z)
- Learning a Decision Tree Algorithm with Transformers [75.96920867382859]
We introduce MetaTree, a transformer-based model trained via meta-learning to directly produce strong decision trees.
We fit both greedy decision trees and globally optimized decision trees on a large number of datasets, and train MetaTree to produce only the trees that achieve strong generalization performance.
arXiv Detail & Related papers (2024-02-06T07:40:53Z)
- End-to-end Feature Selection Approach for Learning Skinny Trees [13.388576838688202]
We propose a new optimization-based approach for feature selection in tree ensembles. Skinny Trees is an end-to-end toolkit for feature selection in tree ensembles.
arXiv Detail & Related papers (2023-10-28T00:15:10Z)
- On Computing Optimal Tree Ensembles [7.424944196676223]
Random forests and, more generally, (decision-)tree ensembles are widely used methods for classification and regression.
Recent algorithmic advances make it possible to compute decision trees that are optimal for various measures, such as their size or depth.
We provide two novel algorithms and corresponding lower bounds.
arXiv Detail & Related papers (2023-06-07T13:30:43Z)
- On marginal feature attributions of tree-based models [0.11184789007828977]
Local feature attributions based on marginal expectations, e.g. marginal Shapley, Owen or Banzhaf values, may be employed.
We present two (statistically similar) decision trees that compute the exact same function for which the "path-dependent" TreeSHAP yields different rankings of features.
We exploit the symmetry to derive an explicit formula, with improved complexity and only in terms of the internal model parameters, for marginal Shapley (and Banzhaf and Owen) values of CatBoost models.
arXiv Detail & Related papers (2023-02-16T17:18:03Z)
- Policy Gradient with Tree Expansion [72.10002936187388]
Policy gradient methods are notorious for having a large variance and high sample complexity. We introduce SoftTreeMax -- a generalization of softmax that employs planning. We show that SoftTreeMax reduces the gradient variance by three orders of magnitude.
arXiv Detail & Related papers (2023-01-30T19:03:14Z)
- Individualized and Global Feature Attributions for Gradient Boosted Trees in the Presence of $\ell_2$ Regularization [0.0]
We propose Prediction Decomposition (PreDecomp), a novel individualized feature attribution for boosted trees when they are trained with $\ell_2$ regularization.
We also propose TreeInner, a family of debiased global feature attributions defined in terms of the inner product between any individualized feature attribution and labels on out-sample data for each tree.
arXiv Detail & Related papers (2022-11-08T17:56:22Z)
- SoftTreeMax: Policy Gradient with Tree Search [72.9513807133171]
We introduce SoftTreeMax, the first approach that integrates tree-search into policy gradient.
On Atari, SoftTreeMax demonstrates up to 5x better performance in faster run-time compared with distributed PPO.
arXiv Detail & Related papers (2022-09-28T09:55:47Z)
- Linear TreeShap [16.246232737115218]
Decision trees are well known for their ease of interpretability.
To improve accuracy, we need to grow deep trees or ensembles of trees.
This paper presents a more efficient and straightforward algorithm: Linear TreeShap.
arXiv Detail & Related papers (2022-09-16T23:17:15Z)
- Rethinking Learnable Tree Filter for Generic Feature Transform [71.77463476808585]
Learnable Tree Filter presents a remarkable approach to model structure-preserving relations for semantic segmentation.
To relax the geometric constraint, we give the analysis by reformulating it as a Markov Random Field and introduce a learnable unary term.
For semantic segmentation, we achieve leading performance (82.1% mIoU) on the Cityscapes benchmark without bells-and-whistles.
arXiv Detail & Related papers (2020-12-07T07:16:47Z)
- MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.