Tree Ensemble Explainability through the Hoeffding Functional Decomposition and TreeHFD Algorithm
- URL: http://arxiv.org/abs/2510.24815v1
- Date: Tue, 28 Oct 2025 09:49:01 GMT
- Title: Tree Ensemble Explainability through the Hoeffding Functional Decomposition and TreeHFD Algorithm
- Authors: Clément Bénard
- Abstract summary: We introduce the TreeHFD algorithm to estimate the Hoeffding decomposition of a tree ensemble from a data sample. The high performance of TreeHFD is demonstrated through experiments on both simulated and real data.
- Score: 2.242085086643166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tree ensembles have demonstrated state-of-the-art predictive performance across a wide range of problems involving tabular data. Nevertheless, the black-box nature of tree ensembles is a strong limitation, especially for applications with critical decisions at stake. The Hoeffding or ANOVA functional decomposition is a powerful explainability method, as it breaks down black-box models into a unique sum of lower-dimensional functions, provided that input variables are independent. In standard learning settings, input variables are often dependent, and the Hoeffding decomposition is generalized through hierarchical orthogonality constraints. Such generalization leads to unique and sparse decompositions with well-defined main effects and interactions. However, the practical estimation of this decomposition from a data sample is still an open problem. Therefore, we introduce the TreeHFD algorithm to estimate the Hoeffding decomposition of a tree ensemble from a data sample. We show the convergence of TreeHFD, along with the main properties of orthogonality, sparsity, and causal variable selection. The high performance of TreeHFD is demonstrated through experiments on both simulated and real data, using our treehfd Python package (https://github.com/ThalesGroup/treehfd). Moreover, we empirically show that the widely used TreeSHAP method, based on Shapley values, is strongly connected to the Hoeffding decomposition.
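In the independent-input case mentioned in the abstract, each term of the Hoeffding (ANOVA) decomposition is a centered conditional expectation: $f_0 = E[f(X)]$ and $f_j(x_j) = E[f(X) \mid X_j = x_j] - f_0$ for the main effects. A minimal Monte Carlo sketch of these formulas on a toy black-box function is shown below; the function `f` and all names are illustrative, and this is not the TreeHFD algorithm itself, which additionally handles dependent inputs via hierarchical orthogonality.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Toy black-box model: an additive part plus one interaction term.
    return x[..., 0] + 2.0 * x[..., 1] + x[..., 0] * x[..., 2]

n = 200_000
X = rng.uniform(-1.0, 1.0, size=(n, 3))  # independent inputs on [-1, 1]^3

f0 = f(X).mean()  # zeroth-order term E[f(X)], here approximately 0

def main_effect(j, grid, n_mc=100_000):
    # f_j(x_j) = E[f(X) | X_j = x_j] - f0, estimated by Monte Carlo
    # over the remaining (independent) coordinates.
    Z = rng.uniform(-1.0, 1.0, size=(n_mc, 3))
    effects = []
    for v in grid:
        Zv = Z.copy()
        Zv[:, j] = v
        effects.append(f(Zv).mean() - f0)
    return np.array(effects)

grid = np.linspace(-1.0, 1.0, 5)
print("f0 ≈", round(f0, 3))
print("f_1 on grid ≈", main_effect(0, grid).round(2))  # ≈ grid (slope 1)
```

For this `f`, the interaction $x_1 x_3$ averages out of both main effects and only appears in the second-order term $f_{1,3}$, which is why the estimated main effect of the first variable is simply the identity.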
Related papers
- Entropy-Tree: Tree-Based Decoding with Entropy-Guided Exploration [52.52685988964061]
Entropy-Tree is a tree-based decoding method that exploits entropy as a signal for branching decisions. It unifies efficient structured exploration and reliable uncertainty estimation within a single decoding procedure.
arXiv Detail & Related papers (2026-01-02T07:14:05Z)
- Hierarchical Quantized Diffusion Based Tree Generation Method for Hierarchical Representation and Lineage Analysis [49.00783841494125]
HDTree captures tree relationships within a hierarchical latent space using a unified hierarchical codebook and quantized diffusion processes. HDTree's effectiveness is demonstrated through comparisons on both general-purpose and single-cell datasets. These contributions provide a new tool for hierarchical lineage analysis, enabling more accurate and efficient modeling of cellular differentiation paths.
arXiv Detail & Related papers (2025-06-29T15:19:13Z)
- Learning Decision Trees as Amortized Structure Inference [59.65621207449269]
We propose a hybrid amortized structure inference approach to learn predictive decision tree ensembles given data. We show that our approach, DT-GFN, outperforms state-of-the-art decision tree and deep learning methods on standard classification benchmarks.
arXiv Detail & Related papers (2025-03-10T07:05:07Z)
- Explainable Clustering Beyond Worst-Case Guarantees [5.65604054654671]
We study the explainable clustering problem first posed by Moshkovitz, Dasgupta, Rashtchian, and Frost (ICML 2020). The goal of explainable clustering is to fit an axis-aligned decision tree with $K$ leaves and minimal clustering cost (where every leaf is a cluster).
arXiv Detail & Related papers (2024-11-03T14:00:20Z)
- A Unified Approach to Extract Interpretable Rules from Tree Ensembles via Integer Programming [2.1408617023874443]
Tree ensembles are very popular machine learning models, known for their effectiveness in supervised classification and regression tasks. Our work aims to extract an optimized list of rules from a trained tree ensemble, providing the user with a condensed, interpretable model that retains most of the predictive power of the full model. Our extensive computational experiments offer statistically significant evidence that our method is competitive with other rule extraction methods in terms of predictive performance and fidelity towards the tree ensemble.
arXiv Detail & Related papers (2024-06-30T22:33:47Z)
- Policy Gradient with Tree Expansion [72.10002936187388]
Policy gradient methods are notorious for their large variance and high sample complexity. We introduce SoftTreeMax, a generalization of softmax that employs planning. We show that SoftTreeMax reduces the gradient variance by three orders of magnitude.
arXiv Detail & Related papers (2023-01-30T19:03:14Z)
- Unifying local and global model explanations by functional decomposition of low dimensional structures [0.0]
We consider a global explanation of a regression or classification function by decomposing it into the sum of main components and interaction components.
Here, $q$ denotes the highest order of interaction present in the decomposition.
arXiv Detail & Related papers (2022-08-12T07:38:53Z)
- Hierarchical Shrinkage: improving the accuracy and interpretability of tree-based methods [10.289846887751079]
We introduce Hierarchical Shrinkage (HS), a post-hoc algorithm that does not modify the tree structure.
HS substantially increases the predictive performance of decision trees, even when used in conjunction with other regularization techniques.
All code and models are released in a full-fledged package available on Github.
arXiv Detail & Related papers (2022-02-02T02:43:23Z)
- Partial Counterfactual Identification from Observational and Experimental Data [83.798237968683]
We develop effective Monte Carlo algorithms to approximate the optimal bounds from an arbitrary combination of observational and experimental data.
Our algorithms are validated extensively on synthetic and real-world datasets.
arXiv Detail & Related papers (2021-10-12T02:21:30Z)
- Data-driven advice for interpreting local and global model predictions in bioinformatics problems [17.685881417954782]
Conditional feature contributions (CFCs) provide local, case-by-case explanations of a prediction.
We compare the explanations computed by both methods on a set of 164 publicly available classification problems.
For random forests, we find extremely high similarities and correlations of both local and global SHAP values and CFC scores.
arXiv Detail & Related papers (2021-08-13T12:41:39Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Convex Polytope Trees [57.56078843831244]
Convex polytope trees (CPT) are proposed to expand the family of decision trees by an interpretable generalization of their decision boundary.
We develop a greedy method to efficiently construct CPT and scalable end-to-end training algorithms for the tree parameters when the tree structure is given.
arXiv Detail & Related papers (2020-10-21T19:38:57Z)
- FREEtree: A Tree-based Approach for High Dimensional Longitudinal Data With Correlated Features [2.00191482700544]
FREEtree is a tree-based method for high dimensional longitudinal data with correlated features.
It exploits the network structure of the features by first clustering them using weighted correlation network analysis.
It then conducts a screening step within each cluster of features and a selection step among the surviving features.
arXiv Detail & Related papers (2020-06-17T07:28:11Z)
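Several of the related papers above use axis-aligned decision trees as interpretable surrogates; the explainable clustering entry in particular fits a tree with $K$ leaves to cluster assignments so that every leaf is a cluster. A minimal sketch of that idea, assuming scikit-learn is available (the synthetic data and parameters below are illustrative, not taken from any of the papers):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
K = 3
# Three well-separated Gaussian blobs in 2D.
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])

# Reference (black-box) clustering.
labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(X)

# Axis-aligned tree with exactly K leaves: each leaf is one cluster,
# and the root-to-leaf thresholds explain cluster membership.
tree = DecisionTreeClassifier(max_leaf_nodes=K, random_state=0).fit(X, labels)
agreement = (tree.predict(X) == labels).mean()
print(f"leaves: {tree.get_n_leaves()}, agreement with k-means: {agreement:.2f}")
```

On well-separated blobs like these, two axis-aligned thresholds recover the k-means partition almost exactly, which is the sense in which the tree "explains" the clustering.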
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.