Related papers: RO-FIGS: Efficient and Expressive Tree-Based Ensembles for Tabular Data

RO-FIGS: Efficient and Expressive Tree-Based Ensembles for Tabular Data

URL: http://arxiv.org/abs/2504.06927v1
Date: Wed, 09 Apr 2025 14:35:24 GMT
Title: RO-FIGS: Efficient and Expressive Tree-Based Ensembles for Tabular Data
Authors: Urška Matjašec, Nikola Simidjievski, Mateja Jamnik,
Abstract summary: Tree-based models are robust to uninformative features and can accurately capture non-smooth, complex decision boundaries.<n>We propose Random oblique Fast Interpretable Greedy-Tree Sums (RO-FIGS)<n>RO-FIGS builds on Fast Interpretable Greedy-Tree Sums, and extends it by learning trees with oblique or multivariate splits.<n>We evaluate RO-FIGS on 22 real-world datasets, demonstrating superior performance and much smaller models over other tree- and neural network-based methods.
Score: 10.610270769561811
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Tree-based models are often robust to uninformative features and can accurately capture non-smooth, complex decision boundaries. Consequently, they often outperform neural network-based models on tabular datasets at a significantly lower computational cost. Nevertheless, the capability of traditional tree-based ensembles to express complex relationships efficiently is limited by using a single feature to make splits. To improve the efficiency and expressiveness of tree-based methods, we propose Random Oblique Fast Interpretable Greedy-Tree Sums (RO-FIGS). RO-FIGS builds on Fast Interpretable Greedy-Tree Sums, and extends it by learning trees with oblique or multivariate splits, where each split consists of a linear combination learnt from random subsets of features. This helps uncover interactions between features and improves performance. The proposed method is suitable for tabular datasets with both numerical and categorical features. We evaluate RO-FIGS on 22 real-world tabular datasets, demonstrating superior performance and much smaller models over other tree- and neural network-based methods. Additionally, we analyse their splits to reveal valuable insights into feature interactions, enriching the information learnt from SHAP summary plots, and thereby demonstrating the enhanced interpretability of RO-FIGS models. The proposed method is well-suited for applications, where balance between accuracy and interpretability is essential.

Related papers

Learning Decision Trees as Amortized Structure Inference [59.65621207449269]
We propose a hybrid amortized structure inference approach to learn predictive decision tree ensembles given data.<n>We show that our approach, DT-GFN, outperforms state-of-the-art decision tree and deep learning methods on standard classification benchmarks.
arXiv Detail & Related papers (2025-03-10T07:05:07Z)
Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search [59.75749613951193]
We propose Data Influence-oriented Tree Search (DITS) to guide both tree search and data selection. By leveraging influence scores, we effectively identify the most impactful data for system improvement. We derive influence score estimation methods tailored for non-differentiable metrics.
arXiv Detail & Related papers (2025-02-02T23:20:16Z)
A Neural Network Alternative to Tree-based Models [0.0]
We show that our models, Sparse TABular NET or sTAB-Net with attention mechanisms, are more effective than tree-based models. They achieve better performance than post-hoc methods like SHAP.
arXiv Detail & Related papers (2024-10-23T10:50:07Z)
Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [53.241569810013836]
We propose a novel framework that utilizes large language models (LLMs) to identify effective feature generation rules. We use decision trees to convey this reasoning information, as they can be easily represented in natural language. OCTree consistently enhances the performance of various prediction models across diverse benchmarks.
arXiv Detail & Related papers (2024-06-12T08:31:34Z)
Beyond TreeSHAP: Efficient Computation of Any-Order Shapley Interactions for Tree Ensembles [6.664930499708017]
The Shapley value (SV) is a concept in explainable artificial intelligence (XAI) research for quantifying additive feature attributions of predictions. We present TreeSHAP-IQ, an efficient method to compute any-order additive Shapley interactions for predictions tree-based models.
arXiv Detail & Related papers (2024-01-22T16:08:41Z)
GRANDE: Gradient-Based Decision Tree Ensembles for Tabular Data [9.107782510356989]
We propose a novel approach for learning hard, axis-aligned decision tree ensembles using end-to-end gradient descent. Grande is based on a dense representation of tree ensembles, which affords to use backpropagation with a straight-through operator. We demonstrate that our method outperforms existing gradient-boosting and deep learning frameworks on most datasets.
arXiv Detail & Related papers (2023-09-29T10:49:14Z)
Unboxing Tree Ensembles for interpretability: a hierarchical visualization tool and a multivariate optimal re-built tree [0.34530027457862006]
We develop an interpretable representation of a tree-ensemble model that can provide valuable insights into its behavior. The proposed model is effective in yielding a shallow interpretable tree approxing the tree-ensemble decision function.
arXiv Detail & Related papers (2023-02-15T10:43:31Z)
Ordinal Graph Gamma Belief Network for Social Recommender Systems [54.9487910312535]
We develop a hierarchical Bayesian model termed ordinal graph factor analysis (OGFA), which jointly models user-item and user-user interactions. OGFA not only achieves good recommendation performance, but also extracts interpretable latent factors corresponding to representative user preferences. We extend OGFA to ordinal graph gamma belief network, which is a multi-stochastic-layer deep probabilistic model.
arXiv Detail & Related papers (2022-09-12T09:19:22Z)
Deep Reinforcement Learning of Graph Matching [63.469961545293756]
Graph matching (GM) under node and pairwise constraints has been a building block in areas from optimization to computer vision. We present a reinforcement learning solver for GM i.e. RGM that seeks the node correspondence between pairwise graphs. Our method differs from the previous deep graph matching model in the sense that they are focused on the front-end feature extraction and affinity function learning.
arXiv Detail & Related papers (2020-12-16T13:48:48Z)
Out-of-distribution Generalization via Partial Feature Decorrelation [72.96261704851683]
We present a novel Partial Feature Decorrelation Learning (PFDL) algorithm, which jointly optimize a feature decomposition network and the target image classification model. The experiments on real-world datasets demonstrate that our method can improve the backbone model's accuracy on OOD image classification datasets.
arXiv Detail & Related papers (2020-07-30T05:48:48Z)
FREEtree: A Tree-based Approach for High Dimensional Longitudinal Data With Correlated Features [2.00191482700544]
FREEtree is a tree-based method for high dimensional longitudinal data with correlated features. It exploits the network structure of the features by first clustering them using weighted correlation network analysis. It then conducts a screening step within each cluster of features and a selection step among the surviving features.
arXiv Detail & Related papers (2020-06-17T07:28:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.