A cautionary tale on fitting decision trees to data from additive
models: generalization lower bounds
- URL: http://arxiv.org/abs/2110.09626v1
- Date: Mon, 18 Oct 2021 21:22:40 GMT
- Title: A cautionary tale on fitting decision trees to data from additive
models: generalization lower bounds
- Authors: Yan Shuo Tan, Abhineet Agarwal, Bin Yu
- Abstract summary: We study the generalization performance of decision trees with respect to different generative regression models.
This allows us to elicit their inductive bias, that is, the assumptions the algorithms make (or do not make) to generalize to new data.
We prove a sharp squared error generalization lower bound for a large class of decision tree algorithms fitted to sparse additive models.
- Score: 9.546094657606178
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decision trees are important both as interpretable models amenable to
high-stakes decision-making, and as building blocks of ensemble methods such as
random forests and gradient boosting. Their statistical properties, however,
are not well understood. The most cited prior works have focused on deriving
pointwise consistency guarantees for CART in a classical nonparametric
regression setting. We take a different approach, and advocate studying the
generalization performance of decision trees with respect to different
generative regression models. This allows us to elicit their inductive bias,
that is, the assumptions the algorithms make (or do not make) to generalize to
new data, thereby guiding practitioners on when and how to apply these methods.
In this paper, we focus on sparse additive generative models, which have both
low statistical complexity and some nonparametric flexibility. We prove a sharp
squared error generalization lower bound for a large class of decision tree
algorithms fitted to sparse additive models with $C^1$ component functions.
This bound is surprisingly much worse than the minimax rate for estimating such
sparse additive models. The inefficiency is due not to greediness, but to the
loss in power for detecting global structure when we average responses solely
over each leaf, an observation that suggests opportunities to improve
tree-based algorithms, for example, by hierarchical shrinkage. To prove these
bounds, we develop new technical machinery, establishing a novel connection
between decision tree estimation and rate-distortion theory, a sub-field of
information theory.
Related papers
- Decision Trees for Interpretable Clusters in Mixture Models and Deep Representations [5.65604054654671]
We introduce the notion of an explainability-to-noise ratio for mixture models.
We propose an algorithm that takes as input a mixture model and constructs a suitable tree in data-independent time.
We prove upper and lower bounds on the error rate of the resulting decision tree.
arXiv Detail & Related papers (2024-11-03T14:00:20Z) - A Pseudo-Semantic Loss for Autoregressive Models with Logical
Constraints [87.08677547257733]
Neuro-symbolic AI bridges the gap between purely symbolic and neural approaches to learning.
We show how to maximize the likelihood of a symbolic constraint w.r.t the neural network's output distribution.
We also evaluate our approach on Sudoku and shortest-path prediction cast as autoregressive generation.
arXiv Detail & Related papers (2023-12-06T20:58:07Z) - A Robust Hypothesis Test for Tree Ensemble Pruning [2.4923006485141284]
We develop and present a novel theoretically justified hypothesis test of split quality for gradient boosted tree ensembles.
We show that using this method instead of the common penalty terms leads to a significant reduction in out of sample loss.
We also present several innovative extensions to the method, opening the door for a wide variety of novel tree pruning algorithms.
arXiv Detail & Related papers (2023-01-24T16:31:49Z) - SETAR-Tree: A Novel and Accurate Tree Algorithm for Global Time Series
Forecasting [7.206754802573034]
In this paper, we explore the close connections between TAR models and regression trees.
We introduce a new forecasting-specific tree algorithm that trains global Pooled Regression (PR) models in the leaves.
In our evaluation, the proposed tree and forest models are able to achieve significantly higher accuracy than a set of state-of-the-art tree-based algorithms.
arXiv Detail & Related papers (2022-11-16T04:30:42Z) - On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery has proposed to factorize the data generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z) - GP-BART: a novel Bayesian additive regression trees approach using
Gaussian processes [1.03590082373586]
The GP-BART model is an extension of BART which addresses the limitation by assuming GP priors for the predictions of each terminal node among all trees.
The model's effectiveness is demonstrated through applications to simulated and real-world data, surpassing the performance of traditional modeling approaches in various scenarios.
arXiv Detail & Related papers (2022-04-05T11:18:44Z) - Data-driven advice for interpreting local and global model predictions
in bioinformatics problems [17.685881417954782]
Conditional feature contributions (CFCs) provide textitlocal, case-by-case explanations of a prediction.
We compare the explanations computed by both methods on a set of 164 publicly available classification problems.
For random forests, we find extremely high similarities and correlations of both local and global SHAP values and CFC scores.
arXiv Detail & Related papers (2021-08-13T12:41:39Z) - Goal-directed Generation of Discrete Structures with Conditional
Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $varepsilon*$, which deviates substantially from the test error of worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - ENTMOOT: A Framework for Optimization over Ensemble Tree Models [57.98561336670884]
ENTMOOT is a framework for integrating tree models into larger optimization problems.
We show how ENTMOOT allows a simple integration of tree models into decision-making and black-box optimization.
arXiv Detail & Related papers (2020-03-10T14:34:07Z) - Polynomial-Time Exact MAP Inference on Discrete Models with Global
Dependencies [83.05591911173332]
junction tree algorithm is the most general solution for exact MAP inference with run-time guarantees.
We propose a new graph transformation technique via node cloning which ensures a run-time for solving our target problem independently of the form of a corresponding clique tree.
arXiv Detail & Related papers (2019-12-27T13:30:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.