Estimation and Inference with Trees and Forests in High Dimensions
- URL: http://arxiv.org/abs/2007.03210v2
- Date: Wed, 21 Oct 2020 18:44:09 GMT
- Title: Estimation and Inference with Trees and Forests in High Dimensions
- Authors: Vasilis Syrgkanis and Manolis Zampetakis
- Abstract summary: Shallow trees built greedily via the CART empirical MSE criterion achieve MSE rates that depend only logarithmically on the ambient dimension $d$.
For strongly relevant features, we also show that fully grown honest forests achieve fast MSE rates and their predictions are asymptotically normal.
- Score: 23.732259124656903
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We analyze the finite sample mean squared error (MSE) performance of
regression trees and forests in the high dimensional regime with binary
features, under a sparsity constraint. We prove that if only $r$ of the $d$
features are relevant for the mean outcome function, then shallow trees built
greedily via the CART empirical MSE criterion achieve MSE rates that depend
only logarithmically on the ambient dimension $d$. We prove upper bounds, whose
exact dependence on the number of relevant variables $r$ depends on the
correlation among the features and on the degree of relevance. For strongly
relevant features, we also show that fully grown honest forests achieve fast
MSE rates and their predictions are also asymptotically normal, enabling
asymptotically valid inference that adapts to the sparsity of the regression
function.
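To make the object of study concrete, below is a minimal sketch of greedy CART-style splitting on binary features: each node splits on the feature that most reduces empirical MSE, and the tree is kept shallow so only relevant features tend to be picked. This illustrates the splitting criterion only; the paper's rates, honesty scheme, and forest averaging are not reproduced here, and the feature indices and depth are illustrative.

```python
# Greedy CART-style splitting on binary features: at each node, pick the
# feature whose split most reduces empirical MSE (variance reduction).
import numpy as np

def mse(y):
    # Empirical MSE of predicting the node mean.
    return y.var() if len(y) > 0 else 0.0

def best_split(X, y):
    # Return the feature with the largest empirical MSE reduction (or None).
    n = len(y)
    best_j, best_gain = None, 0.0
    for j in range(X.shape[1]):
        left, right = y[X[:, j] == 0], y[X[:, j] == 1]
        if len(left) == 0 or len(right) == 0:
            continue
        gain = mse(y) - (len(left) / n) * mse(left) - (len(right) / n) * mse(right)
        if gain > best_gain:
            best_j, best_gain = j, gain
    return best_j

def grow_tree(X, y, depth):
    # Shallow greedy tree: split node or return a leaf holding the node mean.
    j = best_split(X, y) if depth > 0 else None
    if j is None:
        return {"leaf": y.mean() if len(y) else 0.0}
    mask = X[:, j] == 1
    return {"feature": j,
            "left": grow_tree(X[~mask], y[~mask], depth - 1),
            "right": grow_tree(X[mask], y[mask], depth - 1)}

def predict(tree, x):
    while "leaf" not in tree:
        tree = tree["right"] if x[tree["feature"]] == 1 else tree["left"]
    return tree["leaf"]

# Toy example: d = 50 binary features, only r = 2 relevant.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 50))
y = 2.0 * X[:, 3] - 1.0 * X[:, 7] + 0.1 * rng.standard_normal(2000)
tree = grow_tree(X, y, depth=3)  # shallow: depth does not scale with d
print(predict(tree, X[0]), y[0])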
Related papers
- Accurate estimation of feature importance faithfulness for tree models [3.545940115969205]
We consider a perturbation-based metric of predictive faithfulness of feature rankings (or attributions) that we call PGI squared.
We propose a method of ranking features by their importance for the tree model's predictions based on PGI squared.
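As a rough illustration only (the paper's exact PGI squared definition may differ), a perturbation-based faithfulness score can be computed by resampling the top-ranked features and measuring the resulting squared change in the model's predictions. The function name and formula below are hypothetical stand-ins:

```python
# Hypothetical perturbation-based faithfulness sketch in the spirit of PGI
# squared: resample the top-k ranked features, average the squared change
# in predictions. A large value means the ranking hits influential features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def perturbation_faithfulness(model, X, ranking, k, n_draws=20, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    top = ranking[:k]                    # indices of the k top-ranked features
    base = model.predict(X)
    total = 0.0
    for _ in range(n_draws):
        Xp = X.copy()
        for j in top:
            # Replace each top-ranked feature with values resampled from the data.
            Xp[:, j] = rng.permutation(X[:, j])
        total += np.mean((model.predict(Xp) - base) ** 2)
    return total / n_draws

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 10))
y = 3 * X[:, 0] + X[:, 1] + 0.1 * rng.standard_normal(500)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
ranking = np.argsort(model.feature_importances_)[::-1]
print(perturbation_faithfulness(model, X, ranking, k=2))
```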
arXiv Detail & Related papers (2024-04-04T13:09:26Z) - Why do Random Forests Work? Understanding Tree Ensembles as Self-Regularizing Adaptive Smoothers [68.76846801719095]
We argue that the current high-level dichotomy into bias- and variance-reduction prevalent in statistics is insufficient to understand tree ensembles.
We show that forests can improve upon trees by three distinct mechanisms that are usually implicitly entangled.
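One way to see the "smoother" view concretely: a regression forest's prediction at a point is a weighted average of training labels, with weights determined by leaf co-membership. A minimal sketch with scikit-learn (bootstrap disabled so the identity is exact):

```python
# Forests as adaptive smoothers: recover the per-training-point weights
# w_i(x) such that the forest prediction at x equals sum_i w_i(x) * y_i.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(300)
forest = RandomForestRegressor(n_estimators=50, bootstrap=False,
                               random_state=0).fit(X, y)

x0 = X[:1]
train_leaves = forest.apply(X)     # (n_train, n_trees) leaf indices
x0_leaves = forest.apply(x0)       # (1, n_trees)

# Per-tree weights are uniform over training points sharing x0's leaf;
# the forest weight averages them over trees.
weights = np.zeros(len(X))
for t in range(forest.n_estimators):
    in_leaf = train_leaves[:, t] == x0_leaves[0, t]
    weights[in_leaf] += 1.0 / in_leaf.sum()
weights /= forest.n_estimators

print(weights @ y, forest.predict(x0)[0])  # the two values coincide
```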
arXiv Detail & Related papers (2024-02-02T15:36:43Z) - Monotonicity and Double Descent in Uncertainty Estimation with Gaussian Processes [52.92110730286403]
It is commonly believed that the marginal likelihood should be reminiscent of cross-validation metrics and that both should deteriorate with larger input dimensions.
We prove that by tuning hyperparameters, the performance, as measured by the marginal likelihood, improves monotonically with the input dimension.
We also prove that cross-validation metrics exhibit qualitatively different behavior that is characteristic of double descent.
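For reference, the quantity being tracked is the standard GP log marginal likelihood, $\log p(y \mid X) = -\tfrac{1}{2} y^\top (K + \sigma^2 I)^{-1} y - \tfrac{1}{2} \log|K + \sigma^2 I| - \tfrac{n}{2} \log 2\pi$. A minimal sketch that tunes a lengthscale per input dimension; the RBF kernel and grid are illustrative, not the paper's setup:

```python
# Standard GP log marginal likelihood, computed stably via Cholesky.
import numpy as np

def rbf_kernel(X, lengthscale):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

def log_marginal_likelihood(X, y, lengthscale, noise):
    n = len(y)
    K = rbf_kernel(X, lengthscale) + noise**2 * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K + s^2 I)^{-1} y
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()                   # -1/2 log|K + s^2 I|
            - 0.5 * n * np.log(2 * np.pi))

rng = np.random.default_rng(0)
for d in (1, 5, 20):
    X = rng.standard_normal((100, d))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(100)
    # The paper's point: with tuned hyperparameters (here, a crude sweep),
    # the marginal likelihood need not deteriorate as d grows.
    best = max(log_marginal_likelihood(X, y, ls, 0.1)
               for ls in (0.5, 1.0, 2.0, 5.0, 10.0))
    print(d, round(best, 1))
```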
arXiv Detail & Related papers (2022-10-14T08:09:33Z) - On the Identifiability and Estimation of Causal Location-Scale Noise Models [122.65417012597754]
We study the class of location-scale or heteroscedastic noise models (LSNMs).
We show the causal direction is identifiable up to some pathological cases.
We propose two estimators for LSNMs: an estimator based on (non-linear) feature maps, and one based on neural networks.
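To fix notation, an LSNM posits $Y = f(X) + g(X)\,N$ with $g > 0$ and noise $N$ independent of $X$. A generic two-stage heteroscedastic fit, shown below as a sketch, conveys the model class; it is not the paper's feature-map or neural-network estimators:

```python
# LSNM sketch: simulate Y = f(X) + g(X) * N, then fit the mean first and
# the scale from log squared residuals (keeps the predicted scale positive).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(2000, 1))
f = lambda x: np.sin(2 * x[:, 0])          # location function
g = lambda x: 0.2 + 0.3 * x[:, 0] ** 2     # scale function, strictly positive
y = f(X) + g(X) * rng.standard_normal(2000)

mean_model = GradientBoostingRegressor().fit(X, y)
resid2 = (y - mean_model.predict(X)) ** 2
scale_model = GradientBoostingRegressor().fit(X, np.log(resid2 + 1e-12))

x_test = np.array([[1.0]])
print("f_hat:", mean_model.predict(x_test)[0],
      "g_hat:", np.exp(0.5 * scale_model.predict(x_test)[0]))
```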
arXiv Detail & Related papers (2022-10-13T17:18:59Z) - TreeFlow: Going beyond Tree-based Gaussian Probabilistic Regression [0.0]
We introduce TreeFlow, a tree-based approach that combines the benefits of tree ensembles with the ability to model flexible probability distributions.
We evaluate the proposed method on challenging regression benchmarks with varying volume, feature characteristics, and target dimensionality.
arXiv Detail & Related papers (2022-06-08T20:06:23Z) - Lassoed Tree Boosting [53.56229983630983]
We prove that a gradient boosted tree algorithm with early stopping achieves faster than $n^{-1/4}$ $L^2$ convergence in the large nonparametric space of càdlàg functions of bounded sectional variation.
Our convergence proofs are based on a novel, general theorem on early stopping with empirical loss minimizers of nested Donsker classes.
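For context, a minimal sketch of boosted trees with validation-based early stopping, using scikit-learn's stock mechanism rather than the paper's lasso-penalized variant:

```python
# Gradient boosting with early stopping: hold out a validation split and
# stop once the validation loss fails to improve for a patience window.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 10))
y = np.abs(X[:, 0]) + (X[:, 1] > 0) + 0.2 * rng.standard_normal(2000)

model = GradientBoostingRegressor(
    n_estimators=2000,          # upper bound on boosting rounds
    learning_rate=0.05,
    validation_fraction=0.2,    # held-out set monitored for early stopping
    n_iter_no_change=20,        # stop after 20 rounds without improvement
    random_state=0,
).fit(X, y)
print("rounds used:", model.n_estimators_)  # typically far fewer than 2000
```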
arXiv Detail & Related papers (2022-05-22T00:34:41Z) - Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets [101.5329678997916]
We study episodic two-player zero-sum Markov games (MGs) in the offline setting.
The goal is to find an approximate Nash equilibrium (NE) policy pair based on a dataset collected a priori.
arXiv Detail & Related papers (2022-02-15T15:39:30Z) - Data-driven advice for interpreting local and global model predictions in bioinformatics problems [17.685881417954782]
Conditional feature contributions (CFCs) provide local, case-by-case explanations of a prediction.
We compare the explanations computed by both methods on a set of 164 publicly available classification problems.
For random forests, we find extremely high similarities and correlations between SHAP values and CFC scores, both locally and globally.
arXiv Detail & Related papers (2021-08-13T12:41:39Z) - Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z) - Large Scale Prediction with Decision Trees [9.917147243076645]
This paper shows that decision trees constructed with Classification and Regression Trees (CART) and C4.5 methodology are consistent for regression and classification tasks.
A key step in the analysis is the establishment of an oracle inequality, which allows for a precise characterization of the goodness-of-fit and complexity tradeoff for a mis-specified model.
arXiv Detail & Related papers (2021-04-28T16:59:03Z) - Uncovering Feature Interdependencies in High-Noise Environments with Stepwise Lookahead Decision Forests [0.0]
"Stepwise lookahead" variation of random forest algorithm is presented for its ability to better uncover binary feature interdependencies.
A long-short trading strategy for copper futures is then backtested by training both greedy and lookahead random forests to predict the signs of daily price returns.
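A minimal sketch of the lookahead idea, illustrative rather than the paper's exact algorithm: score each candidate split by the MSE achievable after one further split of each child, which exposes XOR-style interdependencies that a greedy criterion misses:

```python
# One-step lookahead split scoring on binary features.
import numpy as np

def node_sse(y):
    # Sum of squared errors of predicting the node mean.
    return ((y - y.mean()) ** 2).sum() if len(y) else 0.0

def best_child_sse(X, y):
    # Smallest total SSE achievable with one further split (or no split).
    best = node_sse(y)
    for j in range(X.shape[1]):
        m = X[:, j] == 1
        if 0 < m.sum() < len(y):
            best = min(best, node_sse(y[m]) + node_sse(y[~m]))
    return best

def lookahead_split(X, y):
    # Score each candidate split by the SSE after optimally splitting
    # both children, rather than by the immediate (greedy) reduction.
    scores = {}
    for j in range(X.shape[1]):
        m = X[:, j] == 1
        if 0 < m.sum() < len(y):
            scores[j] = best_child_sse(X[m], y[m]) + best_child_sse(X[~m], y[~m])
    return min(scores, key=scores.get)

# XOR target: no single feature reduces variance, so greedy splitting is
# blind to it, but lookahead identifies feature 0 or 1 immediately.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 5))
y = (X[:, 0] ^ X[:, 1]).astype(float) + 0.05 * rng.standard_normal(1000)
print("lookahead chooses feature:", lookahead_split(X, y))
```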
arXiv Detail & Related papers (2020-09-30T11:31:10Z)