Nonparametric Variable Screening with Optimal Decision Stumps
- URL: http://arxiv.org/abs/2011.02683v2
- Date: Fri, 11 Dec 2020 00:38:18 GMT
- Title: Nonparametric Variable Screening with Optimal Decision Stumps
- Authors: Jason M. Klusowski and Peter M. Tian
- Abstract summary: We derive finite sample performance guarantees for variable selection in nonparametric models using a single-level CART decision tree.
Unlike previous marginal screening methods that attempt to directly estimate each marginal projection via a truncated basis expansion, the fitted model used here is a simple, parsimonious decision stump.
- Score: 19.493449206135296
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decision trees and their ensembles are endowed with a rich set of diagnostic
tools for ranking and screening variables in a predictive model. Despite the
widespread use of tree-based variable importance measures, pinning down their
theoretical properties has been challenging and therefore largely unexplored.
To address this gap between theory and practice, we derive finite sample
performance guarantees for variable selection in nonparametric models using a
single-level CART decision tree (a decision stump). Under standard operating
assumptions in the variable screening literature, we find that the marginal
signal strength of each variable can be considerably weaker, and the ambient
dimensionality considerably higher, than state-of-the-art nonparametric
variable selection methods permit. Furthermore, unlike previous marginal
screening methods that
attempt to directly estimate each marginal projection via a truncated basis
expansion, the fitted model used here is a simple, parsimonious decision stump,
thereby eliminating the need for tuning the number of basis terms. Thus,
surprisingly, even though decision stumps are highly inaccurate for estimation
purposes, they can still be used to perform consistent model selection.
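To make the procedure concrete, below is a minimal sketch (not the authors' code) of stump-based screening: for each coordinate, fit the best single-split regression stump to (X[:, j], y) and rank coordinates by the resulting decrease in the sum of squared errors. The function names, the toy data, and the choice of top_k are illustrative assumptions.
```python
# Hedged sketch of variable screening with optimal decision stumps.
# For each feature, compute the largest SSE reduction achievable by a
# single axis-aligned split (CART's first-split criterion), then keep
# the highest-scoring features. Names and constants are illustrative.
import numpy as np

def stump_impurity_decrease(x, y):
    """Largest SSE reduction from one split on the scalar feature x."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    n = len(y)
    # Prefix sums let us evaluate every candidate split in O(n) total.
    csum = np.cumsum(y_sorted)
    total = csum[-1]
    best = 0.0
    for i in range(1, n):  # split between sorted positions i-1 and i
        if x_sorted[i - 1] == x_sorted[i]:
            continue  # cannot split between tied covariate values
        left_mean = csum[i - 1] / i
        right_mean = (total - csum[i - 1]) / (n - i)
        # SSE decrease = n_L * n_R / n * (mean_L - mean_R)^2
        gain = i * (n - i) / n * (left_mean - right_mean) ** 2
        best = max(best, gain)
    return best

def screen_variables(X, y, top_k):
    """Rank features by stump impurity decrease; keep the top_k."""
    scores = np.array(
        [stump_impurity_decrease(X[:, j], y) for j in range(X.shape[1])]
    )
    return np.argsort(scores)[::-1][:top_k], scores

# Toy usage: only the first two of 50 coordinates carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))
y = np.sin(X[:, 0]) + (X[:, 1] > 0) + 0.1 * rng.normal(size=500)
selected, scores = screen_variables(X, y, top_k=5)
print(selected)  # features 0 and 1 should rank at the top
```
Note that the stump here is used only to score and rank variables, not to predict: consistent selection does not require the stump itself to be an accurate estimator of the regression function.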
Related papers
- Bayesian Model Selection via Mean-Field Variational Approximation [10.433170683584994]
We study the non-asymptotic properties of mean-field (MF) inference under the Bayesian framework.
We show a Bernstein von-Mises (BvM) theorem for the variational distribution from MF under possible model mis-specification.
arXiv Detail & Related papers (2023-12-17T04:48:25Z)
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
- Environment Invariant Linear Least Squares [18.387614531869826]
This paper considers a multi-environment linear regression model in which data from multiple experimental settings are collected.
We construct a novel environment invariant linear least squares (EILLS) objective function, a multi-environment version of linear least-squares regression.
arXiv Detail & Related papers (2023-03-06T13:10:54Z)
- Variational Boosted Soft Trees [13.956254007901675]
Gradient boosting machines (GBMs) based on decision trees consistently demonstrate state-of-the-art results on regression and classification tasks.
We propose to implement Bayesian GBMs using variational inference with soft decision trees.
arXiv Detail & Related papers (2023-02-21T14:51:08Z)
- Predicting Out-of-Domain Generalization with Neighborhood Invariance [59.05399533508682]
We propose a measure of a classifier's output invariance in a local transformation neighborhood.
Our measure is simple to calculate, does not depend on the test point's true label, and can be applied even in out-of-domain (OOD) settings.
In experiments on benchmarks in image classification, sentiment analysis, and natural language inference, we demonstrate a strong and robust correlation between our measure and actual OOD generalization.
arXiv Detail & Related papers (2022-07-05T14:55:16Z)
- Evaluating Sensitivity to the Stick-Breaking Prior in Bayesian Nonparametrics [85.31247588089686]
We show that variational Bayesian methods can yield sensitivities with respect to parametric and nonparametric aspects of Bayesian models.
We provide both theoretical and empirical support for our variational approach to Bayesian sensitivity analysis.
arXiv Detail & Related papers (2021-07-08T03:40:18Z)
- Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables.
The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
arXiv Detail & Related papers (2020-06-24T17:53:53Z)
- SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models [80.22609163316459]
We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series.
We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost.
arXiv Detail & Related papers (2020-04-01T11:49:30Z)
- Decision-Making with Auto-Encoding Variational Bayes [71.44735417472043]
We show that a posterior approximation distinct from the variational distribution should be used for making decisions.
Motivated by these theoretical results, we propose learning several approximate proposals for the best model.
In addition to toy examples, we present a full-fledged case study of single-cell RNA sequencing.
arXiv Detail & Related papers (2020-02-17T19:23:36Z)