Optimal Sparse Recovery with Decision Stumps
- URL: http://arxiv.org/abs/2303.04301v1
- Date: Wed, 8 Mar 2023 00:43:06 GMT
- Title: Optimal Sparse Recovery with Decision Stumps
- Authors: Kiarash Banihashem, MohammadTaghi Hajiaghayi, Max Springer
- Abstract summary: We show that tree based methods attain strong feature selection properties under a wide variety of settings.
As a byproduct of our analysis, we show that we can provably guarantee recovery even when the number of active features $s$ is unknown.
- Score: 7.24496247221802
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decision trees are widely used for their low computational cost, good
predictive performance, and ability to assess the importance of features.
Though often used in practice for feature selection, the theoretical guarantees
of these methods are not well understood. We here obtain a tight finite sample
bound for the feature selection problem in linear regression using single-depth
decision trees. We examine the statistical properties of these "decision
stumps" for the recovery of the $s$ active features from $p$ total features,
where $s \ll p$. Our analysis provides tight sample performance guarantees on
high-dimensional sparse systems which align with the finite sample bound of
$O(s \log p)$ as obtained by Lasso, improving upon previous bounds for both the
median and optimal splitting criteria. Our results extend to the non-linear
regime as well as arbitrary sub-Gaussian distributions, demonstrating that tree
based methods attain strong feature selection properties under a wide variety
of settings and further shedding light on the success of these methods in
practice. As a byproduct of our analysis, we show that we can provably
guarantee recovery even when the number of active features $s$ is unknown. We
further validate our theoretical results and proof methodology using
computational experiments.
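To make the setting concrete, the following is a minimal Python sketch of feature selection with decision stumps: each feature is scored by the squared-error (impurity) reduction of its best single split, and the $s$ highest-scoring features are kept. This illustrates the general idea only, not the paper's exact procedure, constants, or the extension to unknown $s$; the function names and toy data are illustrative.

```python
import numpy as np


def stump_score(x, y):
    """Best squared-error (impurity) reduction achievable by a single split on x."""
    order = np.argsort(x)
    y_sorted = y[order]
    n = len(y)
    total_sse = np.sum((y - y.mean()) ** 2)

    # Prefix sums let every split point be evaluated in O(n) overall.
    csum = np.cumsum(y_sorted)
    csum_sq = np.cumsum(y_sorted ** 2)
    best = 0.0
    for i in range(1, n):  # left block = first i samples in sorted order
        left_sse = csum_sq[i - 1] - csum[i - 1] ** 2 / i
        right_sum = csum[-1] - csum[i - 1]
        right_sse = (csum_sq[-1] - csum_sq[i - 1]) - right_sum ** 2 / (n - i)
        best = max(best, total_sse - left_sse - right_sse)
    return best


def select_features(X, y, s):
    """Rank features by their stump score and keep the top-s indices."""
    scores = [stump_score(X[:, j], y) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1][:s]


# Toy sparse linear model: s = 3 active features out of p = 50 (illustrative).
rng = np.random.default_rng(0)
n, p, s = 500, 50, 3
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:s] = 1.0
y = X @ beta + 0.1 * rng.normal(size=n)
print(sorted(select_features(X, y, s)))  # expected to recover [0, 1, 2]
```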
Related papers
- Efficient distributional regression trees learning algorithms for calibrated non-parametric probabilistic forecasts [0.0]
In the context of regression, calibrated probabilistic forecasts can be obtained by producing a predictive interval for the output instead of estimating only a conditional mean.
This paper introduces novel algorithms for learning probabilistic regression trees for the WIS or CRPS loss functions.
arXiv Detail & Related papers (2025-02-07T18:39:35Z)
- Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning [11.304227281260896]
We introduce a novel strategy employing diverse randomized value functions to estimate the posterior distribution of $Q$-values.
It provides robust uncertainty quantification and estimates lower confidence bounds (LCB) of $Q$-values.
We also emphasize diversity within the randomized value functions and enhance efficiency by introducing a diversity regularization method, reducing the requisite number of networks.
arXiv Detail & Related papers (2024-04-09T10:15:18Z)
- Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory.
We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures.
We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z)
- A Huber loss-based super learner with applications to healthcare expenditures [0.0]
We propose a super learner based on the Huber loss, a "robust" loss function that combines squared error loss with absolute loss to downweight outliers (a minimal sketch of the Huber loss appears after this list).
We show that the proposed method can be used both to directly optimize the Huber risk and in finite-sample settings.
arXiv Detail & Related papers (2022-05-13T19:57:50Z)
- Sparsity-based Feature Selection for Anomalous Subgroup Discovery [5.960402015658508]
Anomalous pattern detection aims to identify instances where deviation from normalcy is evident, and is widely applicable across domains.
There is, however, a common lack of principled and scalable feature selection methods for efficient discovery.
In this paper, we propose a sparsity-based automated feature selection framework that encodes systemic outcome deviations via the sparsity of feature-driven odds ratios.
arXiv Detail & Related papers (2022-01-06T10:56:43Z)
- Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent algorithm and provide an improved analysis under a more nuanced condition on the noise of the gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
- Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how value iteration under a more standard notion of low inherent Bellman error, typically employed in least-squares value-iteration-style algorithms, can provide strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near-optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
- Stochastic Optimization Forests [60.523606291705214]
We show how to train forest decision policies by growing trees that choose splits to directly optimize the downstream decision quality, rather than splitting to improve prediction accuracy as in the standard random forest algorithm.
We show that our approximate splitting criteria can reduce running time hundredfold, while achieving performance close to forest algorithms that exactly re-optimize for every candidate split.
arXiv Detail & Related papers (2020-08-17T16:56:06Z)
- Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables.
The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
arXiv Detail & Related papers (2020-06-24T17:53:53Z)
- Naive Feature Selection: a Nearly Tight Convex Relaxation for Sparse Naive Bayes [51.55826927508311]
We propose a sparse version of naive Bayes, which can be used for feature selection.
We prove that our convex relaxation bounds become tight as the marginal contribution of additional features decreases.
Both binary and multinomial sparse models are solvable in time almost linear in problem size.
arXiv Detail & Related papers (2019-05-23T19:30:51Z)
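As a side note to the Huber-loss entry above, here is a minimal sketch of the Huber loss: it is quadratic for small residuals and linear for large ones, so outliers are downweighted relative to plain squared error. The threshold `delta` and the example values are illustrative; this is not the super-learner implementation from that paper.

```python
import numpy as np


def huber_loss(residual, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond that threshold."""
    r = np.abs(residual)
    return np.where(r <= delta,
                    0.5 * r ** 2,                # squared-error regime
                    delta * (r - 0.5 * delta))   # linear regime downweights outliers


print(huber_loss(np.array([-3.0, -0.5, 0.0, 0.5, 3.0])))
```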
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.