Optimal Sparse Recovery with Decision Stumps
- URL: http://arxiv.org/abs/2303.04301v1
- Date: Wed, 8 Mar 2023 00:43:06 GMT
- Title: Optimal Sparse Recovery with Decision Stumps
- Authors: Kiarash Banihashem, MohammadTaghi Hajiaghayi, Max Springer
- Abstract summary: We show that tree based methods attain strong feature selection properties under a wide variety of settings.
As a byproduct of our analysis, we show that we can provably guarantee recovery even when the number of active features $s$ is unknown.
- Score: 7.24496247221802
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decision trees are widely used for their low computational cost, good
predictive performance, and ability to assess the importance of features.
Though often used in practice for feature selection, the theoretical guarantees
of these methods are not well understood. We here obtain a tight finite sample
bound for the feature selection problem in linear regression using single-depth
decision trees. We examine the statistical properties of these "decision
stumps" for the recovery of the $s$ active features from $p$ total features,
where $s \ll p$. Our analysis provides tight sample performance guarantees on
high-dimensional sparse systems which align with the finite sample bound of
$O(s \log p)$ as obtained by Lasso, improving upon previous bounds for both the
median and optimal splitting criteria. Our results extend to the non-linear
regime as well as arbitrary sub-Gaussian distributions, demonstrating that
tree-based methods attain strong feature selection properties under a wide variety
of settings and further shedding light on the success of these methods in
practice. As a byproduct of our analysis, we show that we can provably
guarantee recovery even when the number of active features $s$ is unknown. We
further validate our theoretical results and proof methodology using
computational experiments.
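As an illustration only (not the authors' implementation), here is a minimal sketch of the stump-based selection procedure the abstract describes: each feature is scored by the largest squared-error reduction a single depth-1 split can achieve, and the $s$ top-scoring features are returned. The toy data, sample size, and noise level below are arbitrary assumptions.

```python
import numpy as np

def stump_score(x, y):
    """Largest reduction in squared error achievable by a single
    axis-aligned split on feature x (a depth-1 "decision stump")."""
    ys = y[np.argsort(x)]
    n = len(ys)
    s1, s2 = np.cumsum(ys), np.cumsum(ys ** 2)   # prefix sums of y and y^2
    total_sse = s2[-1] - s1[-1] ** 2 / n
    i = np.arange(1, n)                          # left-group size for each candidate split
    left_sse = s2[:-1] - s1[:-1] ** 2 / i
    right_sse = (s2[-1] - s2[:-1]) - (s1[-1] - s1[:-1]) ** 2 / (n - i)
    return float(np.max(total_sse - left_sse - right_sse))

def select_features(X, y, s):
    """Score every feature with a decision stump and keep the s best."""
    scores = np.array([stump_score(X[:, j], y) for j in range(X.shape[1])])
    return np.sort(np.argsort(scores)[-s:])

# toy example: sparse linear signal with s active out of p features
rng = np.random.default_rng(0)
n, p, s = 500, 1000, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 1.0
y = X @ beta + 0.1 * rng.standard_normal(n)
print(select_features(X, y, s))   # typically recovers [0 1 2 3 4]
```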
Related papers
- Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning [11.304227281260896]
We introduce a novel strategy employing diverse randomized value functions to estimate the posterior distribution of $Q$-values.
It provides robust uncertainty quantification and estimates lower confidence bounds (LCB) of $Q$-values.
We also emphasize diversity within the randomized value functions and enhance efficiency by introducing a diversity regularization method, reducing the requisite number of networks.
arXiv Detail & Related papers (2024-04-09T10:15:18Z) - Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory.
We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures.
We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z) - Neural Greedy Pursuit for Feature Selection [72.4121881681861]
We propose a greedy algorithm to select $N$ important features among $P$ input features for a non-linear prediction problem.
We use neural networks as predictors in the algorithm to compute the loss.
arXiv Detail & Related papers (2022-07-19T16:39:16Z) - A Huber loss-based super learner with applications to healthcare expenditures [0.0]
We propose a super learner based on the Huber loss, a "robust" loss function that combines squared error loss with absolute loss to downweight the influence of outlying observations (a minimal sketch of this loss follows the list below).
We show that the proposed method can be used both to optimize the Huber risk directly and in finite-sample settings.
arXiv Detail & Related papers (2022-05-13T19:57:50Z) - Parallel feature selection based on the trace ratio criterion [4.30274561163157]
This work presents a novel parallel feature selection approach for classification, namely Parallel Feature Selection using Trace criterion (PFST).
Our method uses the trace criterion, a measure of class separability used in Fisher's Discriminant Analysis, to evaluate feature usefulness.
The experiments show that our method can produce a small set of features in a fraction of the time required by the other methods under comparison.
arXiv Detail & Related papers (2022-03-03T10:50:33Z) - Sparsity-based Feature Selection for Anomalous Subgroup Discovery [5.960402015658508]
Anomalous pattern detection aims to identify instances where deviation from normalcy is evident, and is widely applicable across domains.
However, a principled and scalable feature selection method for efficient discovery is commonly lacking.
In this paper, we propose a sparsity-based automated feature selection framework, which encodes systemic outcome deviations via the sparsity of feature-driven odds ratios.
arXiv Detail & Related papers (2022-01-06T10:56:43Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent algorithm and provide an improved analysis under a more nuanced condition on the noise of the gradients (a sketch of gradient clipping follows the list below).
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how value iteration under a more standard notion of low inherent Bellman error, typically employed in least-squares value-iteration-style algorithms, can provide strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z) - Stochastic Optimization Forests [60.523606291705214]
We show how to train forest decision policies by growing trees that choose splits to directly optimize the downstream decision quality, rather than splitting to improve prediction accuracy as in the standard random forest algorithm.
We show that our approximate splitting criteria can reduce running time hundredfold, while achieving performance close to forest algorithms that exactly re-optimize for every candidate split.
arXiv Detail & Related papers (2020-08-17T16:56:06Z) - Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables.
The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
arXiv Detail & Related papers (2020-06-24T17:53:53Z)
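For the Huber-loss entry above, a minimal hedged sketch of the loss itself (the paper's super learner is not reproduced here); the threshold `delta` is an arbitrary placeholder:

```python
import numpy as np

def huber_loss(residual, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond it, so large
    residuals (outliers) contribute far less than under squared error."""
    r = np.abs(residual)
    return np.where(r <= delta, 0.5 * r ** 2, delta * (r - 0.5 * delta))

print(huber_loss(np.array([0.5, 3.0])))   # 0.125 (quadratic) and 2.5 (linear)
```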
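For the heavy-tailed streaming entry above, a generic sketch of per-sample gradient clipping in a streaming least-squares update; this is an illustration under assumed settings (step size, clipping threshold, toy data), not the paper's algorithm or its analysis:

```python
import numpy as np

def clipped_streaming_sgd(stream, dim, step=0.01, clip=1.0):
    """Streaming estimation with clipped stochastic gradients: each
    per-sample gradient is rescaled to norm <= clip before the update,
    limiting the influence of heavy-tailed observations."""
    theta = np.zeros(dim)
    for x, y in stream:
        grad = 2.0 * (theta @ x - y) * x          # squared-loss gradient
        norm = np.linalg.norm(grad)
        if norm > clip:
            grad *= clip / norm                   # clip to the threshold
        theta -= step * grad
    return theta

# toy stream: linear model with heavy-tailed (Student-t, df=2) noise
rng = np.random.default_rng(1)
stream = ((x, x @ np.array([1.0, -2.0]) + rng.standard_t(2))
          for x in rng.standard_normal((5000, 2)))
print(clipped_streaming_sgd(stream, dim=2))       # roughly [1, -2]
```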