Optimal Sparse Recovery with Decision Stumps
- URL: http://arxiv.org/abs/2303.04301v1
- Date: Wed, 8 Mar 2023 00:43:06 GMT
- Title: Optimal Sparse Recovery with Decision Stumps
- Authors: Kiarash Banihashem, MohammadTaghi Hajiaghayi, Max Springer
- Abstract summary: We show that tree based methods attain strong feature selection properties under a wide variety of settings.
As a byproduct of our analysis, we show that we can provably guarantee recovery even when the number of active features $s$ is unknown.
- Score: 7.24496247221802
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decision trees are widely used for their low computational cost, good
predictive performance, and ability to assess the importance of features.
Though often used in practice for feature selection, the theoretical guarantees
of these methods are not well understood. We here obtain a tight finite sample
bound for the feature selection problem in linear regression using single-depth
decision trees. We examine the statistical properties of these "decision
stumps" for the recovery of the $s$ active features from $p$ total features,
where $s \ll p$. Our analysis provides tight sample performance guarantees on
high-dimensional sparse systems which align with the finite sample bound of
$O(s \log p)$ as obtained by Lasso, improving upon previous bounds for both the
median and optimal splitting criteria. Our results extend to the non-linear
regime as well as arbitrary sub-Gaussian distributions, demonstrating that
tree-based methods attain strong feature selection properties under a wide variety
of settings and further shedding light on the success of these methods in
practice. As a byproduct of our analysis, we show that we can provably
guarantee recovery even when the number of active features $s$ is unknown. We
further validate our theoretical results and proof methodology using
computational experiments.
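As an illustration only (not the authors' implementation), here is a minimal sketch of the stump-based selection procedure the abstract describes: each feature is scored by the largest squared-error reduction a single depth-1 split can achieve, and the $s$ top-scoring features are returned. The toy data, sample size, and noise level below are arbitrary assumptions.

```python
import numpy as np

def stump_score(x, y):
    """Largest reduction in squared error achievable by a single
    axis-aligned split on feature x (a depth-1 "decision stump")."""
    ys = y[np.argsort(x)]
    n = len(ys)
    s1, s2 = np.cumsum(ys), np.cumsum(ys ** 2)   # prefix sums of y and y^2
    total_sse = s2[-1] - s1[-1] ** 2 / n
    i = np.arange(1, n)                          # left-group size for each candidate split
    left_sse = s2[:-1] - s1[:-1] ** 2 / i
    right_sse = (s2[-1] - s2[:-1]) - (s1[-1] - s1[:-1]) ** 2 / (n - i)
    return float(np.max(total_sse - left_sse - right_sse))

def select_features(X, y, s):
    """Score every feature with a decision stump and keep the s best."""
    scores = np.array([stump_score(X[:, j], y) for j in range(X.shape[1])])
    return np.sort(np.argsort(scores)[-s:])

# toy example: sparse linear signal with s active out of p features
rng = np.random.default_rng(0)
n, p, s = 500, 1000, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 1.0
y = X @ beta + 0.1 * rng.standard_normal(n)
print(select_features(X, y, s))   # typically recovers [0 1 2 3 4]
```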
Related papers
- Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning [11.304227281260896]
We introduce a novel strategy employing diverse randomized value functions to estimate the posterior distribution of $Q$-values.
It provides robust uncertainty quantification and estimates lower confidence bounds (LCB) of $Q$-values.
We also emphasize diversity within the randomized value functions and enhance efficiency by introducing a diversity regularization method, reducing the requisite number of networks.
arXiv Detail & Related papers (2024-04-09T10:15:18Z) - Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory.
We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures.
We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z) - Neural Greedy Pursuit for Feature Selection [72.4121881681861]
We propose a greedy algorithm to select $N$ important features among $P$ input features for a non-linear prediction problem.
We use neural networks as predictors in the algorithm to compute the loss.
arXiv Detail & Related papers (2022-07-19T16:39:16Z) - A Huber loss-based super learner with applications to healthcare expenditures [0.0]
We propose a super learner based on the Huber loss, a "robust" loss function that combines squared error loss with absolute loss to downweight the influence of outlying observations (a minimal sketch of this loss follows the list below).
We show that the proposed method can be used both to optimize the Huber risk directly and in finite-sample settings.
arXiv Detail & Related papers (2022-05-13T19:57:50Z) - Parallel feature selection based on the trace ratio criterion [4.30274561163157]
This work presents a novel parallel feature selection approach for classification, namely Parallel Feature Selection using Trace criterion (PFST).
Our method uses the trace criterion, a measure of class separability used in Fisher's Discriminant Analysis, to evaluate feature usefulness.
The experiments show that our method can produce a small set of features in a fraction of the time required by the other methods under comparison.
arXiv Detail & Related papers (2022-03-03T10:50:33Z) - Sparsity-based Feature Selection for Anomalous Subgroup Discovery [5.960402015658508]
Anomalous pattern detection aims to identify instances where deviation from normalcy is evident, and is widely applicable across domains.
However, a principled and scalable feature selection method for efficient discovery is commonly lacking.
In this paper, we propose a sparsity-based automated feature selection framework, which encodes systemic outcome deviations via the sparsity of feature-driven odds ratios.
arXiv Detail & Related papers (2022-01-06T10:56:43Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent algorithm and provide an improved analysis under a more nuanced condition on the noise of the gradients (a sketch of gradient clipping follows the list below).
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how value iteration under a more standard notion of low inherent Bellman error, typically employed in least-squares value-iteration-style algorithms, can provide strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z) - Stochastic Optimization Forests [60.523606291705214]
We show how to train forest decision policies by growing trees that choose splits to directly optimize the downstream decision quality, rather than splitting to improve prediction accuracy as in the standard random forest algorithm.
We show that our approximate splitting criteria can reduce running time hundredfold, while achieving performance close to forest algorithms that exactly re-optimize for every candidate split.
arXiv Detail & Related papers (2020-08-17T16:56:06Z) - Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables.
The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
arXiv Detail & Related papers (2020-06-24T17:53:53Z)
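For the Huber-loss entry above, a minimal hedged sketch of the loss itself (the paper's super learner is not reproduced here); the threshold `delta` is an arbitrary placeholder:

```python
import numpy as np

def huber_loss(residual, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond it, so large
    residuals (outliers) contribute far less than under squared error."""
    r = np.abs(residual)
    return np.where(r <= delta, 0.5 * r ** 2, delta * (r - 0.5 * delta))

print(huber_loss(np.array([0.5, 3.0])))   # 0.125 (quadratic) and 2.5 (linear)
```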
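For the heavy-tailed streaming entry above, a generic sketch of per-sample gradient clipping in a streaming least-squares update; this is an illustration under assumed settings (step size, clipping threshold, toy data), not the paper's algorithm or its analysis:

```python
import numpy as np

def clipped_streaming_sgd(stream, dim, step=0.01, clip=1.0):
    """Streaming estimation with clipped stochastic gradients: each
    per-sample gradient is rescaled to norm <= clip before the update,
    limiting the influence of heavy-tailed observations."""
    theta = np.zeros(dim)
    for x, y in stream:
        grad = 2.0 * (theta @ x - y) * x          # squared-loss gradient
        norm = np.linalg.norm(grad)
        if norm > clip:
            grad *= clip / norm                   # clip to the threshold
        theta -= step * grad
    return theta

# toy stream: linear model with heavy-tailed (Student-t, df=2) noise
rng = np.random.default_rng(1)
stream = ((x, x @ np.array([1.0, -2.0]) + rng.standard_t(2))
          for x in rng.standard_normal((5000, 2)))
print(clipped_streaming_sgd(stream, dim=2))       # roughly [1, -2]
```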