A Consistent and Scalable Algorithm for Best Subset Selection in Single
Index Models
- URL: http://arxiv.org/abs/2309.06230v1
- Date: Tue, 12 Sep 2023 13:48:06 GMT
- Title: A Consistent and Scalable Algorithm for Best Subset Selection in Single
Index Models
- Authors: Borui Tang, Jin Zhu, Junxian Zhu, Xueqin Wang, Heping Zhang
- Abstract summary: Best subset selection in high-dimensional models is known to be computationally intractable.
We propose the first provably scalable algorithm for best subset selection in high-dimensional SIMs.
Our algorithm enjoys the subset selection consistency and has the oracle property with a high probability.
- Score: 1.3236116985407258
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Analysis of high-dimensional data has led to increased interest in both
single index models (SIMs) and best subset selection. SIMs provide an
interpretable and flexible modeling framework for high-dimensional data, while
best subset selection aims to find a sparse model from a large set of
predictors. However, best subset selection in high-dimensional models is known
to be computationally intractable. Existing methods tend to relax the
selection, but do not yield the best subset solution. In this paper, we
directly tackle the intractability by proposing the first provably scalable
algorithm for best subset selection in high-dimensional SIMs. Our algorithmic
solution enjoys the subset selection consistency and has the oracle property
with a high probability. The algorithm comprises a generalized information
criterion to determine the support size of the regression coefficients,
eliminating the model selection tuning. Moreover, our method does not assume an
error distribution or a specific link function and hence is flexible to apply.
Extensive simulation results demonstrate that our method is not only
computationally efficient but also able to exactly recover the best subset in
various settings (e.g., linear regression, Poisson regression, heteroscedastic
models).
Related papers
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to model potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z) - Best-Subset Selection in Generalized Linear Models: A Fast and
Consistent Algorithm via Splicing Technique [0.6338047104436422]
Best subset section has been widely regarded as the Holy Grail of problems of this type.
We proposed and illustrated an algorithm for best subset recovery in mild conditions.
Our implementation achieves approximately a fourfold speedup compared to popular variable selection toolkits.
arXiv Detail & Related papers (2023-08-01T03:11:31Z) - A model-free feature selection technique of feature screening and random
forest based recursive feature elimination [0.0]
We propose a model-free feature selection method for ultra-high dimensional data with mass features.
We show that the proposed method is selection consistent and $L$ consistent under weak regularity conditions.
arXiv Detail & Related papers (2023-02-15T03:39:16Z) - MILO: Model-Agnostic Subset Selection Framework for Efficient Model
Training and Tuning [68.12870241637636]
We propose MILO, a model-agnostic subset selection framework that decouples the subset selection from model training.
Our empirical results indicate that MILO can train models $3times - 10 times$ faster and tune hyperparameters $20times - 75 times$ faster than full-dataset training or tuning without performance.
arXiv Detail & Related papers (2023-01-30T20:59:30Z) - Optimally Weighted Ensembles of Regression Models: Exact Weight
Optimization and Applications [0.0]
We show that combining different regression models can yield better results than selecting a single ('best') regression model.
We outline an efficient method that obtains optimally weighted linear combination from a heterogeneous set of regression models.
arXiv Detail & Related papers (2022-06-22T09:11:14Z) - Fast Feature Selection with Fairness Constraints [49.142308856826396]
We study the fundamental problem of selecting optimal features for model construction.
This problem is computationally challenging on large datasets, even with the use of greedy algorithm variants.
We extend the adaptive query model, recently proposed for the greedy forward selection for submodular functions, to the faster paradigm of Orthogonal Matching Pursuit for non-submodular functions.
The proposed algorithm achieves exponentially fast parallel run time in the adaptive query model, scaling much better than prior work.
arXiv Detail & Related papers (2022-02-28T12:26:47Z) - Universal and data-adaptive algorithms for model selection in linear
contextual bandits [52.47796554359261]
We consider the simplest non-trivial instance of model-selection: distinguishing a simple multi-armed bandit problem from a linear contextual bandit problem.
We introduce new algorithms that explore in a data-adaptive manner and provide guarantees of the form $mathcalO(dalpha T1- alpha)$.
Our approach extends to model selection among nested linear contextual bandits under some additional assumptions.
arXiv Detail & Related papers (2021-11-08T18:05:35Z) - Subset selection for linear mixed models [0.0]
Linear mixed models (LMMs) are instrumental for regression analysis with structured dependence.
We introduce a Bayesian decision analysis for subset selection with LMMs.
These tools are applied to simulated data and a longitudinal physical activity dataset.
arXiv Detail & Related papers (2021-07-27T15:47:44Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_2,p$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Adaptive Graph-based Generalized Regression Model for Unsupervised
Feature Selection [11.214334712819396]
How to select the uncorrelated and discriminative features is the key problem of unsupervised feature selection.
We present a novel generalized regression model imposed by an uncorrelated constraint and the $ell_2,1$-norm regularization.
It can simultaneously select the uncorrelated and discriminative features as well as reduce the variance of these data points belonging to the same neighborhood.
arXiv Detail & Related papers (2020-12-27T09:07:26Z) - Non-Adaptive Adaptive Sampling on Turnstile Streams [57.619901304728366]
We give the first relative-error algorithms for column subset selection, subspace approximation, projective clustering, and volume on turnstile streams that use space sublinear in $n$.
Our adaptive sampling procedure has a number of applications to various data summarization problems that either improve state-of-the-art or have only been previously studied in the more relaxed row-arrival model.
arXiv Detail & Related papers (2020-04-23T05:00:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.