Best-Subset Selection in Generalized Linear Models: A Fast and
Consistent Algorithm via Splicing Technique
- URL: http://arxiv.org/abs/2308.00251v1
- Date: Tue, 1 Aug 2023 03:11:31 GMT
- Title: Best-Subset Selection in Generalized Linear Models: A Fast and
Consistent Algorithm via Splicing Technique
- Authors: Junxian Zhu, Jin Zhu, Borui Tang, Xuanyu Chen, Hongmei Lin, Xueqin
Wang
- Abstract summary: Best subset selection has been widely regarded as the Holy Grail of problems of this type.
We propose and illustrate an algorithm for best subset recovery under mild conditions.
Our implementation achieves approximately a fourfold speedup compared to popular variable selection toolkits.
- Score: 0.6338047104436422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In high-dimensional generalized linear models, it is crucial to identify a
sparse model that adequately accounts for response variation. Although best
subset selection has been widely regarded as the Holy Grail of problems of this
type, achieving either computational efficiency or statistical guarantees is
challenging. In this article, we surmount this obstacle using a fast algorithm
that selects the best subset with high certainty. We propose and illustrate an
algorithm for best subset recovery under regularity conditions.
Under mild conditions, the computational complexity of our algorithm scales
polynomially with sample size and dimension. In addition to demonstrating the
statistical properties of our method, extensive numerical experiments reveal
that it outperforms existing methods for variable selection and coefficient
estimation. The runtime analysis shows that our implementation achieves
approximately a fourfold speedup compared to popular variable selection
toolkits like glmnet and ncvreg.
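As a rough illustration of the splicing idea (iteratively swapping variables between the active and inactive sets whenever the swap lowers the loss), here is a minimal least-squares sketch. It is a toy under stated assumptions, not the paper's GLM algorithm: the function name `splice_best_subset`, the marginal-screening initialization, and the one-swap-per-iteration update are simplifications introduced here.

```python
import numpy as np

def splice_best_subset(X, y, s, max_iter=20):
    """Toy splicing-style search for a size-s subset (least squares).

    A simplified sketch of the splicing idea, not the paper's GLM
    implementation: only one candidate swap is tried per iteration.
    """
    n, p = X.shape
    # Initialize the active set by marginal screening: s largest |X^T y|.
    active = np.argsort(-np.abs(X.T @ y))[:s]

    def fit(idx):
        beta, *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)
        resid = y - X[:, idx] @ beta
        return 0.5 * resid @ resid / n, beta

    loss, beta = fit(active)
    for _ in range(max_iter):
        inactive = np.setdiff1d(np.arange(p), active)
        resid = y - X[:, active] @ beta
        # "Sacrifices": drop the weakest active variable, add the
        # inactive variable most correlated with the current residual.
        drop = active[np.argmin(np.abs(beta))]
        add = inactive[np.argmax(np.abs(X[:, inactive].T @ resid))]
        cand = np.append(np.setdiff1d(active, [drop]), add)
        cand_loss, cand_beta = fit(cand)
        if cand_loss < loss:   # splice only if the loss improves
            active, loss, beta = cand, cand_loss, cand_beta
        else:
            break
    return np.sort(active), beta

# Synthetic check: 3 strong signals among 30 features.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 30))
beta_true = np.zeros(30)
beta_true[[2, 7, 19]] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(200)
selected, _ = splice_best_subset(X, y, s=3)
print(selected)
```

With well-separated signals the screening step already lands near the true support, and the splicing loop only needs to correct occasional mistakes; the full algorithm generalizes this to GLM losses and uses principled forward/backward sacrifices rather than the heuristics above.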
Related papers
- Efficient Numerical Algorithm for Large-Scale Damped Natural Gradient Descent [7.368877979221163]
We propose a new algorithm for efficiently solving the damped Fisher matrix in large-scale scenarios where the number of parameters significantly exceeds the number of available samples.
Our algorithm is based on Cholesky decomposition and is generally applicable. Benchmark results show that the algorithm is significantly faster than existing methods.
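The damped system at the heart of such methods can be sketched at small scale as follows; this is a generic illustration of a Cholesky-based solve of $(F + \lambda I)x = g$, with the empirical Fisher $F$ built from a hypothetical per-sample gradient matrix `J`, not the paper's large-scale algorithm.

```python
import numpy as np

# Generic small-scale sketch (not the paper's method): solve the damped
# system (F + lambda*I) x = g via Cholesky, where F = J^T J / n is an
# empirical Fisher matrix assembled from per-sample gradients J.
rng = np.random.default_rng(0)
n, d = 100, 20
J = rng.standard_normal((n, d))   # hypothetical per-sample gradients
g = rng.standard_normal(d)        # mean gradient
lam = 1e-2                        # damping parameter

F = J.T @ J / n
L = np.linalg.cholesky(F + lam * np.eye(d))
# Two triangular solves instead of forming an explicit inverse.
x = np.linalg.solve(L, g)
x = np.linalg.solve(L.T, x)

# The residual of the damped linear system should be near machine precision.
residual = np.linalg.norm((F + lam * np.eye(d)) @ x - g)
print(residual)
```

Damping guarantees positive definiteness even when n < d, which is exactly the regime the paper targets.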
arXiv Detail & Related papers (2023-10-26T16:46:13Z)
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding explicit computation of the model's covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
- A Consistent and Scalable Algorithm for Best Subset Selection in Single Index Models [1.3236116985407258]
Best subset selection in high-dimensional models is known to be computationally intractable.
We propose the first provably scalable algorithm for best subset selection in high-dimensional SIMs.
Our algorithm enjoys subset selection consistency and has the oracle property with high probability.
arXiv Detail & Related papers (2023-09-12T13:48:06Z)
- Distributed Dynamic Safe Screening Algorithms for Sparse Regularization [73.85961005970222]
We propose a new distributed dynamic safe screening (DDSS) method for sparsity-regularized models and apply it to shared-memory and distributed-memory architectures, respectively.
We prove that the proposed method achieves a linear convergence rate with lower overall complexity and can eliminate almost all inactive features in a finite number of iterations, almost surely.
arXiv Detail & Related papers (2022-04-23T02:45:55Z)
- Fast Feature Selection with Fairness Constraints [49.142308856826396]
We study the fundamental problem of selecting optimal features for model construction.
This problem is computationally challenging on large datasets, even with the use of greedy algorithm variants.
We extend the adaptive query model, recently proposed for the greedy forward selection for submodular functions, to the faster paradigm of Orthogonal Matching Pursuit for non-submodular functions.
The proposed algorithm achieves exponentially fast parallel run time in the adaptive query model, scaling much better than prior work.
arXiv Detail & Related papers (2022-02-28T12:26:47Z)
- Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Semi-analytic approximate stability selection for correlated data in generalized linear models [3.42658286826597]
We propose a novel approximate inference algorithm that can conduct Stability Selection without the repeated fitting.
The algorithm is based on the replica method of statistical mechanics and vector approximate message passing of information theory.
Numerical experiments indicate that the algorithm exhibits fast convergence and high approximation accuracy for both synthetic and real-world data.
arXiv Detail & Related papers (2020-03-19T10:43:12Z)
- Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
arXiv Detail & Related papers (2020-02-21T17:45:32Z)
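The Gaussian-projection embedding mentioned in the last entry can be illustrated with a basic sketch-and-solve least-squares example; the sketch size `m` and the direct `lstsq` solve are choices made here for brevity, and this is not the paper's optimal first-order method.

```python
import numpy as np

# Sketch-and-solve least squares with a Gaussian projection: compress an
# n-row overdetermined system to m rows, then solve the smaller problem.
# Illustrative only; m = 400 is an assumed sketch size with m >> d.
rng = np.random.default_rng(0)
n, d, m = 2000, 50, 400

A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

# Gaussian projection (random embedding), scaled to preserve norms.
S = rng.standard_normal((m, n)) / np.sqrt(m)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

# Compare against the exact least-squares solution.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x_sketch - x_exact))
```

Structured embeddings such as subsampled Hadamard transforms play the same role as `S` but can be applied in near-linear time, which is what makes these methods competitive at scale.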
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.