A Two-Stage Variable Selection Approach for Correlated High Dimensional
Predictors
- URL: http://arxiv.org/abs/2103.13357v1
- Date: Wed, 24 Mar 2021 17:28:34 GMT
- Title: A Two-Stage Variable Selection Approach for Correlated High Dimensional
Predictors
- Authors: Zhiyuan Li
- Abstract summary: We propose a two-stage approach that combines a variable clustering stage and a group variable stage for the group variable selection problem.
The variable clustering stage uses information from the data to find a group structure, which improves the performance of the existing group variable selection methods.
The two-stage method performs better, both in prediction accuracy and in the accuracy of selecting active predictors.
- Score: 4.8128078741263725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When fitting statistical models, some predictors are often found to be
correlated with each other and to function together. Many group variable
selection methods have been developed to select the groups of predictors that
are closely related to a continuous or categorical response. These existing
methods usually assume that the group structure is known in advance, for
example variables with similar practical meaning, or dummy variables created
from categorical data. In practice, however, it is rarely possible to know the
exact group structure, especially when the number of variables is large. As a
result, the group variable selection results may be unreliable. To solve the
challenge, we propose a two-stage approach that combines a variable clustering
stage and a group variable stage for the group variable selection problem. The
variable clustering stage uses information from the data to find a group
structure, which improves the performance of the existing group variable
selection methods. For ultrahigh dimensional data, where the number of
predictors is much larger than the number of observations, we incorporate a
variable screening method in the first stage and show the advantages of such
an approach. In this article, we compare and discuss the performance of four
existing group variable selection methods under different simulation models,
with and without the variable clustering stage. The two-stage method performs
better, both in prediction accuracy and in the accuracy of selecting active
predictors. An athlete data set is also used to show the advantages of the
proposed method.
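The two-stage idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the clustering method or the four group selection methods, so everything here is an assumption made for illustration: stage 1 groups predictors by thresholding absolute Pearson correlation (connected components), and stage 2 fits a group lasso by proximal gradient descent. The function names, the threshold of 0.7, the penalty `lam`, and the toy data are all hypothetical, and the screening step for ultrahigh dimensional data is omitted.

```python
import math


def pearson(u, v):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def cluster_variables(columns, threshold=0.7):
    """Stage 1 (assumed rule): link predictors whose |correlation| exceeds
    `threshold` and return the connected components as groups."""
    p = len(columns)
    parent = list(range(p))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            if abs(pearson(columns[i], columns[j])) > threshold:
                parent[find(j)] = find(i)
    groups = {}
    for j in range(p):
        groups.setdefault(find(j), []).append(j)
    return sorted(groups.values())


def group_lasso(columns, y, groups, lam=0.1, n_iter=500):
    """Stage 2 (assumed method): group lasso via proximal gradient descent on
    (1 / 2n) * ||y - X beta||^2 + lam * sum_g sqrt(|g|) * ||beta_g||."""
    n, p = len(y), len(columns)
    # trace(X^T X) / n bounds the Lipschitz constant, so this step is safe.
    step = n / sum(x * x for col in columns for x in col)
    beta = [0.0] * p
    for _ in range(n_iter):
        fitted = [sum(columns[j][i] * beta[j] for j in range(p)) for i in range(n)]
        resid = [y[i] - fitted[i] for i in range(n)]
        # Gradient step on the least-squares loss.
        z = [beta[j] + step * sum(columns[j][i] * resid[i] for i in range(n)) / n
             for j in range(p)]
        # Proximal step: block soft-thresholding, one block per group.
        for g in groups:
            norm = math.sqrt(sum(z[j] ** 2 for j in g))
            thresh = step * lam * math.sqrt(len(g))
            scale = max(0.0, 1.0 - thresh / norm) if norm > 0 else 0.0
            for j in g:
                beta[j] = scale * z[j]
    return beta


# Toy data (hypothetical): two pairs of highly correlated predictors,
# with the response depending only on the first pair.
a = [1, -1, 1, -1, 1, -1, 1, -1]
b = [1, 1, -1, -1, 1, 1, -1, -1]
c = [1, 1, 1, 1, -1, -1, -1, -1]
columns = [
    a,
    [ai + 0.1 * ci for ai, ci in zip(a, c)],
    b,
    [bi + 0.1 * ci for bi, ci in zip(b, c)],
]
y = [2.0 * ai for ai in a]

groups = cluster_variables(columns)
beta = group_lasso(columns, y, groups)
selected = [g for g in groups if any(beta[j] != 0.0 for j in g)]
print("groups:", groups)
print("selected groups:", selected)
```

In this sketch the group structure passed to the second stage comes from the data rather than from prior knowledge, which is the point the abstract makes. A real analysis would more likely standardize the predictors and choose the threshold and penalty by cross-validation.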
Related papers
- Detecting and Identifying Selection Structure in Sequential Data [53.24493902162797]
We argue that the selective inclusion of data points based on latent objectives is common in practical situations, such as music sequences.
We show that selection structure is identifiable without any parametric assumptions or interventional experiments.
We also propose a provably correct algorithm to detect and identify selection structures as well as other types of dependencies.
arXiv Detail & Related papers (2024-06-29T20:56:34Z)
- Scalable variable selection for two-view learning tasks with projection operators [0.0]
We propose a novel variable selection method for two-view settings, or for vector-valued supervised learning problems.
Our framework can handle extremely large-scale selection tasks, where the number of data samples may reach the millions.
arXiv Detail & Related papers (2023-07-04T08:22:05Z)
- Selective inference using randomized group lasso estimators for general models [3.4034453928075865]
The method includes the use of exponential family distributions, as well as quasi-likelihood modeling for overdispersed count data.
A randomized group-regularized optimization problem is studied.
Confidence regions for the regression parameters in the selected model take the form of Wald-type regions and are shown to have bounded volume.
arXiv Detail & Related papers (2023-06-24T01:14:26Z)
- HiPerformer: Hierarchically Permutation-Equivariant Transformer for Time Series Forecasting [56.95572957863576]
We propose a hierarchically permutation-equivariant model that considers both the relationship among components in the same group and the relationship among groups.
The experiments conducted on real-world data demonstrate that the proposed method outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2023-05-14T05:11:52Z)
- DiscoVars: A New Data Analysis Perspective -- Application in Variable Selection for Clustering [0.0]
We present a new data analysis perspective to determine variable importance regardless of the underlying learning task.
We propose a new methodology to select important variables from the data by first creating dependency networks among all variables.
We present our tool as a Shiny app which is a user-friendly interface development environment.
arXiv Detail & Related papers (2023-04-08T10:57:19Z)
- Composite Feature Selection using Deep Ensembles [130.72015919510605]
We investigate the problem of discovering groups of predictive features without predefined grouping.
We introduce a novel deep learning architecture that uses an ensemble of feature selection models to find predictive groups.
We propose a new metric to measure similarity between discovered groups and the ground truth.
arXiv Detail & Related papers (2022-11-01T17:49:40Z)
- Improving Group Lasso for high-dimensional categorical data [0.90238471756546]
Group Lasso is a well-known, efficient algorithm for selecting continuous or categorical variables.
We propose a two-step procedure to obtain a sparse solution of the Group Lasso.
We show that our method performs better than the state of the art algorithms with respect to the prediction accuracy or model dimension.
arXiv Detail & Related papers (2022-10-25T13:43:57Z)
- A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
arXiv Detail & Related papers (2022-02-08T19:18:49Z)
- VC-PCR: A Prediction Method based on Supervised Variable Selection and Clustering [1.1470070927586016]
This paper presents VC-PCR, a prediction method that supervises variable selection and variable clustering.
Experiments with real and simulated data demonstrate that, compared to competitor methods, VC-PCR achieves better prediction, variable selection and clustering performance when cluster structure is present.
arXiv Detail & Related papers (2022-02-02T11:41:39Z)
- Group Heterogeneity Assessment for Multilevel Models [68.95633278540274]
Many data sets contain an inherent multilevel structure.
Taking this structure into account is critical for the accuracy and calibration of any statistical analysis performed on such data.
We propose a flexible framework for efficiently assessing differences between the levels of given grouping variables in the data.
arXiv Detail & Related papers (2020-05-06T12:42:04Z)
- Clustering Binary Data by Application of Combinatorial Optimization Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
From a set of 16 data tables generated by a quasi-Monte Carlo experiment, a comparison is performed for one of the aggregations using L1 dissimilarity, with hierarchical clustering, and a version of k-means: partitioning around medoids or PAM.
arXiv Detail & Related papers (2020-01-06T23:33:31Z)