VC-PCR: A Prediction Method based on Supervised Variable Selection and
Clustering
- URL: http://arxiv.org/abs/2202.00975v1
- Date: Wed, 2 Feb 2022 11:41:39 GMT
- Title: VC-PCR: A Prediction Method based on Supervised Variable Selection and
Clustering
- Authors: Rebecca Marion, Johannes Lederer, Bernadette Govaerts, Rainer von
Sachs
- Abstract summary: This paper presents VC-PCR, a prediction method that supervises variable selection and variable clustering.
Experiments with real and simulated data demonstrate that, compared to competitor methods, VC-PCR achieves better prediction, variable selection and clustering performance when cluster structure is present.
- Score: 1.1470070927586016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparse linear prediction methods suffer from decreased prediction accuracy
when the predictor variables have cluster structure (e.g., there are groups of
highly correlated variables). To improve prediction accuracy, various
methods have been proposed to identify variable clusters from the data and
integrate cluster information into a sparse modeling process. But none of these
methods achieve satisfactory performance for prediction, variable selection and
variable clustering simultaneously. This paper presents Variable Cluster
Principal Component Regression (VC-PCR), a prediction method that supervises
variable selection and variable clustering in order to solve this problem.
Experiments with real and simulated data demonstrate that, compared to
competitor methods, VC-PCR achieves better prediction, variable selection and
clustering performance when cluster structure is present.
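For intuition, a generic unsupervised cluster-then-PCR baseline (the kind of two-step pipeline that VC-PCR improves on by supervising both the selection and the clustering step) can be sketched as follows. This is not the authors' VC-PCR estimator; `cluster_pcr_fit` and its arguments are illustrative names, and the use of k-means on correlation profiles is an assumption made for the sketch:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

def cluster_pcr_fit(X, y, n_clusters=3, seed=0):
    """Cluster the columns of X, summarize each variable cluster by its
    first principal component, then regress y on those components.
    A generic cluster-then-PCR pipeline, not the VC-PCR estimator."""
    # Cluster variables using their correlation profiles as features.
    corr = np.corrcoef(X, rowvar=False)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(corr)
    # One principal component per variable cluster (assumes no empty cluster).
    pcs = np.column_stack([
        PCA(n_components=1).fit_transform(X[:, labels == k]).ravel()
        for k in range(n_clusters)
    ])
    model = LinearRegression().fit(pcs, y)
    return model, labels, pcs
```

Because both steps here are unsupervised, the clustering ignores y entirely; supervising them jointly, as the abstract describes, is what lets the method trade off prediction, selection and clustering quality at once.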
Related papers
- Time series clustering based on prediction accuracy of global forecasting models [0.0]
A novel method to perform model-based clustering of time series is proposed in this paper.
Unlike most techniques proposed in the literature, the method treats predictive accuracy as the main criterion for constructing the clustering partition.
An extensive simulation study shows that our method outperforms several alternative techniques concerning both clustering effectiveness and predictive accuracy.
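As a rough illustration of the idea (grouping series by how a shared forecasting model predicts them, rather than by raw-series distance), one could fit a single global autoregressive model and cluster series by their prediction-error profiles. The sketch below is a simplified proxy, not the paper's method; all names and the AR-plus-k-means choices are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

def cluster_by_prediction(series, order=2, n_clusters=2, seed=0):
    """Fit one global AR(order) model on lags pooled across all series,
    then cluster series by summary statistics of their one-step-ahead
    prediction errors. A crude proxy for predictive-accuracy clustering."""
    Xs, ys, ids = [], [], []
    for i, s in enumerate(series):
        # Lag matrix: columns s[t-order], ..., s[t-1]; target s[t].
        lags = np.column_stack([s[k:len(s) - order + k] for k in range(order)])
        Xs.append(lags)
        ys.append(s[order:])
        ids.append(np.full(len(s) - order, i))
    X, y, ids = np.vstack(Xs), np.concatenate(ys), np.concatenate(ids)
    resid = y - LinearRegression().fit(X, y).predict(X)
    # Per-series error profile: mean absolute error and error spread.
    feats = np.array([[np.abs(resid[ids == i]).mean(), resid[ids == i].std()]
                      for i in range(len(series))])
    return KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=seed).fit_predict(feats)
```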
arXiv Detail & Related papers (2023-04-30T13:12:19Z)
- Variable Clustering via Distributionally Robust Nodewise Regression [7.289979396903827]
We study a multi-factor block model for variable clustering and connect it to the regularized subspace clustering by formulating a distributionally robust version of the nodewise regression.
We derive a convex relaxation, provide guidance on selecting the size of the robust region, and hence the regularization weighting parameter, based on the data, and propose an ADMM algorithm for implementation.
arXiv Detail & Related papers (2022-12-15T16:23:25Z)
- A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication efficient methods for distributed learning in heterogeneous environments.
A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z)
- Personalized Federated Learning via Convex Clustering [72.15857783681658]
We propose a family of algorithms for personalized federated learning with locally convex user costs.
The proposed framework is based on a generalization of convex clustering in which the differences between different users' models are penalized.
arXiv Detail & Related papers (2022-02-01T19:25:31Z)
- Multi-objective Semi-supervised Clustering for Finding Predictive Clusters [0.5371337604556311]
This study focuses on clustering problems and aims to find compact clusters that are informative regarding the outcome variable.
The main goal is to partition the data points so that observations within each cluster are similar and, simultaneously, the outcome variable can be predicted using these clusters.
arXiv Detail & Related papers (2022-01-26T06:24:38Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data assigned to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- A Two-Stage Variable Selection Approach for Correlated High Dimensional Predictors [4.8128078741263725]
We propose a two-stage approach that combines a variable clustering stage and a group variable selection stage to address the group variable selection problem.
The variable clustering stage uses information from the data to find a group structure, which improves the performance of the existing group variable selection methods.
The two-stage method shows a better performance, in terms of the prediction accuracy, as well as in the accuracy to select active predictors.
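One simple instantiation of such a two-stage pipeline can be sketched as follows; the helper names, the hierarchical clustering on correlation, and the lasso on per-group averages are illustrative choices for the sketch, not the paper's exact procedure:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.linear_model import LassoCV

def two_stage_select(X, y, n_groups=4, seed=0):
    """Stage 1: group predictors by hierarchical clustering on correlation.
    Stage 2: run a sparse fit on per-group averages and keep every variable
    from a group whose average receives a nonzero coefficient."""
    dist = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
    tree = linkage(squareform(dist, checks=False), method="average")
    groups = fcluster(tree, t=n_groups, criterion="maxclust")
    # One representative (the mean) per variable group.
    reps = np.column_stack([X[:, groups == g].mean(axis=1)
                            for g in range(1, n_groups + 1)])
    lasso = LassoCV(cv=5, random_state=seed).fit(reps, y)
    active_groups = np.nonzero(lasso.coef_)[0] + 1
    return np.isin(groups, active_groups), groups
```

Averaging within a group before the sparse fit is what makes stage 2 a group-level decision: either all variables in a cluster are kept or none are.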
arXiv Detail & Related papers (2021-03-24T17:28:34Z)
- Cluster-Specific Predictions with Multi-Task Gaussian Processes [4.368185344922342]
A model involving Gaussian processes (GPs) is introduced to handle multi-task learning, clustering, and prediction.
The model is instantiated as a mixture of multi-task GPs with common mean processes.
The overall algorithm, called MagmaClust, is publicly available as an R package.
arXiv Detail & Related papers (2020-11-16T11:08:59Z)
- Progressive Cluster Purification for Unsupervised Feature Learning [48.87365358296371]
In unsupervised feature learning, sample specificity based methods ignore the inter-class information.
We propose a novel clustering based method, which excludes class inconsistent samples during progressive cluster formation.
Our approach, referred to as Progressive Cluster Purification (PCP), implements progressive clustering by gradually reducing the number of clusters during training.
arXiv Detail & Related papers (2020-07-06T08:11:03Z)
- Decorrelated Clustering with Data Selection Bias [55.91842043124102]
We propose a novel Decorrelation regularized K-Means algorithm (DCKM) for clustering with data selection bias.
Our DCKM algorithm achieves significant performance gains, indicating the necessity of removing unexpected feature correlations induced by selection bias.
arXiv Detail & Related papers (2020-06-29T08:55:50Z)
- Clustering Binary Data by Application of Combinatorial Optimization Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
On a set of 16 data tables generated by a quasi-Monte Carlo experiment, a comparison is performed for one of the aggregation criteria using L1 dissimilarity, against hierarchical clustering and a k-means variant: partitioning around medoids (PAM).
arXiv Detail & Related papers (2020-01-06T23:33:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.