Feature Grouping and Sparse Principal Component Analysis
- URL: http://arxiv.org/abs/2106.13685v1
- Date: Fri, 25 Jun 2021 15:08:39 GMT
- Title: Feature Grouping and Sparse Principal Component Analysis
- Authors: Haiyan Jiang, Shanshan Qin, Dejing Dou
- Abstract summary: Sparse Principal Component Analysis (SPCA) is widely used in data processing and dimension reduction.
FGSPCA allows loadings to belong to disjoint homogeneous groups, with sparsity as a special case.
- Score: 23.657672812296518
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparse Principal Component Analysis (SPCA) is widely used in data processing
and dimension reduction; it uses the lasso to produce modified principal
components with sparse loadings for better interpretability. However, sparse
PCA never considers an additional grouping structure where the loadings share
similar coefficients (i.e., feature grouping), besides a special group with all
coefficients being zero (i.e., feature selection). In this paper, we propose a
novel method called Feature Grouping and Sparse Principal Component Analysis
(FGSPCA) which allows the loadings to belong to disjoint homogeneous groups,
with sparsity as a special case. The proposed FGSPCA is a subspace learning
method designed to simultaneously perform grouping pursuit and feature
selection, by imposing a non-convex regularization with naturally adjustable
sparsity and grouping effect. To solve the resulting non-convex optimization
problem, we propose an alternating algorithm that incorporates the
difference-of-convex programming, augmented Lagrange and coordinate descent
methods. Additionally, the experimental results on real data sets show that the
proposed FGSPCA benefits from the grouping effect compared with methods without
grouping effect.
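The abstract names two ingredients: a non-convex penalty that produces both sparsity and grouping in the loadings, and a difference-of-convex (DC) treatment of that penalty. The sketch below illustrates both under an explicit assumption. It uses the truncated lasso function J_tau(x) = min(|x|/tau, 1), a non-convex penalty common in the grouping-pursuit literature, as a stand-in. The abstract does not spell out the exact regularizer, so FGSPCA's actual penalty may differ. The function names and the parameters lam1, lam2 and tau are hypothetical. The code only evaluates the penalty on a single loading vector; it does not implement the alternating FGSPCA solver.

```python
import numpy as np


def truncated_l1(x, tau):
    """Truncated lasso J_tau(x) = min(|x| / tau, 1), applied elementwise.

    Assumed stand-in for the non-convex penalty; not taken from the paper.
    """
    return np.minimum(np.abs(x) / tau, 1.0)


def dc_parts(x, tau):
    """Write J_tau(x) = g(x) - h(x) with both g and h convex."""
    a = np.abs(x) / tau
    g = a                          # convex: scaled absolute value
    h = np.maximum(a - 1.0, 0.0)   # convex: hinge on the scaled absolute value
    return g, h


def grouping_sparsity_penalty(beta, lam1, lam2, tau):
    """Hypothetical grouping-plus-sparsity penalty on one loading vector.

    lam1 * sum_j J_tau(beta_j): charges nonzero loadings, capped at 1 each
    once |beta_j| >= tau (sparsity).
    lam2 * sum_{j<k} J_tau(beta_j - beta_k): charges unequal pairs, capped
    at 1 each once they differ by >= tau (grouping).
    """
    beta = np.asarray(beta, dtype=float)
    sparsity = truncated_l1(beta, tau).sum()
    j, k = np.triu_indices(beta.size, k=1)    # index pairs with j < k
    grouping = truncated_l1(beta[j] - beta[k], tau).sum()
    return lam1 * sparsity + lam2 * grouping


if __name__ == "__main__":
    grouped = [0.5, 0.5, 0.5, 0.0, 0.0, 0.0]   # two exact groups, one at zero
    spread = [0.5, 0.45, 0.55, 0.02, -0.03, 0.01]
    print(grouping_sparsity_penalty(grouped, 1.0, 1.0, 0.1))  # → 12.0
    print(grouping_sparsity_penalty(spread, 1.0, 1.0, 0.1))   # larger, about 15.6
```

Under these assumptions, the vector whose loadings are exactly equal within groups and exactly zero elsewhere receives a smaller penalty than a nearby vector with slightly spread values, which is the grouping-plus-sparsity behavior the abstract describes. The split J_tau = g - h into two convex pieces is what DC programming works with: replacing h by its linearization at the current iterate leaves a convex subproblem to solve at each step.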
Related papers
- A structured regression approach for evaluating model performance across intersectional subgroups [53.91682617836498]
Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups.
We introduce a structured regression approach to disaggregated evaluation that we demonstrate can yield reliable system performance estimates even for very small subgroups.
(2024-01-26T14:21:45Z)
- Achieving Sample and Computational Efficient Reinforcement Learning by Action Space Reduction via Grouping [7.691755449724638]
Reinforcement learning often needs to deal with the exponential growth of states and actions in high-dimensional spaces.
We learn the inherent structure of action-wise similar MDPs to appropriately balance the performance degradation versus sample/computational complexity.
(2023-06-22T15:40:10Z)
- Sparse-group boosting -- Unbiased group and variable selection [0.0]
We show that within-group and between-group sparsity can be controlled by a mixing parameter.
Using simulations, gene data, and agricultural data, we show the effectiveness and predictive competitiveness of this estimator.
(2022-06-13T17:44:16Z)
- Exclusive Group Lasso for Structured Variable Selection [10.86544864007391]
A structured variable selection problem is considered.
A composite norm can be properly designed to promote such exclusive group sparsity patterns.
An active set algorithm is proposed that builds the solution by including structure atoms into the estimated support.
(2021-08-23T16:55:13Z)
- Robust Matrix Factorization with Grouping Effect [28.35582493230616]
We propose a novel method called Matrix Factorization with Grouping effect (GRMF).
The proposed GRMF can learn grouping structure and sparsity in MF without prior knowledge.
Experiments have been conducted using real-world data sets with outliers and contaminated noise.
(2021-06-25T15:03:52Z)
- Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when the only bias is in the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
(2021-06-14T05:39:09Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
(2020-12-29T04:08:38Z)
- Robust Recursive Partitioning for Heterogeneous Treatment Effects with Uncertainty Quantification [84.53697297858146]
Subgroup analysis of treatment effects plays an important role in applications from medicine to public policy to recommender systems.
Most of the current methods of subgroup analysis begin with a particular algorithm for estimating individualized treatment effects (ITE).
This paper develops a new method for subgroup analysis, R2P, that addresses all these weaknesses.
(2020-06-14T14:50:02Z)
- Robust Grouped Variable Selection Using Distributionally Robust Optimization [11.383869751239166]
We propose a Distributionally Robust Optimization (DRO) formulation with a Wasserstein-based uncertainty set for selecting grouped variables under perturbations.
We prove probabilistic bounds on the out-of-sample loss and the estimation bias, and establish the grouping effect of our estimator.
We show that our formulation produces an interpretable and parsimonious model that encourages sparsity at a group level.
(2020-06-10T22:32:52Z)
- Repulsive Mixture Models of Exponential Family PCA for Clustering [127.90219303669006]
The mixture extension of exponential family principal component analysis (EPCA) was designed to encode much more structural information about the data distribution than the traditional EPCA.
The traditional mixture of local EPCAs has the problem of model redundancy, i.e., overlaps among mixing components, which may cause ambiguity for data clustering.
In this paper, a repulsiveness-encouraging prior is introduced among mixing components and a diversified EPCA mixture (DEPCAM) model is developed in the Bayesian framework.
(2020-04-07T04:07:29Z)
- Invariant Feature Coding using Tensor Product Representation [75.62232699377877]
We prove that the group-invariant feature vector contains sufficient discriminative information when learning a linear classifier.
A novel feature model that explicitly considers group action is proposed for principal component analysis and k-means clustering.
(2019-06-05T07:15:17Z)