Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection
- URL: http://arxiv.org/abs/2012.14595v1
- Date: Tue, 29 Dec 2020 04:08:38 GMT
- Title: Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection
- Authors: Zhengxin Li, Feiping Nie, Jintang Bian, Xuelong Li
- Abstract summary: We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
- Score: 138.97647716793333
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the field of data mining, how to deal with high-dimensional data is an
inevitable problem. Unsupervised feature selection has attracted increasing
attention because it does not rely on labels. The performance of spectral-based
unsupervised methods depends on the quality of the constructed similarity
matrix, which is used to depict the intrinsic structure of the data. However,
real-world data contain a large number of noisy samples and features, so the
similarity matrix constructed from the original data cannot be completely
reliable. Worse still, the size of the similarity matrix grows rapidly as the
number of samples increases, significantly raising the computational cost.
Inspired by principal component analysis, we propose a simple and efficient
unsupervised feature selection method, by combining reconstruction error with
$l_{2,p}$-norm regularization. The projection matrix, which is used for feature
selection, is learned by minimizing the reconstruction error under the sparse
constraint. Then, we present an efficient optimization algorithm to solve the
proposed unsupervised model, and analyse the convergence and computational
complexity of the algorithm theoretically. Finally, extensive experiments on
real-world data sets demonstrate the effectiveness of our proposed method.
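The optimization described above can be sketched numerically. The following is an illustrative IRLS-style toy solver, an assumption on my part rather than the authors' exact algorithm: with orthonormal $W$, the reconstruction error $\|X - XWW^T\|_F^2$ equals a constant minus $\mathrm{tr}(W^T X^T X W)$, so each iteration solves a reweighted eigenproblem, and the row norms of the learned projection matrix score the features.

```python
import numpy as np

def l2p_norm(W, p):
    """Sum of p-th powers of the row-wise l2 norms: sum_i ||w_i||_2^p."""
    return float(np.sum(np.linalg.norm(W, axis=1) ** p))

def select_features(X, k, lam=0.01, p=0.5, n_iter=30, eps=1e-8):
    """Illustrative IRLS-style sketch (not the paper's exact algorithm) for
    min_W ||X - X W W^T||_F^2 + lam * ||W||_{2,p}^p  with  W^T W = I.
    Each step maximizes tr(W^T (C - lam*D) W), where D is the standard
    diagonal reweighting matrix for the l_{2,p} penalty."""
    Xc = X - X.mean(axis=0)              # center the data
    C = Xc.T @ Xc                        # d x d scatter matrix
    _, vecs = np.linalg.eigh(C)
    W = vecs[:, -k:]                     # init: top-k eigenvectors (plain PCA)
    for _ in range(n_iter):
        row = np.linalg.norm(W, axis=1)
        D = np.diag((p / 2.0) * (row + eps) ** (p - 2))  # IRLS reweighting
        _, vecs = np.linalg.eigh(C - lam * D)
        W = vecs[:, -k:]                 # eigen-step keeps W^T W = I
    scores = np.linalg.norm(W, axis=1)   # row norms rank the original features
    return np.argsort(scores)[::-1], W
```

Features whose rows of W shrink toward zero are discarded; with p < 1 the penalty drives whole rows to zero more aggressively than the l_{2,1} case.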
Related papers
- Unsupervised Feature Selection Algorithm Based on Graph Filtering and Self-representation [5.840228332438659]
We proposed an unsupervised feature selection algorithm based on graph filtering and self-representation.
An iterative algorithm was applied to effectively solve the proposed objective function.
arXiv Detail & Related papers (2024-11-01T00:00:08Z) - Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) frameworks, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z) - Multi-view Sparse Laplacian Eigenmaps for nonlinear Spectral Feature
Selection [1.6853711292804476]
The complexity of high-dimensional datasets presents significant challenges for machine learning models.
To address these challenges, it is essential to identify an informative subset of features that captures the essential structure of the data.
In this study, the authors propose Multi-view Sparse Laplacian Eigenmaps (MSLE) for feature selection.
arXiv Detail & Related papers (2023-07-29T06:23:51Z) - Linearly-scalable learning of smooth low-dimensional patterns with
permutation-aided entropic dimension reduction [0.0]
In many data science applications, the objective is to extract appropriately-ordered smooth low-dimensional data patterns from high-dimensional data sets.
We show that when selecting Euclidean smoothness as the pattern quality criterion, both of these problems can be efficiently solved numerically.
arXiv Detail & Related papers (2023-06-17T08:03:24Z) - Recovering Simultaneously Structured Data via Non-Convex Iteratively
Reweighted Least Squares [0.8702432681310401]
We propose a new algorithm for recovering data that adheres to multiple, heterogeneous low-dimensional structures from linear observations.
We show that the IRLS method is favorable in identifying such simultaneously structured data from few measurements.
arXiv Detail & Related papers (2023-06-08T06:35:47Z) - Solving weakly supervised regression problem using low-rank manifold
regularization [77.34726150561087]
We solve a weakly supervised regression problem.
By "weakly" we mean that the labels are known for some training points, unknown for others, and uncertain for the rest due to random noise or other reasons such as a lack of resources.
In the numerical section, we applied the suggested method to artificial and real datasets using Monte-Carlo modeling.
arXiv Detail & Related papers (2021-04-13T23:21:01Z) - A Local Similarity-Preserving Framework for Nonlinear Dimensionality
Reduction with Neural Networks [56.068488417457935]
We propose a novel local nonlinear approach named Vec2vec for general purpose dimensionality reduction.
To train the neural network, we build the neighborhood similarity graph of a matrix and define the context of data points.
Experiments on data classification and clustering over eight real datasets show that Vec2vec outperforms several classical dimensionality reduction methods under statistical hypothesis testing.
arXiv Detail & Related papers (2021-03-10T23:10:47Z) - Adaptive Graph-based Generalized Regression Model for Unsupervised
Feature Selection [11.214334712819396]
How to select the uncorrelated and discriminative features is the key problem of unsupervised feature selection.
We present a novel generalized regression model imposed by an uncorrelated constraint and the $\ell_{2,1}$-norm regularization.
It can simultaneously select the uncorrelated and discriminative features and reduce the variance of the data points belonging to the same neighborhood.
arXiv Detail & Related papers (2020-12-27T09:07:26Z) - Multi-View Spectral Clustering with High-Order Optimal Neighborhood
Laplacian Matrix [57.11971786407279]
Multi-view spectral clustering can effectively reveal the intrinsic cluster structure among data.
This paper proposes a multi-view spectral clustering algorithm that learns a high-order optimal neighborhood Laplacian matrix.
Our proposed algorithm generates the optimal Laplacian matrix by searching the neighborhood of the linear combination of both the first-order and high-order base.
arXiv Detail & Related papers (2020-08-31T12:28:40Z)
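Several entries above, as well as the main paper's $l_{2,p}$ penalty at $p = 1$, rely on the $\ell_{2,1}$-norm, which sums the $\ell_2$ norms of a matrix's rows; since the penalty only decreases when an entire row shrinks, minimizing it zeroes out whole features at once. A minimal sketch:

```python
import numpy as np

def l21_norm(W):
    """l_{2,1} norm: the sum of the l2 norms of the rows of W."""
    return float(np.linalg.norm(W, axis=1).sum())

# Each row corresponds to one feature; a zero row means the feature is dropped.
W = np.array([[3.0, 4.0],   # row norm 5.0
              [0.0, 0.0],   # zeroed-out feature contributes nothing
              [1.0, 0.0]])  # row norm 1.0
# l21_norm(W) -> 6.0
```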
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.