Multi-view Sparse Laplacian Eigenmaps for nonlinear Spectral Feature
Selection
- URL: http://arxiv.org/abs/2307.15905v1
- Date: Sat, 29 Jul 2023 06:23:51 GMT
- Title: Multi-view Sparse Laplacian Eigenmaps for nonlinear Spectral Feature
Selection
- Authors: Gaurav Srivastava, Mahesh Jangid
- Abstract summary: The complexity of high-dimensional datasets presents significant challenges for machine learning models.
To address these challenges, it is essential to identify an informative subset of features that captures the essential structure of the data.
In this study, the authors propose Multi-view Sparse Laplacian Eigenmaps (MSLE) for feature selection.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The complexity of high-dimensional datasets presents significant challenges
for machine learning models, including overfitting, computational complexity,
and difficulties in interpreting results. To address these challenges, it is
essential to identify an informative subset of features that captures the
essential structure of the data. In this study, the authors propose Multi-view
Sparse Laplacian Eigenmaps (MSLE) for feature selection, which effectively
combines multiple views of the data, enforces sparsity constraints, and employs
a scalable optimization algorithm to identify a subset of features that capture
the fundamental data structure. MSLE is a graph-based approach that leverages
multiple views of the data to construct a more robust and informative
representation of high-dimensional data. The method applies sparse
eigendecomposition to reduce the dimensionality of the data, yielding a reduced
feature set. The optimization problem is solved using an iterative algorithm
alternating between updating the sparse coefficients and the Laplacian graph
matrix. The sparse coefficients are updated using a soft-thresholding operator,
while the graph Laplacian matrix is updated using the normalized graph
Laplacian. To evaluate the performance of the MSLE technique, the authors
conducted experiments on the UCI-HAR dataset, which comprises 561 features, and
reduced the feature space by 10% to 90%. The results demonstrate that even after
reducing the feature space by 90%, the Support Vector Machine (SVM) maintains
an error rate of 2.72%. Moreover, the SVM achieves an accuracy of 96.69% with
an 80% reduction in the overall feature space.
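The two update steps named in the abstract are standard building blocks. As an illustrative sketch only (not the authors' implementation), the soft-thresholding operator and the symmetric normalized graph Laplacian can be written as:

```python
import numpy as np

def soft_threshold(x, lam):
    """Elementwise soft-thresholding: shrink each entry toward zero
    by lam, setting entries with magnitude below lam to exactly zero."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def normalized_laplacian(W):
    """Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}
    for a nonnegative symmetric adjacency matrix W, where D is the
    diagonal degree matrix."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard isolated nodes
    return np.eye(W.shape[0]) - (W * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
```

Soft-thresholding is the proximal operator of the $l_1$ penalty, which is what makes the learned coefficients sparse.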
Related papers
- Nonlinear Feature Aggregation: Two Algorithms driven by Theory [45.3190496371625]
Real-world machine learning applications are characterized by a huge number of features, leading to computational and memory issues.
We propose a dimensionality reduction algorithm (NonLinCFA) which aggregates non-linear transformations of features with a generic aggregation function.
We also test the algorithms on synthetic and real-world datasets, performing regression and classification tasks, showing competitive performances.
arXiv Detail & Related papers (2023-06-19T19:57:33Z) - Interpretable Linear Dimensionality Reduction based on Bias-Variance
Analysis [45.3190496371625]
We propose a principled dimensionality reduction approach that maintains the interpretability of the resulting features.
In this way, all features are considered, the dimensionality is reduced and the interpretability is preserved.
arXiv Detail & Related papers (2023-03-26T14:30:38Z) - Laplacian-based Cluster-Contractive t-SNE for High Dimensional Data
Visualization [20.43471678277403]
We propose LaptSNE, a new graph-based dimensionality reduction method based on t-SNE.
Specifically, LaptSNE leverages the eigenvalue information of the graph Laplacian to shrink the potential clusters in the low-dimensional embedding.
We show how to calculate the gradient analytically, which may be of broad interest when considering optimization with Laplacian-composited objective.
arXiv Detail & Related papers (2022-07-25T14:10:24Z) - Auto-weighted Multi-view Feature Selection with Graph Optimization [90.26124046530319]
We propose a novel unsupervised multi-view feature selection model based on graph learning.
The contributions are threefold: (1) during the feature selection procedure, the consensus similarity graph shared by different views is learned.
Experiments on various datasets demonstrate the superiority of the proposed method compared with the state-of-the-art methods.
arXiv Detail & Related papers (2021-04-11T03:25:25Z) - Analysis of Truncated Orthogonal Iteration for Sparse Eigenvector
Problems [78.95866278697777]
We propose two variants of the Truncated Orthogonal Iteration to compute multiple leading eigenvectors with sparsity constraints simultaneously.
We then apply our algorithms to solve the sparse principal component analysis problem for a wide range of test datasets.
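The iteration named in this entry can be illustrated with a hedged sketch. The truncation rule here (keep the s largest-magnitude entries per column, then re-orthonormalize) is an assumption about one plausible variant, not the paper's exact algorithm:

```python
import numpy as np

def truncated_orthogonal_iteration(A, k, s, iters=50):
    """Sketch of orthogonal (block power) iteration with hard truncation:
    after each multiplication, keep only the s largest-magnitude entries
    per column, then re-orthonormalize. Yields sparse approximations to
    the k leading eigenvectors of a symmetric matrix A."""
    n = A.shape[0]
    V = np.eye(n, k)  # deterministic start: first k coordinate vectors
    for _ in range(iters):
        V = A @ V
        for j in range(k):
            # zero all but the s largest-magnitude entries of column j
            small = np.argsort(np.abs(V[:, j]))[:-s]
            V[small, j] = 0.0
        V, _ = np.linalg.qr(V)  # restore orthonormal columns
    return V
```

Truncating before the QR step is what keeps all k sparse eigenvector estimates updated simultaneously, rather than one at a time with deflation.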
arXiv Detail & Related papers (2021-03-24T23:11:32Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Adaptive Graph-based Generalized Regression Model for Unsupervised
Feature Selection [11.214334712819396]
Selecting uncorrelated and discriminative features is the key problem of unsupervised feature selection.
We present a novel generalized regression model with an uncorrelated constraint and $\ell_{2,1}$-norm regularization.
It can simultaneously select uncorrelated, discriminative features and reduce the variance of data points belonging to the same neighborhood.
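For reference, the $\ell_{2,1}$-norm mentioned here sums the Euclidean norms of a matrix's rows; penalizing it drives whole rows of the projection matrix to zero, i.e. discards entire features. A small illustrative computation, not taken from the paper:

```python
import numpy as np

def l21_norm(W):
    """l_{2,1}-norm: sum of the Euclidean norms of the rows of W.
    Zeroed rows correspond to features dropped from the model."""
    return np.sqrt((W ** 2).sum(axis=1)).sum()

# rows with norms 5, 0, and 1 -> l_{2,1}-norm of 6
W = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])
```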
arXiv Detail & Related papers (2020-12-27T09:07:26Z) - Two-Dimensional Semi-Nonnegative Matrix Factorization for Clustering [50.43424130281065]
We propose a new Semi-Nonnegative Matrix Factorization method for 2-dimensional (2D) data, named TS-NMF.
It overcomes the drawback of existing methods that seriously damage the spatial information of the data by converting 2D data to vectors in a preprocessing step.
arXiv Detail & Related papers (2020-05-19T05:54:14Z) - New advances in enumerative biclustering algorithms with online
partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct, and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm, called RIn-Close_CVC3, retains the attractive properties of RIn-Close_CVC and delivers a drastic reduction in memory usage and a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z) - Model Inversion Networks for Model-Based Optimization [110.24531801773392]
We propose model inversion networks (MINs), which learn an inverse mapping from scores to inputs.
MINs can scale to high-dimensional input spaces and leverage offline logged data for both contextual and non-contextual optimization problems.
We evaluate MINs on tasks from the Bayesian optimization literature, high-dimensional model-based optimization problems over images and protein designs, and contextual bandit optimization from logged data.
arXiv Detail & Related papers (2019-12-31T18:06:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.