Related papers: Subspace Learning for Feature Selection via Rank Revealing QR Factorization: Unsupervised and Hybrid Approaches with Non-negative Matrix Factorization and Evolutionary Algorithm

URL: http://arxiv.org/abs/2210.00418v1
Date: Sun, 2 Oct 2022 04:04:47 GMT
Title: Subspace Learning for Feature Selection via Rank Revealing QR Factorization: Unsupervised and Hybrid Approaches with Non-negative Matrix Factorization and Evolutionary Algorithm
Authors: Amir Moslemi, Arash Ahmadian
Abstract summary: Rank-revealing QR (RRQR) factorization is leveraged to obtain the most informative features as a novel unsupervised feature selection technique. A hybrid feature selection algorithm is proposed by coupling RRQR, as a filter-based technique, with a genetic algorithm as a wrapper-based technique. The proposed algorithm is shown to be dependable and robust when compared against state-of-the-art feature selection algorithms in supervised, unsupervised, and semi-supervised settings.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The selection of the most informative and discriminative features from high-dimensional data has been recognized as an important topic in machine learning and data engineering. Using matrix factorization-based techniques such as non-negative matrix factorization for feature selection has emerged as a hot topic in feature selection. The main goal of feature selection using matrix factorization is to extract a subspace which approximates the original space but in a lower dimension. In this study, rank-revealing QR (RRQR) factorization, which is computationally cheaper than singular value decomposition (SVD), is leveraged to obtain the most informative features as a novel unsupervised feature selection technique. This technique uses the permutation matrix of QR for feature selection, a property unique to this factorization method. Moreover, QR factorization is embedded into the non-negative matrix factorization (NMF) objective function as a new unsupervised feature selection method. Lastly, a hybrid feature selection algorithm is proposed by coupling RRQR, as a filter-based technique, with a genetic algorithm as a wrapper-based technique. In this method, redundant features are removed using RRQR factorization and the most discriminative subset of features is selected using the genetic algorithm. The proposed algorithm is shown to be dependable and robust when compared against state-of-the-art feature selection algorithms in supervised, unsupervised, and semi-supervised settings. All methods are tested on seven available microarray datasets using KNN, SVM and C4.5 classifiers. In terms of evaluation metrics, the experimental results show that the proposed method is comparable with state-of-the-art feature selection methods.
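
The abstract's idea of reading selected features off the QR permutation can be illustrated with a small sketch. This is not the authors' implementation: it assumes that column-pivoted QR from SciPy (`scipy.linalg.qr` with `pivoting=True`), a common practical stand-in for RRQR, is an acceptable substitute, and the helper name `select_features_pivoted_qr`, the mean-centering step, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.linalg import qr


def select_features_pivoted_qr(X, k):
    """Return the indices of k columns (features) of X picked by column-pivoted QR.

    X is a (samples x features) array. This is an illustrative helper, not the
    paper's algorithm.
    """
    # Center each feature so a constant offset does not dominate the column norms.
    Xc = X - X.mean(axis=0)
    # With pivoting=True, SciPy returns Q, R and a pivot index array P such that
    # Xc[:, P] = Q @ R; early pivots are columns that add the most new "energy".
    _, _, pivots = qr(Xc, mode="economic", pivoting=True)
    return pivots[:k]


# Toy data (assumed): three independent features plus two redundant ones.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 3))
X = np.column_stack([
    base,                                      # features 0, 1, 2
    base[:, 0],                                # feature 3: exact copy of feature 0
    base[:, 1] + 1e-6 * rng.normal(size=50),   # feature 4: near copy of feature 1
])

selected = select_features_pivoted_qr(X, 3)
print("selected feature indices:", sorted(selected.tolist()))
```

On this toy matrix the three selected columns should cover the three independent directions, so a feature and its copy are never both chosen. In a hybrid filter-plus-wrapper pipeline of the kind the abstract describes, a reduced index set like this would then be handed to the wrapper stage; that stage is not sketched here.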

Related papers

Large-scale Multi-objective Feature Selection: A Multi-phase Search Space Shrinking Approach [0.27624021966289597]
Feature selection is a crucial step in machine learning, especially for high-dimensional datasets. This paper proposes a novel large-scale multi-objective evolutionary algorithm based on the search space shrinking, termed LMSSS. The effectiveness of the proposed algorithm is demonstrated through comprehensive experiments on 15 large-scale datasets.
arXiv Detail & Related papers (2024-10-13T23:06:10Z)
Feature Selection as Deep Sequential Generative Learning [50.00973409680637]
We develop a deep variational transformer model over a joint of sequential reconstruction, variational, and performance evaluator losses. Our model can distill feature selection knowledge and learn a continuous embedding space to map feature selection decision sequences into embedding vectors associated with utility scores.
arXiv Detail & Related papers (2024-03-06T16:31:56Z)
Gram-Schmidt Methods for Unsupervised Feature Extraction and Selection [7.373617024876725]
We propose a Gram-Schmidt process over function spaces to detect and map out nonlinear dependencies. We provide experimental results for synthetic and real-world benchmark datasets. Surprisingly, our linear feature extraction algorithms are comparable to, and often outperform, several important nonlinear feature extraction methods.
arXiv Detail & Related papers (2023-11-15T21:29:57Z)
A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning [131.2910403490434]
Data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones. Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance. We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers. We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems.
arXiv Detail & Related papers (2023-11-10T05:26:10Z)
A concise method for feature selection via normalized frequencies [0.0]
In this paper, a concise method is proposed for universal feature selection. The proposed method uses a fusion of the filter method and the wrapper method, rather than a combination of them. The evaluation results show that the proposed method outperformed several state-of-the-art related works in terms of accuracy, precision, recall, F-score and AUC.
arXiv Detail & Related papers (2021-06-10T15:29:54Z)
Feature Weighted Non-negative Matrix Factorization [92.45013716097753]
We propose the Feature weighted Non-negative Matrix Factorization (FNMF) in this paper. FNMF learns the weights of features adaptively according to their importances. It can be solved efficiently with the suggested optimization algorithm.
arXiv Detail & Related papers (2021-03-24T21:17:17Z)
Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization. We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
Adaptive Graph-based Generalized Regression Model for Unsupervised Feature Selection [11.214334712819396]
How to select the uncorrelated and discriminative features is the key problem of unsupervised feature selection. We present a novel generalized regression model imposed by an uncorrelated constraint and the $\ell_{2,1}$-norm regularization. It can simultaneously select the uncorrelated and discriminative features as well as reduce the variance of these data points belonging to the same neighborhood.
arXiv Detail & Related papers (2020-12-27T09:07:26Z)
Robust Multi-class Feature Selection via $l_{2,0}$-Norm Regularization Minimization [6.41804410246642]
Feature selection is an important computational processing step in data mining and machine learning. In this paper, a novel method based on homotopy iterative hard threshold (HIHT) is proposed to solve the least square problem for multi-class feature selection.
arXiv Detail & Related papers (2020-10-08T02:06:06Z)
Augmentation of the Reconstruction Performance of Fuzzy C-Means with an Optimized Fuzzification Factor Vector [99.19847674810079]
Fuzzy C-Means (FCM) is one of the most frequently used methods to construct information granules. In this paper, we augment the FCM-based degranulation mechanism by introducing a vector of fuzzification factors. Experiments completed for both synthetic and publicly available datasets show that the proposed approach outperforms the generic data reconstruction approach.
arXiv Detail & Related papers (2020-04-13T04:17:30Z)
Supervised Quantile Normalization for Low-rank Matrix Approximation [50.445371939523305]
We learn the parameters of quantile normalization operators that can operate row-wise on the values of $X$ and/or of its factorization $UV$ to improve the quality of the low-rank representation of $X$ itself. We demonstrate the applicability of these techniques on synthetic and genomics datasets.
arXiv Detail & Related papers (2020-02-08T21:06:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
