Greedy Search Algorithms for Unsupervised Variable Selection: A
Comparative Study
- URL: http://arxiv.org/abs/2103.02687v1
- Date: Wed, 3 Mar 2021 21:10:26 GMT
- Title: Greedy Search Algorithms for Unsupervised Variable Selection: A
Comparative Study
- Authors: Federico Zocco, Marco Maggipinto, Gian Antonio Susto and Seán
McLoone
- Abstract summary: This paper focuses on unsupervised variable selection based dimensionality reduction.
We present a critical evaluation of seven unsupervised greedy variable selection algorithms.
We introduce and evaluate, for the first time, a lazy implementation of the variance explained based forward selection component analysis (FSCA) algorithm.
- Score: 3.4888132404740797
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dimensionality reduction is an important step in the development of scalable
and interpretable data-driven models, especially when there are a large number
of candidate variables. This paper focuses on unsupervised variable selection
based dimensionality reduction, and in particular on unsupervised greedy
selection methods, which have been proposed by various researchers as
computationally tractable approximations to optimal subset selection. These
methods are largely distinguished from each other by the selection criterion
adopted, which include squared correlation, variance explained, mutual
information and frame potential. Motivated by the absence in the literature of
a systematic comparison of these different methods, we present a critical
evaluation of seven unsupervised greedy variable selection algorithms
considering both simulated and real world case studies. We also review the
theoretical results that provide performance guarantees and enable efficient
implementations for certain classes of greedy selection function, related to
the concept of submodularity. Furthermore, we introduce and evaluate, for the
first time, a lazy implementation of the variance explained based forward
selection component analysis (FSCA) algorithm. Our experimental results show
that: (1) variance explained and mutual information based selection methods
yield smaller approximation errors than frame potential; (2) the lazy FSCA
implementation has similar performance to FSCA, while being an order of
magnitude faster to compute, making it the algorithm of choice for unsupervised
variable selection.
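The greedy selection and lazy-evaluation ideas described in the abstract can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the function names, tie-breaking, and the least-squares reconstruction of the data matrix are choices made here for clarity. Lazy evaluation is exactly equivalent to plain greedy only when marginal gains are non-increasing (submodularity); for variance explained this holds only approximately, which is why the paper evaluates the lazy variant empirically.

```python
import heapq
import numpy as np

def variance_explained(X, idx):
    """Fraction of the total variance of X captured by the columns in idx."""
    S = X[:, idx]
    coef, *_ = np.linalg.lstsq(S, X, rcond=None)
    return 1.0 - np.sum((X - S @ coef) ** 2) / np.sum(X ** 2)

def fsca(X, k):
    """Plain greedy forward selection by variance explained (FSCA-style)."""
    X = X - X.mean(axis=0)                 # centre the columns
    selected, ve = [], 0.0
    for _ in range(k):
        gains = [(variance_explained(X, selected + [j]), j)
                 for j in range(X.shape[1]) if j not in selected]
        ve, j = max(gains)                 # best candidate this round
        selected.append(j)
    return selected, ve

def lazy_fsca(X, k):
    """Lazy greedy: re-evaluate only the current top candidate.

    Matches fsca() exactly when marginal gains are non-increasing
    (submodular); for variance explained this is only approximate.
    """
    X = X - X.mean(axis=0)
    selected, ve = [], 0.0
    # max-heap of (-stale_gain, candidate index)
    heap = [(-variance_explained(X, [j]), j) for j in range(X.shape[1])]
    heapq.heapify(heap)
    for _ in range(k):
        while True:
            _, j = heapq.heappop(heap)
            new_ve = variance_explained(X, selected + [j])
            gain = new_ve - ve
            # accept if the refreshed gain still beats the next stale gain
            if not heap or gain >= -heap[0][0] - 1e-12:
                selected.append(j)
                ve = new_ve
                break
            heapq.heappush(heap, (-gain, j))
    return selected, ve
```

The lazy variant typically refreshes only a handful of gains per step instead of all remaining candidates, which is the mechanism behind the order-of-magnitude speed-up the paper reports for lazy FSCA.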
Related papers
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to capture potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z) - Variable Substitution and Bilinear Programming for Aligning Partially Overlapping Point Sets [48.1015832267945]
This research presents a method for aligning partially overlapping point sets by minimizing the objective function of the RPM algorithm.
A branch-and-bound (BnB) algorithm is devised, which solely branches over the parameters, thereby boosting convergence rate.
Empirical evaluations demonstrate better robustness of the proposed methodology against non-rigid deformation, positional noise, and outliers, when compared with prevailing state-of-the-art transformations.
arXiv Detail & Related papers (2024-05-14T13:28:57Z) - Variable Selection for Kernel Two-Sample Tests [10.768155884359777]
We propose a framework based on the kernel maximum mean discrepancy (MMD).
We present mixed-integer programming formulations and develop exact and approximation algorithms with performance guarantees.
Experiment results on synthetic and real datasets demonstrate the superior performance of our approach.
arXiv Detail & Related papers (2023-02-15T00:39:56Z) - Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both
Worlds in Stochastic and Deterministic Environments [48.96971760679639]
We study variance-dependent regret bounds for Markov decision processes (MDPs).
We propose two new environment norms to characterize the fine-grained variance properties of the environment.
For model-based methods, we design a variant of the MVP algorithm.
In particular, this bound is simultaneously minimax optimal for both stochastic and deterministic MDPs.
arXiv Detail & Related papers (2023-01-31T06:54:06Z) - The Best Path Algorithm automatic variables selection via High
Dimensional Graphical Models [0.0]
This paper proposes a new algorithm for an automatic variable selection procedure in High Dimensional Graphical Models.
The algorithm selects the relevant variables for the node of interest on the basis of mutual information.
The application of the algorithm to a wide range of real-world and publicly available datasets has highlighted its potential and effectiveness.
arXiv Detail & Related papers (2022-11-14T10:50:57Z) - Compactness Score: A Fast Filter Method for Unsupervised Feature
Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z) - Solving Large-Scale Multi-Objective Optimization via Probabilistic
Prediction Model [10.916384208006157]
An efficient LSMOP algorithm should have the ability to escape the local optimal solution from the huge search space.
Maintaining the diversity of the population is one of the effective ways to improve search efficiency.
We propose a probabilistic prediction model based on a trend prediction model and a generating-filtering strategy, called LT-PPM, to tackle the LSMOP.
arXiv Detail & Related papers (2021-07-16T09:43:35Z) - Joint Adaptive Graph and Structured Sparsity Regularization for
Unsupervised Feature Selection [6.41804410246642]
We propose a joint adaptive graph and structured sparsity regularization unsupervised feature selection (JASFS) method.
A subset of optimal features will be selected in group, and the number of selected features will be determined automatically.
Experimental results on eight benchmarks demonstrate the effectiveness and efficiency of the proposed method.
arXiv Detail & Related papers (2020-10-09T08:17:04Z) - Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking
Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z) - Outlier Detection Ensemble with Embedded Feature Selection [42.8338013000469]
We propose an outlier detection ensemble framework with embedded feature selection (ODEFS).
For each random sub-sampling based learning component, ODEFS unifies feature selection and outlier detection into a pairwise ranking formulation.
We adopt the thresholded self-paced learning to simultaneously optimize feature selection and example selection.
arXiv Detail & Related papers (2020-01-15T13:14:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.