Greedy Search Algorithms for Unsupervised Variable Selection: A
Comparative Study
- URL: http://arxiv.org/abs/2103.02687v1
- Date: Wed, 3 Mar 2021 21:10:26 GMT
- Title: Greedy Search Algorithms for Unsupervised Variable Selection: A
Comparative Study
- Authors: Federico Zocco, Marco Maggipinto, Gian Antonio Susto and Seán
McLoone
- Abstract summary: This paper focuses on unsupervised variable selection based dimensionality reduction.
We present a critical evaluation of seven unsupervised greedy variable selection algorithms.
We introduce and evaluate, for the first time, a lazy implementation of the variance explained based forward selection component analysis (FSCA) algorithm.
- Score: 3.4888132404740797
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dimensionality reduction is an important step in the development of scalable
and interpretable data-driven models, especially when there are a large number
of candidate variables. This paper focuses on unsupervised variable selection
based dimensionality reduction, and in particular on unsupervised greedy
selection methods, which have been proposed by various researchers as
computationally tractable approximations to optimal subset selection. These
methods are largely distinguished from each other by the selection criterion
adopted, which include squared correlation, variance explained, mutual
information and frame potential. Motivated by the absence in the literature of
a systematic comparison of these different methods, we present a critical
evaluation of seven unsupervised greedy variable selection algorithms
considering both simulated and real world case studies. We also review the
theoretical results that provide performance guarantees and enable efficient
implementations for certain classes of greedy selection function, related to
the concept of submodularity. Furthermore, we introduce and evaluate, for the
first time, a lazy implementation of the variance explained based forward
selection component analysis (FSCA) algorithm. Our experimental results show
that: (1) variance explained and mutual information based selection methods
yield smaller approximation errors than frame potential; (2) the lazy FSCA
implementation has similar performance to FSCA, while being an order of
magnitude faster to compute, making it the algorithm of choice for unsupervised
variable selection.
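The greedy selection and lazy-evaluation ideas described in the abstract can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the function names, tie-breaking, and the least-squares reconstruction of the data matrix are choices made here for clarity. Lazy evaluation is exactly equivalent to plain greedy only when marginal gains are non-increasing (submodularity); for variance explained this holds only approximately, which is why the paper evaluates the lazy variant empirically.

```python
import heapq
import numpy as np

def variance_explained(X, idx):
    """Fraction of the total variance of X captured by the columns in idx."""
    S = X[:, idx]
    coef, *_ = np.linalg.lstsq(S, X, rcond=None)
    return 1.0 - np.sum((X - S @ coef) ** 2) / np.sum(X ** 2)

def fsca(X, k):
    """Plain greedy forward selection by variance explained (FSCA-style)."""
    X = X - X.mean(axis=0)                 # centre the columns
    selected, ve = [], 0.0
    for _ in range(k):
        gains = [(variance_explained(X, selected + [j]), j)
                 for j in range(X.shape[1]) if j not in selected]
        ve, j = max(gains)                 # best candidate this round
        selected.append(j)
    return selected, ve

def lazy_fsca(X, k):
    """Lazy greedy: re-evaluate only the current top candidate.

    Matches fsca() exactly when marginal gains are non-increasing
    (submodular); for variance explained this is only approximate.
    """
    X = X - X.mean(axis=0)
    selected, ve = [], 0.0
    # max-heap of (-stale_gain, candidate index)
    heap = [(-variance_explained(X, [j]), j) for j in range(X.shape[1])]
    heapq.heapify(heap)
    for _ in range(k):
        while True:
            _, j = heapq.heappop(heap)
            new_ve = variance_explained(X, selected + [j])
            gain = new_ve - ve
            # accept if the refreshed gain still beats the next stale gain
            if not heap or gain >= -heap[0][0] - 1e-12:
                selected.append(j)
                ve = new_ve
                break
            heapq.heappush(heap, (-gain, j))
    return selected, ve
```

The lazy variant typically refreshes only a handful of gains per step instead of all remaining candidates, which is the mechanism behind the order-of-magnitude speed-up the paper reports for lazy FSCA.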
Related papers
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to capture potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z) - Variable Substitution and Bilinear Programming for Aligning Partially Overlapping Point Sets [48.1015832267945]
This research presents a method for aligning partially overlapping point sets by minimizing the objective function of the RPM algorithm.
A branch-and-bound (BnB) algorithm is devised, which solely branches over the parameters, thereby boosting convergence rate.
Empirical evaluations demonstrate better robustness of the proposed methodology against non-rigid deformation, positional noise, and outliers, when compared with prevailing state-of-the-art transformations.
arXiv Detail & Related papers (2024-05-14T13:28:57Z) - Variable Selection for Kernel Two-Sample Tests [10.768155884359777]
We propose a framework based on the kernel maximum mean discrepancy (MMD).
We present mixed-integer programming formulations and develop exact and approximation algorithms with performance guarantees.
Experiment results on synthetic and real datasets demonstrate the superior performance of our approach.
arXiv Detail & Related papers (2023-02-15T00:39:56Z) - Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both
Worlds in Stochastic and Deterministic Environments [48.96971760679639]
We study variance-dependent regret bounds for Markov decision processes (MDPs).
We propose two new environment norms to characterize the fine-grained variance properties of the environment.
For model-based methods, we design a variant of the MVP algorithm.
In particular, this bound is simultaneously minimax optimal for both stochastic and deterministic MDPs.
arXiv Detail & Related papers (2023-01-31T06:54:06Z) - The Best Path Algorithm automatic variables selection via High
Dimensional Graphical Models [0.0]
This paper proposes a new algorithm for an automatic variable selection procedure in High Dimensional Graphical Models.
The algorithm selects the relevant variables for the node of interest on the basis of mutual information.
The application of the algorithm to a wide range of real-world and publicly available datasets has highlighted its potential and effectiveness.
arXiv Detail & Related papers (2022-11-14T10:50:57Z) - Compactness Score: A Fast Filter Method for Unsupervised Feature
Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z) - Solving Large-Scale Multi-Objective Optimization via Probabilistic
Prediction Model [10.916384208006157]
An efficient LSMOP algorithm should have the ability to escape the local optimal solution from the huge search space.
Maintaining the diversity of the population is one of the effective ways to improve search efficiency.
We propose a probabilistic prediction model based on a trend prediction model and a generating-filtering strategy, called LT-PPM, to tackle the LSMOP.
arXiv Detail & Related papers (2021-07-16T09:43:35Z) - Joint Adaptive Graph and Structured Sparsity Regularization for
Unsupervised Feature Selection [6.41804410246642]
We propose a joint adaptive graph and structured sparsity regularization unsupervised feature selection (JASFS) method.
A subset of optimal features will be selected in group, and the number of selected features will be determined automatically.
Experimental results on eight benchmarks demonstrate the effectiveness and efficiency of the proposed method.
arXiv Detail & Related papers (2020-10-09T08:17:04Z) - Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking
Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z) - Outlier Detection Ensemble with Embedded Feature Selection [42.8338013000469]
We propose an outlier detection ensemble framework with embedded feature selection (ODEFS).
For each random sub-sampling based learning component, ODEFS unifies feature selection and outlier detection into a pairwise ranking formulation.
We adopt the thresholded self-paced learning to simultaneously optimize feature selection and example selection.
arXiv Detail & Related papers (2020-01-15T13:14:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.