Greedy Search Algorithms for Unsupervised Variable Selection: A
Comparative Study
- URL: http://arxiv.org/abs/2103.02687v1
- Date: Wed, 3 Mar 2021 21:10:26 GMT
- Title: Greedy Search Algorithms for Unsupervised Variable Selection: A
Comparative Study
- Authors: Federico Zocco, Marco Maggipinto, Gian Antonio Susto and Seán
McLoone
- Abstract summary: This paper focuses on unsupervised variable selection based dimensionality reduction.
We present a critical evaluation of seven unsupervised greedy variable selection algorithms.
We introduce and evaluate, for the first time, a lazy implementation of the variance explained based forward selection component analysis (FSCA) algorithm.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dimensionality reduction is an important step in the development of scalable
and interpretable data-driven models, especially when there are a large number
of candidate variables. This paper focuses on unsupervised variable selection
based dimensionality reduction, and in particular on unsupervised greedy
selection methods, which have been proposed by various researchers as
computationally tractable approximations to optimal subset selection. These
methods are largely distinguished from each other by the selection criterion
adopted, which include squared correlation, variance explained, mutual
information and frame potential. Motivated by the absence in the literature of
a systematic comparison of these different methods, we present a critical
evaluation of seven unsupervised greedy variable selection algorithms
considering both simulated and real world case studies. We also review the
theoretical results that provide performance guarantees and enable efficient
implementations for certain classes of greedy selection functions, related to
the concept of submodularity. Furthermore, we introduce and evaluate, for the
first time, a lazy implementation of the variance explained based forward
selection component analysis (FSCA) algorithm. Our experimental results show
that: (1) variance explained and mutual information based selection methods
yield smaller approximation errors than frame potential; (2) the lazy FSCA
implementation has similar performance to FSCA, while being an order of
magnitude faster to compute, making it the algorithm of choice for unsupervised
variable selection.
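The greedy variance-explained criterion at the heart of FSCA can be illustrated with a minimal sketch: at each step, select the variable whose inclusion best reconstructs the full data matrix via least-squares projection onto the selected columns. This is a naive implementation for intuition only, not the authors' FSCA code; the function name and formulation are assumptions.

```python
import numpy as np

def greedy_variance_explained(X, k):
    """Greedy forward selection of k variables maximizing variance explained.

    A naive sketch of the FSCA-style criterion: each candidate subset is
    scored by how much of the total variance of X its least-squares
    reconstruction of X recovers.
    """
    X = X - X.mean(axis=0)              # centre the data
    total_var = np.sum(X ** 2)
    selected = []
    for _ in range(k):
        best_j, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            S = X[:, selected + [j]]
            # Least-squares reconstruction of X from the candidate subset
            coef, *_ = np.linalg.lstsq(S, X, rcond=None)
            explained = total_var - np.sum((X - S @ coef) ** 2)
            if explained > best_score:
                best_j, best_score = j, explained
        selected.append(best_j)
    return selected, best_score / total_var
```

This naive version re-scores every remaining candidate at every step. The lazy variant evaluated in the paper exploits the near-submodular behaviour of the criterion to skip re-evaluating candidates whose stale scores already fall below the current best, which is where the reported order-of-magnitude speed-up comes from.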
Related papers
- Provably Efficient UCB-type Algorithms For Learning Predictive State
Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs)
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z) - Variable Selection for Kernel Two-Sample Tests [10.768155884359777]
We propose a framework based on the kernel maximum mean discrepancy (MMD)
We present mixed-integer programming formulations and develop exact and approximation algorithms with performance guarantees.
Experiment results on synthetic and real datasets demonstrate the superior performance of our approach.
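The MMD statistic underlying this framework has a standard unbiased estimator; the sketch below uses a Gaussian RBF kernel. The bandwidth choice and function name are illustrative, not taken from the paper.

```python
import numpy as np

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of squared MMD between samples X and Y
    using a Gaussian RBF kernel with bandwidth sigma."""
    def k(A, B):
        # Pairwise squared distances, then the RBF kernel matrix
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-d2 / (2 * sigma**2))
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    # Drop diagonal terms for the unbiased U-statistic
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            - 2 * Kxy.mean())
```

When the two samples come from the same distribution the statistic concentrates near zero (and may be slightly negative, since the estimator is unbiased); a clearly positive value indicates distributional difference.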
arXiv Detail & Related papers (2023-02-15T00:39:56Z) - Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both
Worlds in Stochastic and Deterministic Environments [48.96971760679639]
We study variance-dependent regret bounds for Markov decision processes (MDPs)
We propose two new environment norms to characterize the fine-grained variance properties of the environment.
For model-based methods, we design a variant of the MVP algorithm.
In particular, this bound is simultaneously minimax optimal for both stochastic and deterministic MDPs.
arXiv Detail & Related papers (2023-01-31T06:54:06Z) - The Best Path Algorithm automatic variables selection via High
Dimensional Graphical Models [0.0]
This paper proposes a new algorithm for an automatic variable selection procedure in High Dimensional Graphical Models.
The algorithm selects the relevant variables for the node of interest on the basis of mutual information.
The application of the algorithm to a wide range of real-world, publicly available datasets has highlighted its potential and greater effectiveness.
arXiv Detail & Related papers (2022-11-14T10:50:57Z) - Compactness Score: A Fast Filter Method for Unsupervised Feature
Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named as, Compactness Score (CSUFS) to select desired features.
The proposed algorithm is shown to be more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z) - Solving Large-Scale Multi-Objective Optimization via Probabilistic
Prediction Model [10.916384208006157]
An efficient LSMOP algorithm should have the ability to escape the local optimal solution from the huge search space.
Maintaining the diversity of the population is one of the effective ways to improve search efficiency.
We propose a probabilistic prediction model based on trend prediction model and generating-filtering strategy, called LT-PPM, to tackle the LSMOP.
arXiv Detail & Related papers (2021-07-16T09:43:35Z) - Auto-weighted Multi-view Feature Selection with Graph Optimization [90.26124046530319]
We propose a novel unsupervised multi-view feature selection model based on graph learning.
The contributions are threefold: (1) during the feature selection procedure, the consensus similarity graph shared by different views is learned.
Experiments on various datasets demonstrate the superiority of the proposed method compared with the state-of-the-art methods.
arXiv Detail & Related papers (2021-04-11T03:25:25Z) - Joint Adaptive Graph and Structured Sparsity Regularization for
Unsupervised Feature Selection [6.41804410246642]
We propose a joint adaptive graph and structured sparsity regularization unsupervised feature selection (JASFS) method.
A subset of optimal features will be selected in group, and the number of selected features will be determined automatically.
Experimental results on eight benchmarks demonstrate the effectiveness and efficiency of the proposed method.
arXiv Detail & Related papers (2020-10-09T08:17:04Z) - Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking
Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z) - Extreme Algorithm Selection With Dyadic Feature Representation [78.13985819417974]
We propose the setting of extreme algorithm selection (XAS) where we consider fixed sets of thousands of candidate algorithms.
We assess the applicability of state-of-the-art AS techniques to the XAS setting and propose approaches leveraging a dyadic feature representation.
arXiv Detail & Related papers (2020-01-29T09:40:58Z) - Outlier Detection Ensemble with Embedded Feature Selection [42.8338013000469]
We propose an outlier detection ensemble framework with embedded feature selection (ODEFS)
For each random sub-sampling based learning component, ODEFS unifies feature selection and outlier detection into a pairwise ranking formulation.
We adopt the thresholded self-paced learning to simultaneously optimize feature selection and example selection.
arXiv Detail & Related papers (2020-01-15T13:14:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.