Algorithmic Stability and Generalization of an Unsupervised Feature
Selection Algorithm
- URL: http://arxiv.org/abs/2010.09416v2
- Date: Wed, 5 Jan 2022 14:10:54 GMT
- Title: Algorithmic Stability and Generalization of an Unsupervised Feature
Selection Algorithm
- Authors: Xinxing Wu and Qiang Cheng
- Abstract summary: Algorithmic stability is a key characteristic of an algorithm regarding its sensitivity to perturbations of input samples.
In this paper, we propose an innovative unsupervised feature selection algorithm attaining this stability with provable guarantees.
- Score: 20.564573628659918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature selection, as a vital dimension reduction technique, reduces data
dimension by identifying an essential subset of input features, which can
facilitate interpretable insights into learning and inference processes.
Algorithmic stability is a key characteristic of an algorithm regarding its
sensitivity to perturbations of input samples. In this paper, we propose an
innovative unsupervised feature selection algorithm attaining this stability
with provable guarantees. The architecture of our algorithm consists of a
feature scorer and a feature selector. The scorer trains a neural network (NN)
to globally score all the features, and the selector adopts a dependent sub-NN
to locally evaluate the representation abilities for selecting features.
Further, we present algorithmic stability analysis and show that our algorithm
has a performance guarantee via a generalization error bound. Extensive
experimental results on real-world datasets demonstrate superior generalization
performance of our proposed algorithm to strong baseline methods. Also, the
properties revealed by our theoretical analysis and the stability of our
algorithm-selected features are empirically confirmed.
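The abstract describes a two-part architecture but gives no implementation details here. As a rough illustration only, a minimal sketch of a scorer/selector pair of the kind described might look as follows; the layer sizes, the softmax scoring rule, the top-k selection, and the reconstruction loss are all illustrative assumptions, not the authors' actual design.

```python
# Minimal sketch of a scorer + dependent selector, assuming a softmax
# scoring rule and a top-k reconstruction objective (both hypothetical).
import torch
import torch.nn as nn

class FeatureScorer(nn.Module):
    """Trains on all d input features and emits one global score per feature."""
    def __init__(self, d, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, d))

    def forward(self, x):
        # Average per-sample outputs, then softmax into a global score
        # distribution over features (an assumed scoring rule).
        return torch.softmax(self.net(x).mean(dim=0), dim=-1)

class FeatureSelector(nn.Module):
    """Dependent sub-network: evaluates how well the k top-scored
    features can reconstruct the full input."""
    def __init__(self, d, k, hidden=64):
        super().__init__()
        self.k = k
        self.decoder = nn.Sequential(
            nn.Linear(k, hidden), nn.ReLU(), nn.Linear(hidden, d))

    def forward(self, x, scores):
        top_k = torch.topk(scores, self.k).indices   # locally selected subset
        return self.decoder(x[:, top_k]), top_k

d, k = 100, 10
x = torch.randn(256, d)                        # unlabeled samples
scorer, selector = FeatureScorer(d), FeatureSelector(d, k)
recon, chosen = selector(x, scorer(x))
loss = nn.functional.mse_loss(recon, x)        # unsupervised reconstruction loss
```

The point of the split, per the abstract, is that the scorer assigns global scores using all features, while the dependent sub-network checks how well a locally selected subset represents the data.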
Related papers
- Unlock the Power of Algorithm Features: A Generalization Analysis for Algorithm Selection [25.29451529910051]
We propose the first provable guarantee for algorithm selection based on algorithm features.
We analyze the benefits and costs associated with algorithm features and investigate how the generalization error is affected by different factors.
arXiv Detail & Related papers (2024-05-18T17:38:25Z) - Deep Unrolling for Nonconvex Robust Principal Component Analysis [75.32013242448151]
We design algorithms for Robust Principal Component Analysis (RPCA).
It consists in decomposing a matrix into the sum of a low-rank matrix and a sparse matrix.
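For reference, the decomposition in question, M ≈ L + S with L low-rank and S sparse, can be sketched with the classical alternating-thresholding iteration below. This is a textbook baseline, not the paper's deep-unrolling scheme, and the thresholds tau and lam are illustrative.

```python
# Textbook RPCA sketch: alternate singular-value thresholding (low-rank
# step) with entrywise soft thresholding (sparse step). Hyperparameters
# are illustrative assumptions.
import numpy as np

def soft_threshold(X, t):
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def rpca(M, lam=0.1, tau=1.0, iters=100):
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(iters):
        # Low-rank step: shrink the singular values of the residual M - S.
        U, sig, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = U @ np.diag(soft_threshold(sig, tau)) @ Vt
        # Sparse step: entrywise shrinkage of the residual M - L.
        S = soft_threshold(M - L, lam)
    return L, S

rng = np.random.default_rng(0)
low_rank = rng.normal(size=(50, 5)) @ rng.normal(size=(5, 50))
sparse = (rng.random((50, 50)) < 0.05) * rng.normal(scale=10, size=(50, 50))
L, S = rpca(low_rank + sparse)
```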
arXiv Detail & Related papers (2023-07-12T03:48:26Z) - Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z) - Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network (NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined by minimizing the population loss, that are better suited to active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z) - Sequential Attention for Feature Selection [12.89764845700709]
We propose a feature selection algorithm called Sequential Attention that achieves state-of-the-art empirical results for neural networks.
We give theoretical insights into our algorithm for linear regression by showing that an adaptation to this setting is equivalent to the classical Orthogonal Matching Pursuit (OMP) algorithm.
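The classical OMP routine that this equivalence refers to is easy to state. The sketch below is plain OMP for linear regression, not the Sequential Attention network itself.

```python
# Classical Orthogonal Matching Pursuit for linear regression: greedily
# pick the feature most correlated with the current residual, refit, repeat.
import numpy as np

def omp(X, y, k):
    selected, residual = [], y.copy()
    for _ in range(k):
        # Feature most correlated with the current residual wins.
        scores = np.abs(X.T @ residual)
        scores[selected] = -np.inf            # never re-pick a feature
        selected.append(int(np.argmax(scores)))
        # Refit least squares on the selected set and update the residual.
        coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        residual = y - X[:, selected] @ coef
    return selected

X = np.random.randn(200, 30)
y = X[:, [2, 7, 11]] @ np.array([1.5, -2.0, 0.5]) + 0.01 * np.random.randn(200)
print(omp(X, y, 3))   # typically recovers features {2, 7, 11}
```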
arXiv Detail & Related papers (2022-09-29T15:49:06Z) - Exploring the Algorithm-Dependent Generalization of AUPRC Optimization
with List Stability [107.65337427333064]
Optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem for machine learning.
In this work, we present the first trial in the algorithm-dependent generalization of AUPRC optimization.
Experiments on three image retrieval datasets speak to the effectiveness and soundness of our framework.
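For concreteness, AUPRC (equivalently, average precision) can be computed with scikit-learn as below. This only defines the quantity being optimized, not the paper's list-stable optimization method.

```python
# AUPRC of toy ranking scores, via scikit-learn's average-precision
# estimator of the area under the precision-recall curve.
import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([1, 0, 1, 1, 0, 0])           # binary relevance labels
y_score = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])  # model scores
print(average_precision_score(y_true, y_score))  # area under the PR curve
```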
arXiv Detail & Related papers (2022-09-27T09:06:37Z) - Canonical-Correlation-Based Fast Feature Selection for Structural Health Monitoring [4.533223834527272]
This paper proposes a fast feature selection algorithm that efficiently computes the sum of squared canonical correlation coefficients between monitored features and target variables of interest in a greedy search.
The proposed algorithm is applied to both synthetic and real datasets to illustrate its advantages in terms of computational speed, general classification and regression tasks, as well as damage-sensitive feature selection tasks.
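A simplified greedy sketch in the spirit of this entry appears below: each step adds the candidate feature with the largest squared correlation to a single target. The actual algorithm scores candidates by sums of squared *canonical* correlation coefficients with multivariate targets and uses efficient incremental updates.

```python
# Simplified greedy selection by squared correlation with one target;
# a stand-in for the paper's canonical-correlation-based criterion.
import numpy as np

def greedy_corr_select(X, y, k):
    remaining = list(range(X.shape[1]))
    chosen = []
    for _ in range(k):
        # Squared Pearson correlation of each remaining feature with y.
        r2 = [np.corrcoef(X[:, j], y)[0, 1] ** 2 for j in remaining]
        chosen.append(remaining.pop(int(np.argmax(r2))))
    return chosen

X = np.random.randn(500, 20)
y = 2 * X[:, 3] - X[:, 8] + 0.1 * np.random.randn(500)
print(greedy_corr_select(X, y, 2))   # typically [3, 8]
```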
arXiv Detail & Related papers (2021-06-15T15:55:17Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of a stochastic optimization algorithm can be bounded based on the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Online Deterministic Annealing for Classification and Clustering [0.0]
We introduce an online prototype-based learning algorithm for clustering and classification.
We show that the proposed algorithm constitutes a competitive-learning neural network, the learning rule of which is formulated as an online approximation algorithm.
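A bare-bones version of the competitive-learning rule referred to here is a winner-take-all prototype update, sketched below. The deterministic-annealing temperature schedule that distinguishes the paper's method is omitted from this sketch.

```python
# Online winner-take-all prototype learning: the nearest prototype moves
# toward each incoming sample with a decaying step size.
import numpy as np

rng = np.random.default_rng(1)
prototypes = rng.normal(size=(3, 2))          # three prototypes in 2-D
for t, x in enumerate(rng.normal(size=(1000, 2)), start=1):
    # Winner-take-all: only the nearest prototype is updated.
    winner = np.argmin(np.linalg.norm(prototypes - x, axis=1))
    prototypes[winner] += (1.0 / t) * (x - prototypes[winner])
```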
arXiv Detail & Related papers (2021-02-11T04:04:21Z) - Feature Selection Using Reinforcement Learning [0.0]
The space of variables or features that can be used to characterize a particular predictor of interest continues to grow exponentially.
Identifying the most characterizing features that minimize the variance without jeopardizing the bias of our models is critical to successfully training a machine learning model.
arXiv Detail & Related papers (2021-01-23T09:24:37Z) - Extreme Algorithm Selection With Dyadic Feature Representation [78.13985819417974]
We propose the setting of extreme algorithm selection (XAS) where we consider fixed sets of thousands of candidate algorithms.
We assess the applicability of state-of-the-art AS techniques to the XAS setting and propose approaches leveraging a dyadic feature representation.
arXiv Detail & Related papers (2020-01-29T09:40:58Z)