Fibonacci and k-Subsecting Recursive Feature Elimination
- URL: http://arxiv.org/abs/2007.14920v1
- Date: Wed, 29 Jul 2020 15:53:04 GMT
- Title: Fibonacci and k-Subsecting Recursive Feature Elimination
- Authors: Dariusz Brzezinski
- Abstract summary: Feature selection is a data mining task with the potential of speeding up classification algorithms.
We propose two novel algorithms called Fibonacci- and k-Subsecting Recursive Feature Elimination.
Results show that Fibonacci and k-Subsecting Recursive Feature Elimination are capable of selecting a smaller subset of features much faster than standard RFE.
- Score: 2.741266294612776
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Feature selection is a data mining task with the potential of speeding up
classification algorithms, enhancing model comprehensibility, and improving
learning accuracy. However, finding a subset of features that is optimal in
terms of predictive accuracy is usually computationally intractable. Out of
several heuristic approaches to dealing with this problem, the Recursive
Feature Elimination (RFE) algorithm has received considerable interest from
data mining practitioners. In this paper, we propose two novel algorithms
inspired by RFE, called Fibonacci- and k-Subsecting Recursive Feature
Elimination, which remove features in logarithmic steps, probing the wrapped
classifier more densely for the more promising feature subsets. The proposed
algorithms are experimentally compared against RFE on 28 highly
multidimensional datasets and evaluated in a practical case study involving 3D
electron density maps from the Protein Data Bank. The results show that
Fibonacci and k-Subsecting Recursive Feature Elimination are capable of
selecting a smaller subset of features much faster than standard RFE, while
achieving comparable predictive performance.
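To make the mechanism concrete, here is a minimal sketch of the logarithmic-step idea: rather than dropping one feature per round as standard RFE does, the candidate set is halved each round, so only O(log n) subset sizes are probed. This is an illustrative simplification under assumed scikit-learn conventions (an estimator exposing feature_importances_, cross-validated scoring); the function name log_step_rfe and the halving schedule are ours, not the paper's exact Fibonacci or k-subsecting search.

```python
# Illustrative sketch of logarithmic-step recursive feature elimination:
# instead of dropping one feature per round (standard RFE), halve the
# candidate set each round, so only O(log n) subset sizes are probed.
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def log_step_rfe(estimator, X, y, min_features=1, cv=3):
    """Score subsets of sizes n, n/2, n/4, ... and return the best one."""
    remaining = np.arange(X.shape[1])
    best_subset, best_score = remaining, -np.inf
    while True:
        score = cross_val_score(clone(estimator), X[:, remaining], y, cv=cv).mean()
        if score >= best_score:  # ties favor the later, smaller subset
            best_subset, best_score = remaining.copy(), score
        if len(remaining) <= min_features:
            break
        # Rank features on the current subset and keep the better half.
        importances = clone(estimator).fit(X[:, remaining], y).feature_importances_
        keep = max(min_features, len(remaining) // 2)
        remaining = remaining[np.argsort(importances)[-keep:]]
    return best_subset, best_score


X, y = make_classification(n_samples=300, n_features=64, n_informative=8,
                           random_state=0)
subset, score = log_step_rfe(RandomForestClassifier(random_state=0), X, y)
print(f"selected {len(subset)} features, CV accuracy {score:.3f}")
```

Breaking score ties in favor of later, smaller subsets biases the search toward compact feature sets, mirroring the stated goal of selecting fewer features without losing predictive performance.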
Related papers
- Large-scale Multi-objective Feature Selection: A Multi-phase Search Space Shrinking Approach [0.27624021966289597]
Feature selection is a crucial step in machine learning, especially for high-dimensional datasets.
This paper proposes a novel large-scale multi-objective evolutionary algorithm based on search space shrinking, termed LMSSS.
The effectiveness of the proposed algorithm is demonstrated through comprehensive experiments on 15 large-scale datasets.
arXiv Detail & Related papers (2024-10-13T23:06:10Z)
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute the covariance matrices involved.
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
- Nonlinear Feature Aggregation: Two Algorithms driven by Theory [45.3190496371625]
Real-world machine learning applications are characterized by a huge number of features, leading to computational and memory issues.
We propose a dimensionality reduction algorithm (NonLinCFA) which aggregates non-linear transformations of features with a generic aggregation function.
We also test the algorithms on synthetic and real-world datasets, performing regression and classification tasks, and show competitive performance.
arXiv Detail & Related papers (2023-06-19T19:57:33Z)
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
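As a toy illustration of the random feature approximation mentioned in the entry above (our reading of the general mechanism, not the RFAD algorithm itself): for a one-hidden-layer ReLU network, the NNGP kernel coincides, up to scaling, with the order-1 arc-cosine kernel, which random ReLU features can estimate without ever forming the exact kernel.

```python
# Toy random-feature approximation of the one-hidden-layer ReLU NNGP kernel
# (the order-1 arc-cosine kernel, up to scaling); not the RFAD method itself.
import numpy as np

rng = np.random.default_rng(0)
n, d, D = 5, 3, 50_000            # samples, input dim, random feature count
X = rng.standard_normal((n, d))

# Random features: phi(x) = sqrt(2/D) * relu(W x), with rows of W ~ N(0, I)
W = rng.standard_normal((D, d))
Phi = np.sqrt(2.0 / D) * np.maximum(X @ W.T, 0.0)
K_approx = Phi @ Phi.T            # Monte Carlo estimate of the kernel

# Closed-form order-1 arc-cosine kernel for comparison
norms = np.linalg.norm(X, axis=1)
cos = np.clip((X @ X.T) / np.outer(norms, norms), -1.0, 1.0)
theta = np.arccos(cos)
K_exact = np.outer(norms, norms) / np.pi * (np.sin(theta) + (np.pi - theta) * cos)

print(np.abs(K_approx - K_exact).max())   # error shrinks as D grows
```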
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural network (NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined by minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Large-scale Optimization of Partial AUC in a Range of False Positive Rates [51.12047280149546]
The area under the ROC curve (AUC) is one of the most widely used performance measures for classification models in machine learning.
We develop an efficient approximated gradient descent method based on a recent practical envelope smoothing technique.
Our proposed algorithm can also be used to minimize the sum of ranked range losses, which likewise lacks efficient solvers.
arXiv Detail & Related papers (2022-03-03T03:46:18Z)
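As a point of reference for the measure optimized in the entry above: scikit-learn can already evaluate partial AUC over an FPR range [0, max_fpr] through the max_fpr argument of roc_auc_score, which returns the standardized (McClish) partial AUC. A brief, self-contained sketch:

```python
# Evaluating partial AUC on the FPR range [0, 0.1] with scikit-learn's
# built-in support (standardized McClish partial AUC), for comparison
# with methods that optimize this measure directly.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
print("full AUC:   ", roc_auc_score(y_te, scores))
print("partial AUC:", roc_auc_score(y_te, scores, max_fpr=0.1))
```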
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
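For orientation, a common shape for the objective described in the entry above — an assumption based on the summary, not necessarily the paper's exact formulation — couples a self-reconstruction error with a row-sparsity penalty on the projection matrix $W \in \mathbb{R}^{d \times k}$:

$$\min_{W}\ \|X - XWW^\top\|_F^2 + \lambda \|W\|_{2,p}^p, \qquad \|W\|_{2,p} = \Big(\sum_{i=1}^{d} \|w^i\|_2^p\Big)^{1/p},\quad 0 < p \le 1,$$

where $w^i$ is the $i$-th row of $W$; rows shrunk to zero deselect the corresponding features.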
- IVFS: Simple and Efficient Feature Selection for High Dimensional Topology Preservation [33.424663018395684]
We propose a simple and effective feature selection algorithm to enhance sample similarity preservation.
The proposed algorithm is able to preserve the pairwise distances, as well as the topological patterns, of the full data.
arXiv Detail & Related papers (2020-04-02T23:05:00Z)
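As a hypothetical illustration of the distance-preservation criterion that such methods target (not the IVFS algorithm itself), one can compare pairwise distances computed on the full feature set against those computed on a selected subset:

```python
# Hypothetical check of pairwise-distance preservation after feature
# selection (illustrates the evaluation criterion, not IVFS itself).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
selected = rng.choice(50, size=20, replace=False)  # stand-in for a selected subset

d_full = pdist(X)                # condensed pairwise distances, all features
d_sub = pdist(X[:, selected])    # distances on the selected subset only
r, _ = pearsonr(d_full, d_sub)
print(f"distance-preservation correlation: {r:.3f}")
```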
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.