A concise method for feature selection via normalized frequencies
- URL: http://arxiv.org/abs/2106.05814v1
- Date: Thu, 10 Jun 2021 15:29:54 GMT
- Title: A concise method for feature selection via normalized frequencies
- Authors: Song Tan, Xia He
- Abstract summary: In this paper, a concise method is proposed for universal feature selection.
The proposed method uses a fusion of the filter method and the wrapper method, rather than a combination of them.
The evaluation results show that the proposed method outperformed several state-of-the-art related works in terms of accuracy, precision, recall, F-score and AUC.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Feature selection is an important part of building a machine learning model.
By eliminating redundant or misleading features from data, the machine learning
model can achieve better performance while reducing the demand on computing
resources. Metaheuristic algorithms, such as swarm intelligence and
evolutionary algorithms, are mostly used to implement feature selection.
However, they suffer from relative complexity and slowness.
In this paper, a concise method is proposed for universal feature selection.
The proposed method uses a fusion of the filter method and the wrapper method,
rather than a combination of them. In the method, one-hot encoding is used
to preprocess the dataset, and random forest is utilized as the classifier. The
proposed method uses normalized frequencies to assign a value to each feature,
which will be used to find the optimal feature subset. Furthermore, we propose
a novel approach to exploit the outputs of mutual information, which allows for
a better starting point for the experiments. Two real-world datasets in the
field of intrusion detection were used to evaluate the proposed method. The
evaluation results show that the proposed method outperformed several
state-of-the-art related works in terms of accuracy, precision, recall, F-score
and AUC.
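The abstract does not spell out how the normalized frequencies are computed, so the following is only a minimal sketch of how the described stages (one-hot preprocessing, mutual information as a starting point, a random-forest wrapper) could fit together; normalizing the mutual-information scores and the greedy forward pass are illustrative assumptions, and the helper names (df, y) are hypothetical.

```python
# Minimal sketch, not the paper's exact procedure: one-hot preprocessing,
# mutual information as the starting point, normalized scores as per-feature
# values, and a random-forest wrapper over a greedy forward pass.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score

def select_features(df: pd.DataFrame, y: np.ndarray, cv: int = 5):
    X = pd.get_dummies(df)                      # one-hot encode categoricals

    # Filter stage: mutual information assigns each feature an initial value;
    # normalizing to sum to 1 is an assumed reading of "normalized frequencies".
    mi = mutual_info_classif(X, y, random_state=0)
    weights = mi / mi.sum()
    order = np.argsort(weights)[::-1]           # most valuable features first

    # Wrapper stage: grow the subset in weight order and keep the best
    # cross-validated random-forest score seen so far.
    best_score, best_subset, subset = -np.inf, [], []
    for j in order:
        subset.append(X.columns[j])
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        score = cross_val_score(clf, X[subset], y, cv=cv).mean()
        if score > best_score:
            best_score, best_subset = score, list(subset)
    return best_subset, best_score
```

A sequential forward pass is just one way to turn per-feature values into a subset; the paper's actual search procedure may differ.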
Related papers
- A Contrast Based Feature Selection Algorithm for High-dimensional Data set in Machine Learning [9.596923373834093]
We propose a novel filter feature selection method, ContrastFS, which selects discriminative features based on the discrepancies that features exhibit between different classes.
We validate the effectiveness and efficiency of our approach on several widely studied benchmark datasets; the results show that the new method performs favorably with negligible computation.
arXiv Detail & Related papers (2024-01-15T05:32:35Z)
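A minimal sketch of the contrast-based filter idea behind ContrastFS, assuming the discrepancy between classes is measured as the spread of class-conditional means normalized by the overall standard deviation; the paper's exact statistic may differ.

```python
# Illustrative contrast-based filter: per-feature spread of class-conditional
# means, normalized by the overall standard deviation. The paper's exact
# discrepancy statistic may differ from this assumption.
import numpy as np

def contrast_scores(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    classes = np.unique(y)
    class_means = np.stack([X[y == c].mean(axis=0) for c in classes])
    spread = class_means.max(axis=0) - class_means.min(axis=0)
    return spread / (X.std(axis=0) + 1e-12)

def top_k(X: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    # Keep the k features showing the largest between-class contrast.
    return np.argsort(contrast_scores(X, y))[::-1][:k]
```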
- Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets.
Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that repeatedly alternates time-consuming model training and batch data selection.
The proposed FreeSel bypasses this heavy batch selection process, achieving a significant efficiency improvement and running 530x faster than existing active learning methods.
arXiv Detail & Related papers (2023-09-29T15:50:14Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
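A hedged sketch of the bandit view of OAS with censored runtimes: runtimes are observed only up to a timeout, and a UCB index is maintained on the capped values. The timeout handling and index below are illustrative assumptions, not the paper's estimator; `algorithms` is assumed to be a list of callables returning runtimes.

```python
# Illustrative runtime-oriented UCB with censoring: runtimes are observed
# only up to a timeout tau, and the index is built on the capped values.
# `algorithms` is assumed to be a list of callables returning a runtime;
# the estimator below is a stand-in, not the paper's.
import math

def select_online(algorithms, instances, tau=60.0):
    n = [0] * len(algorithms)              # pulls per algorithm
    mean = [0.0] * len(algorithms)         # mean capped runtime per algorithm
    for t, inst in enumerate(instances, start=1):
        # Prefer low capped runtimes; force exploration of untried arms.
        ucb = [(-mean[i] + tau * math.sqrt(2 * math.log(t) / n[i]))
               if n[i] else float("inf") for i in range(len(algorithms))]
        i = max(range(len(algorithms)), key=ucb.__getitem__)
        runtime = min(algorithms[i](inst), tau)   # censored observation
        n[i] += 1
        mean[i] += (runtime - mean[i]) / n[i]     # running mean update
        yield i, runtime
```

Note that the state kept per arm is constant-size, so space and time stay independent of the horizon, in keeping with the paper's stated goal.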
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
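A minimal sketch of gradient-informed local search with a probabilistic model: fit a Gaussian process to noisy objective evaluations around the current policy parameters and step along the posterior-mean gradient instead of a raw perturbation estimate. Kernel, length scale, and step size are illustrative assumptions.

```python
# Illustrative gradient step from a Gaussian-process surrogate: fit a GP to
# noisy objective evaluations near the current policy parameters and ascend
# the analytic gradient of the posterior mean. Kernel, length scale, and
# step size are assumptions, not the paper's exact algorithm.
import numpy as np

def rbf(A, B, ls=0.3):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_mean_grad(theta, X, y, ls=0.3, noise=1e-3):
    # Posterior mean m(x) = k(x, X) @ K^{-1} y; differentiate the RBF kernel.
    K = rbf(X, X, ls) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, y)
    k = rbf(theta[None, :], X, ls)[0]
    grads = -(theta[None, :] - X) / ls ** 2 * k[:, None]   # dk/dtheta rows
    return grads.T @ alpha

def local_step(theta, objective, n_samples=20, sigma=0.1, lr=0.5):
    # Sample perturbations, evaluate the objective, step along the GP gradient.
    X = theta + sigma * np.random.randn(n_samples, theta.size)
    y = np.array([objective(x) for x in X])
    return theta + lr * gp_mean_grad(theta, X, y)
```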
- Low-rank Dictionary Learning for Unsupervised Feature Selection [11.634317251468968]
We introduce a novel unsupervised feature selection approach by applying dictionary learning ideas in a low-rank representation.
A unified objective function for unsupervised feature selection is proposed in a sparse way via an $\ell_{2,1}$-norm regularization.
Our experimental findings reveal that the proposed method outperforms state-of-the-art algorithms.
arXiv Detail & Related papers (2021-06-21T13:39:10Z)
- Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z)
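For context, exact statistical leverage scores of a tall matrix are the squared row norms of Q in a thin QR factorization; the sketch below shows that baseline plus a simple Gaussian-sketch variant. The paper's rank-revealing plus randomized-transform combination is more refined than this illustration.

```python
# Exact leverage scores are the squared row norms of Q in a thin QR of A;
# the Gaussian-sketch variant is a simple illustration of randomization,
# far cruder than the paper's rank-revealing constructions.
import numpy as np

def leverage_scores_exact(A: np.ndarray) -> np.ndarray:
    Q, _ = np.linalg.qr(A)              # thin QR, orthonormal columns
    return (Q ** 2).sum(axis=1)         # l_i = ||e_i^T Q||^2

def leverage_scores_sketched(A: np.ndarray, k: int) -> np.ndarray:
    G = np.random.randn(A.shape[1], k)  # Gaussian dimensionality reduction
    Q, _ = np.linalg.qr(A @ G)          # orthonormal basis of the sketch
    return (Q ** 2).sum(axis=1)
```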
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $\ell_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
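A crude stand-in for the reconstruction-plus-$\ell_{2,p}$ objective, assuming feature importance can be read off the row norms of PCA loadings; the paper instead solves a dedicated optimization problem, so treat this only as an illustrative sketch.

```python
# Crude stand-in for the l_{2,p}-regularized objective: read feature
# importance off PCA loading row norms, then check how well the kept
# features reconstruct the rest. The paper solves a dedicated optimization
# problem instead; treat this only as an illustration.
import numpy as np
from sklearn.decomposition import PCA

def row_norm_select(X: np.ndarray, n_components: int, k: int) -> np.ndarray:
    W = PCA(n_components=n_components).fit(X).components_.T  # (d, c) loadings
    return np.argsort(np.linalg.norm(W, axis=1))[::-1][:k]

def reconstruction_error(X: np.ndarray, keep: np.ndarray) -> float:
    # Least-squares reconstruction of all features from the kept subset.
    B, *_ = np.linalg.lstsq(X[:, keep], X, rcond=None)
    return float(np.linalg.norm(X - X[:, keep] @ B))
```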
- Quick and Robust Feature Selection: the Strength of Energy-efficient Sparse Training for Autoencoders [4.561081324313315]
Feature selection, which identifies the most relevant and informative attributes of a dataset, has been introduced as a solution to this problem.
Most of the existing feature selection methods are computationally inefficient.
In this paper, a novel and flexible method for unsupervised feature selection is proposed.
arXiv Detail & Related papers (2020-12-01T15:05:15Z)
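A hedged sketch of the autoencoder route to feature selection: train a small autoencoder and score inputs by outgoing weight magnitude. The paper trains the autoencoder sparsely for efficiency; the dense MLPRegressor below is a simplification.

```python
# Illustrative autoencoder-based scoring: reconstruct the input with a small
# network and rate each feature by the magnitude of its outgoing weights.
# The paper trains the autoencoder sparsely for efficiency; the dense
# MLPRegressor here is a simplification.
import numpy as np
from sklearn.neural_network import MLPRegressor

def autoencoder_scores(X: np.ndarray, hidden: int = 32) -> np.ndarray:
    ae = MLPRegressor(hidden_layer_sizes=(hidden,), max_iter=500,
                      random_state=0)
    ae.fit(X, X)                        # learn to reconstruct the input
    W_in = ae.coefs_[0]                 # (n_features, hidden) input weights
    return np.abs(W_in).sum(axis=1)     # stronger connections = more useful
```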
- Optimizing Speech Emotion Recognition using Manta-Ray Based Feature Selection [1.4502611532302039]
We show that concatenation of features, extracted by using different existing feature extraction methods, can boost the classification accuracy.
We also present a novel application of Manta Ray optimization to speech emotion recognition tasks, which yields state-of-the-art results.
arXiv Detail & Related papers (2020-09-18T16:09:34Z)
- IVFS: Simple and Efficient Feature Selection for High Dimensional Topology Preservation [33.424663018395684]
We propose a simple and effective feature selection algorithm to enhance sample similarity preservation.
The proposed algorithm is able to well preserve the pairwise distances, as well as topological patterns, of the full data.
arXiv Detail & Related papers (2020-04-02T23:05:00Z)
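A minimal sketch of the inclusion-value idea behind IVFS, assuming subset quality is measured by pairwise-distance distortion; subset size, count, and the quality measure are illustrative assumptions.

```python
# Illustrative inclusion-value scheme: sample random feature subsets, rate
# each subset by how little it distorts the full pairwise distances, and
# credit every member feature with the subset's quality. Subset size, count,
# and the distortion measure are assumptions.
import numpy as np
from scipy.spatial.distance import pdist

def inclusion_values(X: np.ndarray, subset_size: int, n_subsets: int = 200):
    d_full = pdist(X)                       # condensed pairwise distances
    total = np.zeros(X.shape[1])
    count = np.zeros(X.shape[1])
    for _ in range(n_subsets):
        S = np.random.choice(X.shape[1], subset_size, replace=False)
        quality = -np.linalg.norm(pdist(X[:, S]) - d_full)
        total[S] += quality
        count[S] += 1
    return total / np.maximum(count, 1)     # higher is better
```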
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.