Review of Swarm Intelligence-based Feature Selection Methods
- URL: http://arxiv.org/abs/2008.04103v1
- Date: Fri, 7 Aug 2020 05:18:58 GMT
- Title: Review of Swarm Intelligence-based Feature Selection Methods
- Authors: Mehrdad Rostami, Kamal Berahmand, Saman Forouzandeh
- Abstract summary: Data mining applications with high dimensional datasets require high speed and accuracy.
One of the dimensionality reduction approaches is feature selection that can increase the accuracy of the data mining task.
State-of-the-art swarm intelligence algorithms are studied, and the recent feature selection methods based on these algorithms are reviewed.
- Score: 3.8848561367220276
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: In the past decades, the rapid growth of computer and database technologies
has led to the rapid growth of large-scale datasets. On the other hand, data
mining applications with high dimensional datasets that require high speed and
accuracy are rapidly increasing. An important issue with these applications is
the curse of dimensionality, where the number of features is much higher than
the number of patterns. One of the dimensionality reduction approaches is
feature selection that can increase the accuracy of the data mining task and
reduce its computational complexity. The feature selection method aims at
selecting a subset of features with the lowest inner similarity and highest
relevancy to the target class. It reduces the dimensionality of the data by
eliminating irrelevant, redundant, or noisy data. In this paper, a comparative
analysis of different feature selection methods is presented, and a general
categorization of these methods is performed. Moreover, in this paper,
state-of-the-art swarm intelligence algorithms are studied, and the recent feature
selection methods based on these algorithms are reviewed. Furthermore, the
strengths and weaknesses of the different studied swarm intelligence-based
feature selection methods are evaluated.
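To make the abstract's criterion concrete, here is a minimal, self-contained sketch (Python with NumPy only; every constant and the toy data are illustrative, not taken from the paper) of the general technique the survey covers: a binary particle swarm optimizer searching over feature subsets, scored by a correlation-based "highest relevancy, lowest inner similarity" fitness.

```python
import numpy as np

def relevance_redundancy(X, y, mask):
    """Fitness for a candidate subset: mean |corr(feature, target)| over the
    selected features, minus their mean pairwise |corr| -- i.e. the "highest
    relevancy, lowest inner similarity" criterion from the abstract."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return -np.inf
    rel = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in idx])
    if idx.size == 1:
        return rel
    C = np.abs(np.corrcoef(X[:, idx], rowvar=False))
    red = (C.sum() - idx.size) / (idx.size * (idx.size - 1))  # mean off-diagonal
    return rel - red

def binary_pso(X, y, n_particles=20, n_iters=50, seed=0):
    """Binary PSO over feature masks with a sigmoid transfer function."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pos = (rng.random((n_particles, d)) < 0.5).astype(float)  # bit = feature on/off
    vel = rng.uniform(-1.0, 1.0, (n_particles, d))
    pbest = pos.copy()
    pbest_fit = np.array([relevance_redundancy(X, y, p) for p in pos])
    gbest = pbest[np.argmax(pbest_fit)].copy()
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive and social weights
    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, d))
        vel = np.clip(w * vel + c1 * r1 * (pbest - pos)
                      + c2 * r2 * (gbest - pos), -4.0, 4.0)
        pos = (rng.random((n_particles, d)) < 1.0 / (1.0 + np.exp(-vel))).astype(float)
        fit = np.array([relevance_redundancy(X, y, p) for p in pos])
        better = fit > pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fit[better]
        gbest = pbest[np.argmax(pbest_fit)].copy()
    return np.flatnonzero(gbest)

# Toy check: 20 features, only the first three drive the target.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
y = X[:, 0] + X[:, 1] - X[:, 2] + 0.1 * rng.normal(size=200)
print(binary_pso(X, y))  # should concentrate on features 0, 1, 2
```

Published methods differ mainly in the metaheuristic (ant colony, grey wolf, cockroach swarm, etc.) and in the fitness, which is often a wrapper around an actual classifier rather than the filter score used here.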
Related papers
- Large-scale Multi-objective Feature Selection: A Multi-phase Search Space Shrinking Approach [0.27624021966289597]
Feature selection is a crucial step in machine learning, especially for high-dimensional datasets.
This paper proposes a novel large-scale multi-objective evolutionary algorithm based on search space shrinking, termed LMSSS.
The effectiveness of the proposed algorithm is demonstrated through comprehensive experiments on 15 large-scale datasets.
arXiv Detail & Related papers (2024-10-13T23:06:10Z)
- A Survey on Data Selection for Language Models [148.300726396877]
Data selection methods aim to determine which data points to include in a training dataset.
Deep learning is mostly driven by empirical evidence, and experimentation on large-scale data is expensive.
Few organizations have the resources for extensive data selection research.
arXiv Detail & Related papers (2024-02-26T18:54:35Z)
- LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application. (A sketch of the similarity-ranking step follows below.)
arXiv Detail & Related papers (2024-02-06T19:18:04Z)
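The LESS summary above compresses the pipeline; its final step reduces to matching candidate training examples to target-task examples by gradient similarity. A hedged sketch of just that step, assuming per-example low-dimensional gradient features have already been computed (obtaining them efficiently, e.g. through low-rank structure and projection, is the substance of the paper; the function name and shapes below are illustrative):

```python
import numpy as np

def select_by_gradient_similarity(train_grads, target_grads, fraction=0.05):
    """Rank candidate training examples by the best cosine similarity between
    their gradient features and those of target-task examples, and keep the
    top fraction. Names and shapes here are illustrative."""
    tn = train_grads / np.linalg.norm(train_grads, axis=1, keepdims=True)
    qn = target_grads / np.linalg.norm(target_grads, axis=1, keepdims=True)
    scores = (tn @ qn.T).max(axis=1)      # best match per training example
    k = max(1, int(fraction * len(scores)))
    return np.argsort(scores)[::-1][:k]   # indices of the selected 5%

# Illustrative shapes: 1000 candidates, 50 target examples, 128-dim features.
rng = np.random.default_rng(0)
train_grads = rng.normal(size=(1000, 128))
target_grads = rng.normal(size=(50, 128))
print(select_by_gradient_similarity(train_grads, target_grads))
```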
- A Contrast Based Feature Selection Algorithm for High-dimensional Data set in Machine Learning [9.596923373834093]
We propose a novel filter feature selection method, ContrastFS, which selects discriminative features based on the discrepancies that features exhibit between different classes.
We validate the effectiveness and efficiency of our approach on several widely studied benchmark datasets; the results show that the new method performs favorably at negligible computational cost. (A toy version of a between-class contrast score is sketched below.)
arXiv Detail & Related papers (2024-01-15T05:32:35Z)
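The summary of ContrastFS above does not state the exact discrepancy statistic, so the following is a hypothetical stand-in rather than the paper's method: score each feature by the spread of its class-conditional means relative to the average within-class spread, then keep the top k.

```python
import numpy as np

def contrast_scores(X, y):
    """Hypothetical contrast statistic (not the paper's): spread of the
    class-conditional means of each feature, normalised by the average
    within-class standard deviation."""
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    within = np.stack([X[y == c].std(axis=0) for c in classes]).mean(axis=0)
    return np.ptp(means, axis=0) / (within + 1e-12)

def select_top_k(X, y, k):
    return np.argsort(contrast_scores(X, y))[::-1][:k]

# Two Gaussian classes separated only along feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
y = np.repeat([0, 1], 200)
X[y == 1, 0] += 3.0
print(select_top_k(X, y, 3))  # feature 0 should rank first
```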
- A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning [131.2910403490434]
Data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones.
Existing benchmarks for tabular feature selection consider only classical downstream models or toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance.
We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers.
We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems.
arXiv Detail & Related papers (2023-11-10T05:26:10Z)
- Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z)
- Feature selection algorithm based on incremental mutual information and cockroach swarm optimization [12.297966427336124]
We propose IMIICSO, an improved swarm intelligence optimization method based on incremental mutual information.
The method extracts knowledge from decision table reduction to guide the swarm algorithm's global search.
The accuracy of the feature subsets selected by the improved cockroach swarm algorithm is better than, or comparable to, that of the original swarm intelligence optimization algorithm. (The incremental mutual information ingredient is sketched below.)
arXiv Detail & Related papers (2023-02-21T08:51:05Z)
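The cockroach-swarm search in IMIICSO is not reproduced here, but the incremental mutual information ingredient that guides it can be sketched on its own. A minimal, assumption-laden version for discrete features: grow a subset greedily, at each step adding the feature whose inclusion most increases the joint mutual information with the class.

```python
import numpy as np

def mutual_information(a, b):
    """MI between two nonnegative-integer arrays via their joint histogram."""
    pa = np.bincount(a) / a.size
    pb = np.bincount(b) / b.size
    pab = np.zeros((pa.size, pb.size))
    np.add.at(pab, (a, b), 1.0 / a.size)
    nz = pab > 0
    return float((pab[nz] * np.log(pab[nz] / np.outer(pa, pb)[nz])).sum())

def joint_code(cols):
    """Collapse several discrete columns into one categorical variable."""
    return np.unique(cols, axis=0, return_inverse=True)[1].ravel()

def incremental_mi_selection(X, y, k):
    """Greedily add the feature whose inclusion most increases I(subset; y)."""
    selected, base = [], 0.0
    for _ in range(k):
        gains = [(mutual_information(joint_code(X[:, selected + [f]]), y) - base, f)
                 for f in range(X.shape[1]) if f not in selected]
        gain, best = max(gains)
        if gain <= 0:
            break
        selected.append(best)
        base += gain
    return selected

# Discrete toy data: the class depends on features 0 and 1 only.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(500, 8))
y = X[:, 0] + (X[:, 1] % 2)
print(incremental_mi_selection(X, y, 3))  # first two picks should be 0 and 1
```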
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select the desired features.
The proposed algorithm is shown to be more accurate and efficient than existing algorithms. (A classical score in the same family is sketched below.)
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
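CSUFS's own compactness measure is not given in the summary above; as a stand-in from the same family of fast unsupervised filters, here is the classical Laplacian Score, which likewise prefers features along which nearby samples stay compact. All parameters are illustrative.

```python
import numpy as np

def laplacian_scores(X, k=5, sigma=1.0):
    """Classical Laplacian Score, a fast unsupervised filter in the same
    family: features along which k-NN neighbours stay compact score lower
    and are preferred. Stand-in for CSUFS's own compactness measure."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]              # k nearest neighbours
    rows = np.repeat(np.arange(n), k)
    W = np.zeros((n, n))
    W[rows, nn.ravel()] = np.exp(-d2[rows, nn.ravel()] / sigma)
    W = np.maximum(W, W.T)                               # symmetrise the graph
    D = W.sum(axis=1)
    L = np.diag(D) - W                                   # graph Laplacian
    scores = []
    for f in X.T:
        f_t = f - (f @ D) / D.sum()                      # remove weighted mean
        scores.append((f_t @ L @ f_t) / (f_t @ (D * f_t) + 1e-12))
    return np.array(scores)                              # lower = better

# Two clusters separated along features 0 and 1; features 2-4 are noise.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(4, 1, (50, 5))])
X[:, 2:] = rng.normal(size=(100, 3))
print(np.argsort(laplacian_scores(X)))  # features 0 and 1 should come first
```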
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class. (A minimal oversampling sketch follows below.)
arXiv Detail & Related papers (2021-12-15T18:56:39Z)
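As a concrete instance of the oversampling strategy mentioned above, here is a minimal random oversampler (a hypothetical helper, not any library's API; production code would typically reach for a library such as imbalanced-learn, which also implements synthetic approaches like SMOTE):

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until every class matches the
    majority count. Minimal illustration; libraries such as imbalanced-learn
    provide this and synthetic variants like SMOTE."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = [np.arange(len(y))]
    for c, n in zip(classes, counts):
        if n < target:
            idx = np.flatnonzero(y == c)
            keep.append(rng.choice(idx, size=target - n, replace=True))
    sel = np.concatenate(keep)
    return X[sel], y[sel]

# 95/5 imbalance becomes 95/95 after oversampling.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = np.array([0] * 95 + [1] * 5)
Xb, yb = random_oversample(X, y)
print(np.unique(yb, return_counts=True))
```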
- Quick and Robust Feature Selection: the Strength of Energy-efficient Sparse Training for Autoencoders [4.561081324313315]
Feature selection, which identifies the most relevant and informative attributes of a dataset, has been introduced as a solution to the problem of high-dimensional data.
Most of the existing feature selection methods are computationally inefficient.
In this paper, a novel and flexible method for unsupervised feature selection is proposed.
arXiv Detail & Related papers (2020-12-01T15:05:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.