Feature selection algorithm based on incremental mutual information and cockroach swarm optimization
- URL: http://arxiv.org/abs/2302.10522v1
- Date: Tue, 21 Feb 2023 08:51:05 GMT
- Title: Feature selection algorithm based on incremental mutual information and cockroach swarm optimization
- Authors: Zhao and Chen
- Abstract summary: We propose an incremental mutual information based improved swarm intelligence optimization method (IMIICSO).
This method extracts decision table reduction knowledge to guide the swarm algorithm's global search.
The accuracy of feature subsets selected by the improved cockroach swarm algorithm based on incremental mutual information is better than, or nearly the same as, that of the original swarm intelligence optimization algorithm.
- Score: 12.297966427336124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature selection is an effective preprocessing technique for reducing data
dimensionality. For feature selection, rough set theory provides many measures, among
which mutual information is one of the most important attribute measures. However,
mutual information based importance measures are computationally expensive and
inaccurate, especially on data with very many samples, and finding an optimal feature
subset is an NP-hard problem in high- and ultra-high-dimensional data sets. Although
many feature selection strategies based on swarm intelligence algorithms have been
proposed to improve accuracy, these algorithms still face a bottleneck on
high-dimensional, large-scale data sets: they consume substantial computation and tend
to select weakly correlated and redundant features. In this study, we propose an
incremental mutual information based improved swarm intelligence optimization method
(IMIICSO), which uses rough set theory to compute the importance of features from their
mutual information. The method extracts decision table reduction knowledge to guide the
swarm algorithm's global search. By computing mutual information incrementally over
large-sample data, we not only discard useless features to speed up both internal and
external computation, but also effectively reduce the cardinality of the optimal
feature subset, so that IMIICSO yields the smallest cardinality among the compared
methods. The accuracy of feature subsets selected by the improved cockroach swarm
algorithm based on incremental mutual information is better than, or nearly the same
as, that of the original swarm intelligence optimization algorithm. Experiments on 10
UCI datasets, including large-scale and high-dimensional ones, confirm the efficiency
and effectiveness of the proposed algorithm.
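To make the approach concrete, below is a minimal sketch, not the authors' IMIICSO implementation: it ranks features by mutual information with the class label, seeds a swarm of binary feature masks around the top-ranked features, and refines the subset with a simplified chase-swarming step borrowed from cockroach swarm optimization. The dataset, classifier, swarm size, and iteration count are illustrative assumptions, and the incremental bookkeeping that lets IMIICSO scale to very large samples is omitted.

```python
# Minimal sketch (not the authors' IMIICSO implementation): rank features by
# mutual information, seed a swarm of binary masks from the top-ranked ones,
# and run a simplified chase-swarming refinement. All parameters illustrative.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

# Mutual-information importance I(X_j; y) of each feature.
mi = mutual_info_classif(X, y, random_state=0)

def fitness(mask):
    """Cross-validated accuracy of a KNN classifier on the selected features."""
    if not mask.any():
        return 0.0
    return cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()

# Seed the swarm around the top-MI features; this plays the role of the
# decision table reduction knowledge that guides the global search.
seed = mi > np.median(mi)
swarm = np.array([seed ^ (rng.random(n_features) < 0.1) for _ in range(10)])

best_mask, best_fit = seed.copy(), fitness(seed)
for _ in range(20):  # a few iterations for illustration
    for i in range(len(swarm)):
        # Simplified chase-swarming: flip a random subset of the bits where
        # this agent disagrees with the best mask found so far.
        diff = swarm[i] ^ best_mask
        swarm[i] ^= diff & (rng.random(n_features) < 0.5)
        f = fitness(swarm[i])
        if f > best_fit:
            best_fit, best_mask = f, swarm[i].copy()

print(f"selected {best_mask.sum()}/{n_features} features, CV accuracy {best_fit:.3f}")
```

Seeding from the top-MI mask is the key design choice here: it stands in for the decision table reduction knowledge that, per the abstract, steers the otherwise random global search toward small, relevant subsets.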
Related papers
- Large-scale Multi-objective Feature Selection: A Multi-phase Search Space Shrinking Approach [0.27624021966289597]
Feature selection is a crucial step in machine learning, especially for high-dimensional datasets.
This paper proposes a novel large-scale multi-objective evolutionary algorithm based on search space shrinking, termed LMSSS.
The effectiveness of the proposed algorithm is demonstrated through comprehensive experiments on 15 large-scale datasets.
arXiv Detail & Related papers (2024-10-13T23:06:10Z)
- Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically find new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z)
- Compact NSGA-II for Multi-objective Feature Selection [0.24578723416255746]
We define feature selection as a multi-objective binary optimization task with the objectives of maximizing classification accuracy and minimizing the number of selected features.
In order to select optimal features, we have proposed a binary Compact NSGA-II (CNSGA-II) algorithm.
To the best of our knowledge, this is the first compact multi-objective algorithm proposed for feature selection.
arXiv Detail & Related papers (2024-02-20T01:10:12Z)
- Multi-objective Binary Coordinate Search for Feature Selection [0.24578723416255746]
We propose the binary multi-objective coordinate search (MOCS) algorithm to solve large-scale feature selection problems.
Results indicate the significant superiority of our method over NSGA-II on five real-world large-scale datasets.
arXiv Detail & Related papers (2024-02-20T00:50:26Z)
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- Low Budget Active Learning via Wasserstein Distance: An Integer Programming Approach [81.19737119343438]
Active learning is the process of training a model with limited labeled data by selecting a core subset of an unlabeled data pool to label.
We propose a new integer optimization problem for selecting a core set that minimizes the discrete Wasserstein distance from the unlabeled pool.
Our strategy requires high-quality latent features which we obtain by unsupervised learning on the unlabeled pool.
arXiv Detail & Related papers (2021-06-05T21:25:03Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
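As a rough, hypothetical sketch (the paper's exact objective may differ), reconstruction-error-based unsupervised feature selection with $l_{2,p}$-norm regularization is typically formulated as

$$\min_{W}\; \|X - X W W^{\top}\|_F^2 + \lambda \|W\|_{2,p}^{p},$$

where $X$ is the data matrix, $W$ the projection matrix, and $\lambda$ the regularization weight; rows of $W$ driven toward zero correspond to discarded features.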
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Review of Swarm Intelligence-based Feature Selection Methods [3.8848561367220276]
Data mining applications with high dimensional datasets require high speed and accuracy.
One of the dimensionality reduction approaches is feature selection that can increase the accuracy of the data mining task.
State-of-the-art swarm intelligence algorithms are studied, and recent feature selection methods based on these algorithms are reviewed.
arXiv Detail & Related papers (2020-08-07T05:18:58Z)