Migrating Birds Optimization-Based Feature Selection for Text
Classification
- URL: http://arxiv.org/abs/2401.10270v1
- Date: Thu, 4 Jan 2024 08:11:03 GMT
- Title: Migrating Birds Optimization-Based Feature Selection for Text
Classification
- Authors: Cem Kaya, Zeynep Hilal Kilimci, Mitat Uysal, Murat Kaya
- Abstract summary: MBO-NB is a novel approach to address feature selection challenges in text classification having large number of features.
Our experiments demonstrate MBO-NB's superior effectiveness in feature reduction compared to other existing techniques.
- Score: 0.4915744683251149
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This research introduces a novel approach, MBO-NB, that leverages Migrating
Birds Optimization (MBO) coupled with Naive Bayes as an internal classifier to
address feature selection challenges in text classification tasks with a large
number of features. Focusing on computational efficiency, we preprocess raw data
using the Information Gain algorithm, strategically reducing the feature count
from an average of 62,221 to 2,089. Our experiments demonstrate MBO-NB's superior
effectiveness in feature reduction compared to existing techniques, while also
increasing classification accuracy. The successful integration of Naive Bayes
within MBO presents a well-rounded solution. In individual comparisons with
Particle Swarm Optimization (PSO), MBO-NB consistently outperforms it by an
average of 6.9% across four setups. This research offers valuable insights into
enhancing feature selection methods, providing a scalable and effective solution
for text classification.
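
The pipeline described in the abstract (Information Gain pre-filtering followed by wrapper feature selection scored by an internal Naive Bayes classifier) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the Migrating Birds Optimization neighbour-sharing mechanics are replaced here by a simple random bit-flip local search, and the dataset, feature counts, and iteration budget are synthetic placeholders.

```python
# Sketch of the MBO-NB idea: Information Gain filter, then a wrapper search
# whose fitness is Naive Bayes cross-validated accuracy. The MBO step is
# simplified to random bit-flip local search; all sizes are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=60,
                           n_informative=8, random_state=0)

# Step 1: Information Gain (mutual information) filter, cutting 60 -> 20
# features, standing in for the paper's 62,221 -> 2,089 reduction.
X_ig = SelectKBest(mutual_info_classif, k=20).fit_transform(X, y)

def fitness(mask):
    """Cross-validated Naive Bayes accuracy on the selected features."""
    if not mask.any():
        return 0.0
    return cross_val_score(GaussianNB(), X_ig[:, mask], y, cv=3).mean()

# Step 2: wrapper search with Naive Bayes as the internal classifier.
best = rng.random(X_ig.shape[1]) < 0.5       # random initial feature mask
best_score = fitness(best)
for _ in range(30):
    cand = best.copy()
    flip = rng.integers(cand.size)            # flip one feature in or out
    cand[flip] = ~cand[flip]
    score = fitness(cand)
    if score >= best_score:
        best, best_score = cand, score

print(best.sum(), round(best_score, 3))
```

In the actual method, the bit-flip step would be replaced by MBO's V-shaped flock, where each solution generates neighbours and passes unused ones to the solution behind it.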
Related papers
- Combatting Dimensional Collapse in LLM Pre-Training Data via Diversified File Selection [65.96556073745197]
DiverSified File selection algorithm (DiSF) is proposed to select the most decorrelated text files in the feature space.
DiSF saves 98.5% of 590M training files in SlimPajama, outperforming the full-data pre-training within a 50B training budget.
arXiv Detail & Related papers (2025-04-29T11:13:18Z)
- Less is More: Efficient Black-box Attribution via Minimal Interpretable Subset Selection [52.716143424856185]
We propose LiMA (Less input is More faithful for Attribution), which reformulates the attribution of important regions as an optimization problem for submodular subset selection.
LiMA identifies both the most and least important samples while ensuring an optimal attribution boundary that minimizes errors.
Our method also outperforms the greedy search in attribution efficiency, being 1.6 times faster.
arXiv Detail & Related papers (2025-04-01T06:58:15Z)
- Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment.
We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z)
- Sample-efficient Bayesian Optimisation Using Known Invariances [56.34916328814857]
We show that vanilla and constrained BO algorithms are inefficient when optimising invariant objectives.
We derive a bound on the maximum information gain of these invariant kernels.
We use our method to design a current drive system for a nuclear fusion reactor, finding a high-performance solution.
arXiv Detail & Related papers (2024-10-22T12:51:46Z)
- An Effective Networks Intrusion Detection Approach Based on Hybrid Harris Hawks and Multi-Layer Perceptron [47.81867479735455]
This paper proposes an Intrusion Detection System (IDS) employing the Harris Hawks Optimization (HHO) to optimize Multilayer Perceptron learning.
HHO-MLP aims to select optimal parameters in its learning process to minimize intrusion detection errors in networks.
HHO-MLP showed superior performance, attaining top scores of 93.17% accuracy, 95.41% sensitivity, and 95.41% specificity.
arXiv Detail & Related papers (2024-02-21T06:25:50Z)
- Compact NSGA-II for Multi-objective Feature Selection [0.24578723416255746]
We define feature selection as a multi-objective binary optimization task with the objectives of maximizing classification accuracy and minimizing the number of selected features.
In order to select optimal features, we have proposed a binary Compact NSGA-II (CNSGA-II) algorithm.
To the best of our knowledge, this is the first compact multi-objective algorithm proposed for feature selection.
arXiv Detail & Related papers (2024-02-20T01:10:12Z)
- Poisson Process for Bayesian Optimization [126.51200593377739]
We propose a ranking-based surrogate model based on the Poisson process and introduce an efficient BO framework, namely Poisson Process Bayesian Optimization (PoPBO).
Compared to the classic GP-BO method, our PoPBO has lower costs and better robustness to noise, which is verified by abundant experiments.
arXiv Detail & Related papers (2024-02-05T02:54:50Z)
- Feature selection algorithm based on incremental mutual information and cockroach swarm optimization [12.297966427336124]
We propose an incremental mutual information based improved swarm intelligent optimization method (IMIICSO).
This method extracts decision table reduction knowledge to guide group algorithm global search.
The accuracy of feature subsets selected by the improved cockroach swarm algorithm based on incremental mutual information is better than, or comparable to, that of the original swarm intelligent optimization algorithm.
arXiv Detail & Related papers (2023-02-21T08:51:05Z)
- An efficient hybrid classification approach for COVID-19 based on Harris Hawks Optimization and Salp Swarm Optimization [0.0]
This study presents a hybrid binary version of the Harris Hawks Optimization algorithm (HHO) and Salp Swarm Optimization (SSA) for Covid-19 classification.
The proposed algorithm (HHOSSA) achieved 96% accuracy with the SVM, 98% and 98% accuracy with two classifiers.
arXiv Detail & Related papers (2022-12-25T19:52:18Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined by minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- A Tent Lévy Flying Sparrow Search Algorithm for Feature Selection: A COVID-19 Case Study [1.6436293069942312]
The "Curse of Dimensionality" induced by the rapid development of information science might have a negative impact when dealing with big datasets.
We propose a variant of the sparrow search algorithm (SSA), called the Tent Lévy flying sparrow search algorithm (TFSSA).
TFSSA is used to select the best subset of features in the packing pattern for classification purposes.
arXiv Detail & Related papers (2022-09-20T15:12:10Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm is more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- RSO: A Novel Reinforced Swarm Optimization Algorithm for Feature Selection [0.0]
In this paper, we propose a novel feature selection algorithm named Reinforced Swarm Optimization (RSO).
This algorithm embeds the widely used Bee Swarm Optimization (BSO) algorithm along with Reinforcement Learning (RL) to maximize the reward of a superior search agent and punish the inferior ones.
The proposed method is evaluated on 25 widely known UCI datasets containing a perfect blend of balanced and imbalanced data.
arXiv Detail & Related papers (2021-07-29T17:38:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.