Fast Classification with Sequential Feature Selection in Test Phase
- URL: http://arxiv.org/abs/2306.14347v1
- Date: Sun, 25 Jun 2023 21:31:46 GMT
- Title: Fast Classification with Sequential Feature Selection in Test Phase
- Authors: Ali Mirzaei, Vahid Pourahmadi, Hamid Sheikhzadeh, Alireza
Abdollahpourrostam
- Abstract summary: This paper introduces a novel approach to active feature acquisition for classification.
It is the task of sequentially selecting the most informative subset of features to achieve optimal prediction performance.
The proposed approach involves a new lazy model that is significantly faster and more efficient compared to existing methods.
- Score: 1.1470070927586016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces a novel approach to active feature acquisition for
classification, which is the task of sequentially selecting the most
informative subset of features to achieve optimal prediction performance during
testing while minimizing cost. The proposed approach involves a new lazy model
that is significantly faster and more efficient compared to existing methods,
while still producing comparable accuracy results. During the test phase, the
proposed approach utilizes Fisher scores for feature ranking to identify the
most important feature at each step. The training dataset is then filtered
based on the observed value of the selected feature, and this process is
repeated until an acceptable accuracy is reached or the feature-acquisition
budget is exhausted. The performance of the proposed approach was evaluated
on synthetic and real datasets, including our new synthetic CUBE dataset and
the real-world Forest dataset. The experimental results demonstrate that our
approach achieves competitive accuracy compared to existing methods, while
significantly outperforming them in terms of speed. The source code of the
algorithm is released on GitHub:
https://github.com/alimirzaei/FCwSFS.
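The test-phase loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the neighbourhood filtering rule (keeping the 30% of training samples closest to the observed value on the acquired feature) and the purity-based stopping criterion are illustrative assumptions.

```python
import numpy as np

def fisher_scores(X, y):
    """Fisher score per feature: between-class variance over within-class variance."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        n_c = len(Xc)
        num += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        den += n_c * Xc.var(axis=0)
    return num / (den + 1e-12)  # small epsilon avoids division by zero

def classify_sequentially(X_train, y_train, x_test, budget, tol=0.95):
    """Sequentially acquire features for one test sample.

    At each step: rank the remaining features by Fisher score on the
    current (filtered) training subset, acquire the top-ranked feature,
    keep only the training samples whose value on that feature is close
    to the observed value, and stop once one class dominates the subset
    or the acquisition budget is spent.
    """
    remaining = list(range(X_train.shape[1]))
    Xf, yf = X_train, y_train
    for _ in range(budget):
        scores = fisher_scores(Xf, yf)
        j = remaining[int(np.argmax(scores[remaining]))]
        remaining.remove(j)
        # Lazy filtering: keep neighbours of the observed value on feature j.
        dist = np.abs(Xf[:, j] - x_test[j])
        keep = dist <= np.quantile(dist, 0.3)  # hypothetical neighbourhood rule
        Xf, yf = Xf[keep], yf[keep]
        # Stop early once the filtered subset is nearly pure.
        _, counts = np.unique(yf, return_counts=True)
        if counts.max() / counts.sum() >= tol or not remaining:
            break
    vals, counts = np.unique(yf, return_counts=True)
    return vals[np.argmax(counts)]  # majority class of the filtered subset
```

Because the model is lazy, no training happens up front: all work (ranking and filtering) is deferred to the test phase and operates on a shrinking subset of the training data, which is where the claimed speed advantage comes from.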
Related papers
- Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets.
Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that repeatedly alternates between time-consuming model training and batch data selection.
FreeSel bypasses the heavy batch selection process, achieving a significant improvement in efficiency and being 530x faster than existing active learning methods.
arXiv Detail & Related papers (2023-09-29T15:50:14Z) - Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z) - A Tent Lévy Flying Sparrow Search Algorithm for Feature Selection: A
COVID-19 Case Study [1.6436293069942312]
The "Curse of Dimensionality" induced by the rapid development of information science might have a negative impact when dealing with big datasets.
We propose a variant of the sparrow search algorithm (SSA), called the Tent Lévy flying sparrow search algorithm (TFSSA).
TFSSA is used to select the best subset of features in the packing pattern for classification purposes.
arXiv Detail & Related papers (2022-09-20T15:12:10Z) - Towards Automated Imbalanced Learning with Deep Hierarchical
Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z) - Compactness Score: A Fast Filter Method for Unsupervised Feature
Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm is shown to be more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z) - A concise method for feature selection via normalized frequencies [0.0]
In this paper, a concise method is proposed for universal feature selection.
The proposed method uses a fusion of the filter method and the wrapper method, rather than a combination of them.
The evaluation results show that the proposed method outperformed several state-of-the-art related works in terms of accuracy, precision, recall, F-score and AUC.
arXiv Detail & Related papers (2021-06-10T15:29:54Z) - RFCBF: enhance the performance and stability of Fast Correlation-Based
Filter [6.781877756322586]
We propose a novel extension of FCBF, called RFCBF, which combines resampling technique to improve classification accuracy.
The experimental results show that the RFCBF algorithm yields significantly better results than previous state-of-the-art methods in terms of classification accuracy and runtime.
arXiv Detail & Related papers (2021-05-30T12:36:32Z) - AutoSimulate: (Quickly) Learning Synthetic Data Generation [70.82315853981838]
We propose an efficient alternative for optimal synthetic data generation based on a novel differentiable approximation of the objective.
We demonstrate that the proposed method finds the optimal data distribution faster (up to 50×), with significantly reduced training data generation (up to 30×) and better accuracy (+8.7%) on real-world test datasets than previous methods.
arXiv Detail & Related papers (2020-08-16T11:36:11Z) - Fast Template Matching and Update for Video Object Tracking and
Segmentation [56.465510428878]
The main task we aim to tackle is the multi-instance semi-supervised video object segmentation across a sequence of frames.
The challenges lie in the selection of the matching method to predict the result as well as to decide whether to update the target template.
We propose a novel approach which utilizes reinforcement learning to make these two decisions at the same time.
arXiv Detail & Related papers (2020-04-16T08:58:45Z) - IVFS: Simple and Efficient Feature Selection for High Dimensional
Topology Preservation [33.424663018395684]
We propose a simple and effective feature selection algorithm to enhance sample similarity preservation.
The proposed algorithm is able to well preserve the pairwise distances, as well as topological patterns, of the full data.
arXiv Detail & Related papers (2020-04-02T23:05:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.