On-the-Fly Joint Feature Selection and Classification
- URL: http://arxiv.org/abs/2004.10245v1
- Date: Tue, 21 Apr 2020 19:19:39 GMT
- Title: On-the-Fly Joint Feature Selection and Classification
- Authors: Yasitha Warahena Liyanage, Daphney-Stavroula Zois, Charalampos Chelmis
- Abstract summary: We propose a framework to perform joint feature selection and classification on-the-fly.
We derive the optimum solution of the associated optimization problem and analyze its structure.
We evaluate the performance of the proposed algorithms on several public datasets.
- Score: 16.84451472788859
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Joint feature selection and classification in an online setting is essential
for time-sensitive decision making. However, most existing methods treat the
two parts of this coupled problem independently. Specifically, online feature selection methods
can handle either streaming features or data instances offline to produce a
fixed set of features for classification, while online classification methods
classify incoming instances using full knowledge about the feature space.
In either case, all existing methods use a single set of features, common to all
data instances, for classification. Instead, we propose a framework to perform
joint feature selection and classification on-the-fly, so as to minimize the
number of features evaluated for every data instance and maximize
classification accuracy. We derive the optimum solution of the associated
optimization problem and analyze its structure. Two algorithms are proposed,
ETANA and F-ETANA, which are based on the optimum solution and its properties.
We evaluate the performance of the proposed algorithms on several public
datasets, demonstrating (i) the dominance of the proposed algorithms over the
state-of-the-art, and (ii) their applicability to a broad range of application
domains, including clinical research and natural language processing.
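As a concrete illustration of this on-the-fly setup, below is a minimal sketch. It is not the paper's ETANA or F-ETANA algorithm: it inspects features one at a time for a single test instance, updates a plain naive Bayes posterior after each inspection, and stops as soon as one class is sufficiently likely. The feature order, the likelihood model, and the `stop_threshold` parameter are all assumptions made purely for this example; the paper instead derives the ordering and stopping rule from the optimum solution of an optimization problem.
```python
# Minimal sketch of per-instance sequential feature evaluation with an
# early-stopping rule. NOT the paper's ETANA algorithm: it uses a naive
# Bayes posterior update and a fixed confidence threshold, both of which
# are assumptions made for illustration only.
import numpy as np

def classify_on_the_fly(x, class_priors, likelihoods, feature_order,
                        stop_threshold=0.95):
    """Inspect features one at a time; stop once one class posterior
    exceeds `stop_threshold` (or all features have been used).

    x             : feature values of a single test instance
    class_priors  : prior probability of each class, shape (C,)
    likelihoods   : likelihoods[k][c][v] = P(feature k = v | class c)
    feature_order : order in which features are inspected (hypothetical;
                    the paper derives ordering/stopping analytically)
    """
    posterior = np.asarray(class_priors, dtype=float)
    inspected = []
    for k in feature_order:
        v = x[k]
        # Bayes update with the k-th feature's class-conditional likelihoods
        posterior = posterior * np.array(
            [likelihoods[k][c][v] for c in range(len(posterior))])
        posterior /= posterior.sum()
        inspected.append(k)
        if posterior.max() >= stop_threshold:  # confident enough: stop early
            break
    return int(posterior.argmax()), inspected

# Toy usage: two classes, three binary features (all numbers made up).
priors = [0.5, 0.5]
lik = [
    [{0: 0.95, 1: 0.05}, {0: 0.10, 1: 0.90}],  # feature 0 (very informative)
    [{0: 0.60, 1: 0.40}, {0: 0.50, 1: 0.50}],  # feature 1
    [{0: 0.70, 1: 0.30}, {0: 0.30, 1: 0.70}],  # feature 2
]
label, used = classify_on_the_fly([1, 0, 1], priors, lik, [0, 1, 2],
                                  stop_threshold=0.9)
print(label, used)  # -> 1 [0]: a confident prediction after one feature
```
The point of the sketch is the shape of the decision process: each test instance may consume a different number of features, so easy instances are classified cheaply while hard ones trigger further feature evaluations.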
Related papers
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to capture potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z) - Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named as, Compactness Score (CSUFS) to select desired features.
Our proposed algorithm appears more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z) - Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class; a minimal sketch of both appears after this list.
arXiv Detail & Related papers (2021-12-15T18:56:39Z) - Multivariate feature ranking of gene expression data [62.997667081978825]
We propose two new multivariate feature ranking methods based on pairwise correlation and pairwise consistency.
We show, via statistical tests, that the proposed methods outperform the state-of-the-art feature ranking methods Clustering Variation, Chi Squared, Correlation, Information Gain, ReliefF, and Significance.
arXiv Detail & Related papers (2021-11-03T17:19:53Z) - An Evolutionary Correlation-aware Feature Selection Method for Classification Problems [3.2550305883611244]
In this paper, an estimation of distribution algorithm (EDA) is proposed to meet three goals.
Firstly, as an extension of EDA, the proposed method generates only two individuals in each iteration that compete based on a fitness function.
Secondly, we provide a guiding technique for determining the number of features for individuals in each iteration.
As the paper's main contribution, the proposed method considers not only the importance of each feature alone but also the interactions between features.
arXiv Detail & Related papers (2021-10-16T20:20:43Z) - Dynamic Instance-Wise Classification in Correlated Feature Spaces [15.351282873821935]
In a typical machine learning setting, the predictions on all test instances are based on a common subset of features discovered during model training.
A new method is proposed that sequentially selects the best feature to evaluate for each test instance individually, and stops the selection process to make a prediction once it determines that no further improvement can be achieved with respect to classification accuracy.
The effectiveness, generalizability, and scalability of the proposed method are illustrated on a variety of real-world datasets from diverse application domains.
arXiv Detail & Related papers (2021-06-08T20:20:36Z) - Joint Adaptive Graph and Structured Sparsity Regularization for Unsupervised Feature Selection [6.41804410246642]
We propose a joint adaptive graph and structured sparsity regularization unsupervised feature selection (JASFS) method.
An optimal subset of features is selected in groups, and the number of selected features is determined automatically.
Experimental results on eight benchmarks demonstrate the effectiveness and efficiency of the proposed method.
arXiv Detail & Related papers (2020-10-09T08:17:04Z) - Feature Selection Methods for Cost-Constrained Classification in Random Forests [3.4806267677524896]
Cost-sensitive feature selection describes a feature selection problem in which each feature incurs an individual cost for inclusion in a model.
Random Forests define a particularly challenging problem for feature selection, as features are generally entangled in an ensemble of multiple trees.
We propose Shallow Tree Selection, a novel fast and multivariate feature selection method that selects features from small tree structures.
arXiv Detail & Related papers (2020-08-14T11:39:52Z) - A novel embedded min-max approach for feature selection in nonlinear support vector machine classification [0.0]
We propose an embedded feature selection method based on a min-max optimization problem.
By leveraging duality theory, we equivalently reformulate the min-max problem and solve it directly.
The efficiency and usefulness of our approach are tested on several benchmark data sets.
arXiv Detail & Related papers (2020-04-21T09:40:38Z) - Extreme Algorithm Selection With Dyadic Feature Representation [78.13985819417974]
We propose the setting of extreme algorithm selection (XAS) where we consider fixed sets of thousands of candidate algorithms.
We assess the applicability of state-of-the-art AS techniques to the XAS setting and propose approaches leveraging a dyadic feature representation.
arXiv Detail & Related papers (2020-01-29T09:40:58Z) - Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
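Referring back to the imbalanced-classification entry above ("Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties"), here is a minimal sketch of the two strategies it names: random oversampling duplicates minority-class instances, and random undersampling discards majority-class instances, until the class counts match. This is an illustration only; real pipelines typically rely on a dedicated library such as imbalanced-learn, and the function names and seeding scheme here are assumptions for the example.
```python
# Minimal sketch of random oversampling and random undersampling for
# imbalanced data. Illustration only; production code would normally
# use a library such as imbalanced-learn.
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate rows of smaller classes (with replacement) until every
    class has as many examples as the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = []
    for c in classes:
        c_idx = np.flatnonzero(y == c)
        if len(c_idx) == n_max:
            idx.append(c_idx)  # largest class kept as-is
        else:
            idx.append(rng.choice(c_idx, size=n_max, replace=True))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

def random_undersample(X, y, seed=0):
    """Drop rows of larger classes (without replacement) until every
    class has as few examples as the smallest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    idx = []
    for c in classes:
        c_idx = np.flatnonzero(y == c)
        idx.append(rng.choice(c_idx, size=n_min, replace=False))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

# Toy usage: 6 majority vs. 2 minority examples.
X = np.arange(16).reshape(8, 2)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])
Xo, yo = random_oversample(X, y)    # 12 rows: 6 of each class
Xu, yu = random_undersample(X, y)   # 4 rows: 2 of each class
```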