On-the-Fly Joint Feature Selection and Classification
- URL: http://arxiv.org/abs/2004.10245v1
- Date: Tue, 21 Apr 2020 19:19:39 GMT
- Title: On-the-Fly Joint Feature Selection and Classification
- Authors: Yasitha Warahena Liyanage, Daphney-Stavroula Zois, Charalampos Chelmis
- Abstract summary: We propose a framework to perform joint feature selection and classification on-the-fly.
We derive the optimum solution of the associated optimization problem and analyze its structure.
We evaluate the performance of the proposed algorithms on several public datasets.
- Score: 16.84451472788859
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Joint feature selection and classification in an online setting is essential
for time-sensitive decision making. However, most existing methods treat the
two parts of this coupled problem independently. Specifically, online feature selection methods
can handle either streaming features or data instances offline to produce a
fixed set of features for classification, while online classification methods
classify incoming instances using full knowledge about the feature space.
In either case, all existing methods use a single set of features, common to all
data instances, for classification. Instead, we propose a framework to perform
joint feature selection and classification on-the-fly, so as to minimize the
number of features evaluated for every data instance and maximize
classification accuracy. We derive the optimum solution of the associated
optimization problem and analyze its structure. Two algorithms are proposed,
ETANA and F-ETANA, which are based on the optimum solution and its properties.
We evaluate the performance of the proposed algorithms on several public
datasets, demonstrating (i) the dominance of the proposed algorithms over the
state-of-the-art, and (ii) their applicability to a broad range of application
domains, including clinical research and natural language processing.
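As a concrete illustration of this on-the-fly setup, below is a minimal sketch. It is not the paper's ETANA or F-ETANA algorithm: it inspects features one at a time for a single test instance, updates a plain naive Bayes posterior after each inspection, and stops as soon as one class is sufficiently likely. The feature order, the likelihood model, and the `stop_threshold` parameter are all assumptions made purely for this example; the paper instead derives the ordering and stopping rule from the optimum solution of an optimization problem.
```python
# Minimal sketch of per-instance sequential feature evaluation with an
# early-stopping rule. NOT the paper's ETANA algorithm: it uses a naive
# Bayes posterior update and a fixed confidence threshold, both of which
# are assumptions made for illustration only.
import numpy as np

def classify_on_the_fly(x, class_priors, likelihoods, feature_order,
                        stop_threshold=0.95):
    """Inspect features one at a time; stop once one class posterior
    exceeds `stop_threshold` (or all features have been used).

    x             : feature values of a single test instance
    class_priors  : prior probability of each class, shape (C,)
    likelihoods   : likelihoods[k][c][v] = P(feature k = v | class c)
    feature_order : order in which features are inspected (hypothetical;
                    the paper derives ordering/stopping analytically)
    """
    posterior = np.asarray(class_priors, dtype=float)
    inspected = []
    for k in feature_order:
        v = x[k]
        # Bayes update with the k-th feature's class-conditional likelihoods
        posterior = posterior * np.array(
            [likelihoods[k][c][v] for c in range(len(posterior))])
        posterior /= posterior.sum()
        inspected.append(k)
        if posterior.max() >= stop_threshold:  # confident enough: stop early
            break
    return int(posterior.argmax()), inspected

# Toy usage: two classes, three binary features (all numbers made up).
priors = [0.5, 0.5]
lik = [
    [{0: 0.95, 1: 0.05}, {0: 0.10, 1: 0.90}],  # feature 0 (very informative)
    [{0: 0.60, 1: 0.40}, {0: 0.50, 1: 0.50}],  # feature 1
    [{0: 0.70, 1: 0.30}, {0: 0.30, 1: 0.70}],  # feature 2
]
label, used = classify_on_the_fly([1, 0, 1], priors, lik, [0, 1, 2],
                                  stop_threshold=0.9)
print(label, used)  # -> 1 [0]: a confident prediction after one feature
```
The point of the sketch is the shape of the decision process: each test instance may consume a different number of features, so easy instances are classified cheaply while hard ones trigger further feature evaluations.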
Related papers
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to capture potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z) - Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named as, Compactness Score (CSUFS) to select desired features.
Our proposed algorithm appears more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z) - Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class; a minimal sketch of both appears after this list.
arXiv Detail & Related papers (2021-12-15T18:56:39Z) - Multivariate feature ranking of gene expression data [62.997667081978825]
We propose two new multivariate feature ranking methods based on pairwise correlation and pairwise consistency.
We show, via statistical tests, that the proposed methods outperform the state-of-the-art feature ranking methods Clustering Variation, Chi Squared, Correlation, Information Gain, ReliefF, and Significance.
arXiv Detail & Related papers (2021-11-03T17:19:53Z) - An Evolutionary Correlation-aware Feature Selection Method for Classification Problems [3.2550305883611244]
In this paper, an estimation of distribution algorithm (EDA) is proposed to meet three goals.
Firstly, as an extension of EDA, the proposed method generates only two individuals in each iteration that compete based on a fitness function.
Secondly, we provide a guiding technique for determining the number of features for individuals in each iteration.
As the paper's main contribution, the proposed method considers not only the importance of each feature alone but also the interactions between features.
arXiv Detail & Related papers (2021-10-16T20:20:43Z) - Dynamic Instance-Wise Classification in Correlated Feature Spaces [15.351282873821935]
In a typical machine learning setting, the predictions on all test instances are based on a common subset of features discovered during model training.
A new method is proposed that sequentially selects the best feature to evaluate for each test instance individually, and stops the selection process to make a prediction once it determines that no further improvement can be achieved with respect to classification accuracy.
The effectiveness, generalizability, and scalability of the proposed method are illustrated on a variety of real-world datasets from diverse application domains.
arXiv Detail & Related papers (2021-06-08T20:20:36Z) - Joint Adaptive Graph and Structured Sparsity Regularization for Unsupervised Feature Selection [6.41804410246642]
We propose a joint adaptive graph and structured sparsity regularization unsupervised feature selection (JASFS) method.
An optimal subset of features is selected in groups, and the number of selected features is determined automatically.
Experimental results on eight benchmarks demonstrate the effectiveness and efficiency of the proposed method.
arXiv Detail & Related papers (2020-10-09T08:17:04Z) - Feature Selection Methods for Cost-Constrained Classification in Random Forests [3.4806267677524896]
Cost-sensitive feature selection describes a feature selection problem in which each feature incurs an individual cost for inclusion in a model.
Random Forests define a particularly challenging problem for feature selection, as features are generally entangled in an ensemble of multiple trees.
We propose Shallow Tree Selection, a novel fast and multivariate feature selection method that selects features from small tree structures.
arXiv Detail & Related papers (2020-08-14T11:39:52Z) - A novel embedded min-max approach for feature selection in nonlinear support vector machine classification [0.0]
We propose an embedded feature selection method based on a min-max optimization problem.
By leveraging duality theory, we equivalently reformulate the min-max problem and solve it directly.
The efficiency and usefulness of our approach are tested on several benchmark data sets.
arXiv Detail & Related papers (2020-04-21T09:40:38Z) - Extreme Algorithm Selection With Dyadic Feature Representation [78.13985819417974]
We propose the setting of extreme algorithm selection (XAS) where we consider fixed sets of thousands of candidate algorithms.
We assess the applicability of state-of-the-art AS techniques to the XAS setting and propose approaches leveraging a dyadic feature representation.
arXiv Detail & Related papers (2020-01-29T09:40:58Z) - Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
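Referring back to the imbalanced-classification entry above ("Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties"), here is a minimal sketch of the two strategies it names: random oversampling duplicates minority-class instances, and random undersampling discards majority-class instances, until the class counts match. This is an illustration only; real pipelines typically rely on a dedicated library such as imbalanced-learn, and the function names and seeding scheme here are assumptions for the example.
```python
# Minimal sketch of random oversampling and random undersampling for
# imbalanced data. Illustration only; production code would normally
# use a library such as imbalanced-learn.
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate rows of smaller classes (with replacement) until every
    class has as many examples as the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = []
    for c in classes:
        c_idx = np.flatnonzero(y == c)
        if len(c_idx) == n_max:
            idx.append(c_idx)  # largest class kept as-is
        else:
            idx.append(rng.choice(c_idx, size=n_max, replace=True))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

def random_undersample(X, y, seed=0):
    """Drop rows of larger classes (without replacement) until every
    class has as few examples as the smallest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    idx = []
    for c in classes:
        c_idx = np.flatnonzero(y == c)
        idx.append(rng.choice(c_idx, size=n_min, replace=False))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

# Toy usage: 6 majority vs. 2 minority examples.
X = np.arange(16).reshape(8, 2)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])
Xo, yo = random_oversample(X, y)    # 12 rows: 6 of each class
Xu, yu = random_undersample(X, y)   # 4 rows: 2 of each class
```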