Cost-constrained multi-label group feature selection using shadow features
- URL: http://arxiv.org/abs/2408.01851v1
- Date: Sat, 3 Aug 2024 19:31:59 GMT
- Title: Cost-constrained multi-label group feature selection using shadow features
- Authors: Tomasz Klonecki, Paweł Teisseyre, Jaesung Lee,
- Abstract summary: We consider the problem of feature selection in multi-label classification, considering the costs assigned to groups of features.
In this task, the goal is to select a subset of features that will be useful for predicting the label vector, but at the same time, the cost associated with the selected features will not exceed the assumed budget.
- Score: 1.87071394890391
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We consider the problem of feature selection in multi-label classification, considering the costs assigned to groups of features. In this task, the goal is to select a subset of features that will be useful for predicting the label vector, but at the same time, the cost associated with the selected features will not exceed the assumed budget. Solving the problem is of great importance in medicine, where we may be interested in predicting various diseases based on groups of features. The groups may be associated with parameters obtained from a certain diagnostic test, such as a blood test. Because diagnostic test costs can be very high, considering cost information when selecting relevant features becomes crucial to reducing the cost of making predictions. We focus on the feature selection method based on information theory. The proposed method consists of two steps. First, we select features sequentially while maximizing conditional mutual information until the budget is exhausted. In the second step, we select additional cost-free features, i.e., those coming from groups that have already been used in previous steps. Limiting the number of added features is possible using the stop rule based on the concept of so-called shadow features, which are randomized counterparts of the original ones. In contrast to existing approaches based on penalized criteria, in our method, we avoid the need for computationally demanding optimization of the penalty parameter. Experiments conducted on the MIMIC medical database show the effectiveness of the method, especially when the assumed budget is limited.
Related papers
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to model potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z) - Feature Selection as Deep Sequential Generative Learning [50.00973409680637]
We develop a deep variational transformer model over a joint of sequential reconstruction, variational, and performance evaluator losses.
Our model can distill feature selection knowledge and learn a continuous embedding space to map feature selection decision sequences into embedding vectors associated with utility scores.
arXiv Detail & Related papers (2024-03-06T16:31:56Z) - Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z) - Large Language Models Are Not Robust Multiple Choice Selectors [117.72712117510953]
Multiple choice questions (MCQs) serve as a common yet important task format in the evaluation of large language models (LLMs)
This work shows that modern LLMs are vulnerable to option position changes due to their inherent "selection bias"
We propose a label-free, inference-time debiasing method, called PriDe, which separates the model's prior bias for option IDs from the overall prediction distribution.
arXiv Detail & Related papers (2023-09-07T17:44:56Z) - Neural Greedy Pursuit for Feature Selection [72.4121881681861]
We propose a greedy algorithm to select $N$ important features among $P$ input features for a non-linear prediction problem.
We use neural networks as predictors in the algorithm to compute the loss.
arXiv Detail & Related papers (2022-07-19T16:39:16Z) - Ensemble feature selection with data-driven thresholding for Alzheimer's
disease biomarker discovery [0.0]
This work develops several data-driven thresholds to automatically identify the relevant features in an ensemble feature selector.
To demonstrate the applicability of these methods to clinical data, they are applied to data from two real-world Alzheimer's disease (AD) studies.
arXiv Detail & Related papers (2022-07-05T05:50:51Z) - Budgeted Classification with Rejection: An Evolutionary Method with
Multiple Objectives [0.0]
Budgeted, sequential classifiers (BSCs) process inputs through a sequence of partial feature acquisition and evaluation steps.
This allows for an efficient evaluation of inputs that prevents unneeded feature acquisition.
We propose a problem-specific genetic algorithm to build budgeted, sequential classifiers with confidence-based reject options.
arXiv Detail & Related papers (2022-05-01T22:05:16Z) - Parallel feature selection based on the trace ratio criterion [4.30274561163157]
This work presents a novel parallel feature selection approach for classification, namely Parallel Feature Selection using Trace criterion (PFST)
Our method uses trace criterion, a measure of class separability used in Fisher's Discriminant Analysis, to evaluate feature usefulness.
The experiments show that our method can produce a small set of features in a fraction of the amount of time by the other methods under comparison.
arXiv Detail & Related papers (2022-03-03T10:50:33Z) - Low-Cost Algorithmic Recourse for Users With Uncertain Cost Functions [74.00030431081751]
We formalize the notion of user-specific cost functions and introduce a new method for identifying actionable recourses for users.
Our method satisfies up to 25.89 percentage points more users compared to strong baseline methods.
arXiv Detail & Related papers (2021-11-01T19:49:35Z) - Active Feature Selection for the Mutual Information Criterion [21.376800678915558]
We study active feature selection, a novel feature selection setting in which unlabeled data is available.
We focus on feature selection using the classical mutual information criterion, which selects the $k$ features with the largest mutual information with the label.
In the active feature selection setting, the goal is to use significantly fewer labels than the data set size and still find $k$ features whose mutual information with the label based on the emphentire data set is large.
arXiv Detail & Related papers (2020-12-13T06:40:35Z) - Feature Selection Methods for Cost-Constrained Classification in Random
Forests [3.4806267677524896]
Cost-sensitive feature selection describes a feature selection problem, where features raise individual costs for inclusion in a model.
Random Forests define a particularly challenging problem for feature selection, as features are generally entangled in an ensemble of multiple trees.
We propose Shallow Tree Selection, a novel fast and multivariate feature selection method that selects features from small tree structures.
arXiv Detail & Related papers (2020-08-14T11:39:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.