Automated Supervised Feature Selection for Differentiated Patterns of
Care
- URL: http://arxiv.org/abs/2111.03495v1
- Date: Fri, 5 Nov 2021 13:27:18 GMT
- Authors: Catherine Wanjiru, William Ogallo, Girmaw Abebe Tadesse, Charles
Wachira, Isaiah Onando Mulang', Aisha Walcott-Bryant
- Abstract summary: The pipeline included three types of feature selection techniques: filter, wrapper, and embedded methods to select the top K features.
The selected features were tested in the existing multi-dimensional subset scanning (MDSS) pipeline, where the most anomalous subpopulations, most anomalous subsets, propensity scores, and measures of effect were recorded to test their performance.
- Score: 5.3825788156200565
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An automated feature selection pipeline was developed using several
state-of-the-art feature selection techniques to select optimal features for
Differentiating Patterns of Care (DPOC). The pipeline included three types of
feature selection techniques: filter, wrapper, and embedded methods, used to
select the top K features. Five datasets with binary dependent variables were
used, and a different set of top K optimal features was selected for each. The
selected features were tested in the existing multi-dimensional subset scanning
(MDSS) pipeline, where the most anomalous subpopulations, most anomalous
subsets, propensity scores, and measures of effect were recorded to assess
their performance. This performance was compared with the same four metrics
obtained when all covariates in the dataset were passed through the MDSS
pipeline. We found that, regardless of which feature selection technique is
used, the distribution of the data is a key consideration when choosing a
technique.
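The pipeline itself is not reproduced on this page; as an illustration only, the three families of techniques the abstract names (filter, wrapper, embedded) might be sketched with scikit-learn on a synthetic binary-outcome dataset as follows. All estimator choices and parameters below are assumptions for the sketch, not the authors' implementation.

```python
# Illustrative sketch of the three feature-selection families named in the
# abstract (filter, wrapper, embedded) on a synthetic binary-outcome dataset.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)
K = 5  # "top K features", as in the abstract

# Filter: rank features by a univariate score (here an ANOVA F-test), keep top K.
filt = SelectKBest(score_func=f_classif, k=K).fit(X, y)

# Wrapper: recursively eliminate features using a model's fitted weights.
wrap = RFE(LogisticRegression(max_iter=1000), n_features_to_select=K).fit(X, y)

# Embedded: selection falls out of model training (L1-penalised coefficients).
emb = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1), max_features=K
).fit(X, y)

for name, sel in [("filter", filt), ("wrapper", wrap), ("embedded", emb)]:
    print(name, sel.get_support(indices=True))
```

In the paper's setting, each of the selected index sets would then be fed to the MDSS pipeline in place of the full covariate set.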
Related papers
- Feature Selection as Deep Sequential Generative Learning [50.00973409680637]
We develop a deep variational transformer model over a joint of sequential reconstruction, variational, and performance evaluator losses.
Our model can distill feature selection knowledge and learn a continuous embedding space to map feature selection decision sequences into embedding vectors associated with utility scores.
arXiv Detail & Related papers (2024-03-06T16:31:56Z)
- A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning [131.2910403490434]
Data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones.
Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance.
We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers.
We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems.
arXiv Detail & Related papers (2023-11-10T05:26:10Z)
- Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets.
Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that iterates the time-consuming model training and batch data selection repeatedly.
FreeSel bypasses the heavy batch selection process, achieving a significant improvement in efficiency and being 530x faster than existing active learning methods.
arXiv Detail & Related papers (2023-09-29T15:50:14Z)
- Graph-Based Automatic Feature Selection for Multi-Class Classification via Mean Simplified Silhouette [4.786337974720721]
This paper introduces a novel graph-based filter method for automatic feature selection (abbreviated as GB-AFS).
The method determines the minimum combination of features required to sustain prediction performance.
It does not require any user-defined parameters such as the number of features to select.
arXiv Detail & Related papers (2023-09-05T14:37:31Z)
- Model-free feature selection to facilitate automatic discovery of divergent subgroups in tabular data [4.551615447454768]
We propose a model-free and sparsity-based automatic feature selection (SAFS) framework to facilitate automatic discovery of divergent subgroups.
We validated SAFS across two publicly available datasets (MIMIC-III and Allstate Claims) and compared it with six existing feature selection methods.
arXiv Detail & Related papers (2022-03-08T20:42:56Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- Correlation Based Feature Subset Selection for Multivariate Time-Series Data [2.055949720959582]
Correlations in streams of time series data mean that only a small subset of the features are required for a given data mining task.
We propose a technique which does feature subset selection based on the correlation patterns of single feature classifier outputs.
arXiv Detail & Related papers (2021-11-26T17:39:33Z)
- A Supervised Feature Selection Method For Mixed-Type Data using Density-based Feature Clustering [1.3048920509133808]
This paper proposes a supervised feature selection method using density-based feature clustering (SFSDFC).
SFSDFC decomposes the feature space into a set of disjoint feature clusters using a novel density-based clustering method.
Then, an effective feature selection strategy is employed to obtain a subset of important features with minimal redundancy from those feature clusters.
arXiv Detail & Related papers (2021-11-10T15:05:15Z)
- Multivariate feature ranking of gene expression data [62.997667081978825]
We propose two new multivariate feature ranking methods based on pairwise correlation and pairwise consistency.
We statistically prove that the proposed methods outperform the state-of-the-art feature ranking methods Clustering Variation, Chi Squared, Correlation, Information Gain, ReliefF, and Significance.
arXiv Detail & Related papers (2021-11-03T17:19:53Z)
- Supervised Feature Subset Selection and Feature Ranking for Multivariate Time Series without Feature Extraction [78.84356269545157]
We introduce supervised feature ranking and feature subset selection algorithms for multivariate time series (MTS) classification.
Unlike most existing supervised/unsupervised feature selection algorithms for MTS, our techniques do not require a feature extraction step to generate a one-dimensional feature vector from the time series.
arXiv Detail & Related papers (2020-05-01T07:46:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.