Automated Supervised Feature Selection for Differentiated Patterns of
Care
- URL: http://arxiv.org/abs/2111.03495v1
- Date: Fri, 5 Nov 2021 13:27:18 GMT
- Title: Automated Supervised Feature Selection for Differentiated Patterns of
Care
- Authors: Catherine Wanjiru, William Ogallo, Girmaw Abebe Tadesse, Charles
Wachira, Isaiah Onando Mulang', Aisha Walcott-Bryant
- Abstract summary: The pipeline included three types of feature selection techniques: filters, wrappers, and embedded methods, used to select the top K features.
The selected features were tested in the existing multi-dimensional subset scanning (MDSS) pipeline, where the most anomalous subpopulations, most anomalous subsets, propensity scores, and effect measures were recorded to assess their performance.
- Score: 5.3825788156200565
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An automated feature selection pipeline was developed using several
state-of-the-art feature selection techniques to select optimal features for
Differentiated Patterns of Care (DPOC). The pipeline included three types of
feature selection techniques: filters, wrappers, and embedded methods, used to
select the top K features. Five datasets with binary dependent variables were
used, and their top K optimal features were selected. The selected features
were tested in the existing multi-dimensional subset scanning (MDSS) pipeline,
where the most anomalous subpopulations, most anomalous subsets, propensity
scores, and effect measures were recorded to assess their performance. This
performance was compared with the same four metrics obtained when all
covariates in the dataset were used in the MDSS pipeline. We found that,
regardless of the feature selection technique used, the distribution of the
data is key when deciding which technique to apply.
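The three families named in the abstract (filters, wrappers, and embedded methods) can be illustrated with a minimal sketch. This is not the authors' pipeline or code; it uses scikit-learn on a synthetic dataset, with one assumed representative per family (ANOVA F-test, recursive feature elimination, and L1-regularized logistic regression), purely to show how each family produces a top-K subset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, RFE, SelectFromModel
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a dataset with a binary dependent variable.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)
K = 5  # number of top features to keep

# Filter: score each feature independently of any downstream model.
scores = SelectKBest(f_classif, k=K).fit(X, y).scores_
filter_idx = np.argsort(scores)[-K:]

# Wrapper: recursive feature elimination around a chosen estimator.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=K).fit(X, y)
wrapper_idx = np.where(rfe.support_)[0]

# Embedded: selection falls out of the fitted model's own L1 coefficients.
embedded = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear"), max_features=K
).fit(X, y)
embedded_idx = np.where(embedded.get_support())[0]

print(sorted(filter_idx), sorted(wrapper_idx), sorted(embedded_idx))
```

In a setup like the paper's, each of the three subsets would then be passed to the MDSS stage in place of the full covariate set, and the resulting anomalous-subset metrics compared.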
Related papers
- Unsupervised Feature Selection Algorithm Based on Dual Manifold Re-ranking [5.840228332438659]
This paper proposes an unsupervised feature selection algorithm based on dual manifold re-ranking (DMRR).
Different similarity matrices are constructed to depict the manifold structures among samples, between samples and features, and among features themselves.
Experiments comparing DMRR with three original unsupervised feature selection algorithms and two unsupervised feature selection post-processing algorithms confirm that the importance information of different samples and the dual relationship between samples and features are beneficial for better feature selection.
arXiv Detail & Related papers (2024-10-27T09:29:17Z) - Feature Selection as Deep Sequential Generative Learning [50.00973409680637]
We develop a deep variational transformer model over a joint of sequential reconstruction, variational, and performance evaluator losses.
Our model can distill feature selection knowledge and learn a continuous embedding space to map feature selection decision sequences into embedding vectors associated with utility scores.
arXiv Detail & Related papers (2024-03-06T16:31:56Z) - A Performance-Driven Benchmark for Feature Selection in Tabular Deep
Learning [131.2910403490434]
Data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones.
Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance.
We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers.
We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems.
arXiv Detail & Related papers (2023-11-10T05:26:10Z) - Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets.
Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that repeatedly alternates between time-consuming model training and batch data selection.
FreeSel bypasses the heavy batch selection process, achieving a significant improvement in efficiency and being 530x faster than existing active learning methods.
arXiv Detail & Related papers (2023-09-29T15:50:14Z) - Model-free feature selection to facilitate automatic discovery of
divergent subgroups in tabular data [4.551615447454768]
We propose a model-free and sparsity-based automatic feature selection (SAFS) framework to facilitate automatic discovery of divergent subgroups.
We validated SAFS across two publicly available datasets (MIMIC-III and Allstate Claims) and compared it with six existing feature selection methods.
arXiv Detail & Related papers (2022-03-08T20:42:56Z) - Compactness Score: A Fast Filter Method for Unsupervised Feature
Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named as, Compactness Score (CSUFS) to select desired features.
Our proposed algorithm is shown to be more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z) - Correlation Based Feature Subset Selection for Multivariate Time-Series
Data [2.055949720959582]
Correlations in streams of time series data mean that only a small subset of the features are required for a given data mining task.
We propose a technique which does feature subset selection based on the correlation patterns of single feature classifier outputs.
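The general idea behind this entry can be sketched as follows. This is a hedged illustration, not the paper's exact algorithm: train one single-feature classifier per feature, correlate their out-of-fold outputs, and greedily keep a feature only when its outputs are not highly correlated with those of a feature already kept (the 0.9 redundancy threshold is an assumption):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=1)

# Out-of-fold predicted probabilities from one classifier per single feature.
outputs = np.column_stack([
    cross_val_predict(LogisticRegression(), X[:, [j]], y, cv=5,
                      method="predict_proba")[:, 1]
    for j in range(X.shape[1])
])

# Pairwise correlation of the classifier outputs, not of the raw features.
corr = np.corrcoef(outputs, rowvar=False)

# Greedy pass: keep a feature only if it is not redundant with kept ones.
selected = []
for j in range(X.shape[1]):
    if all(abs(corr[j, k]) < 0.9 for k in selected):  # assumed threshold
        selected.append(j)
print("selected feature indices:", selected)
```

Correlating classifier outputs rather than raw values lets the redundancy check see features through the lens of the downstream task, which is the motivation the summary above describes.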
arXiv Detail & Related papers (2021-11-26T17:39:33Z) - A Supervised Feature Selection Method For Mixed-Type Data using
Density-based Feature Clustering [1.3048920509133808]
This paper proposes a supervised feature selection method using density-based feature clustering (SFSDFC).
SFSDFC decomposes the feature space into a set of disjoint feature clusters using a novel density-based clustering method.
Then, an effective feature selection strategy is employed to obtain a subset of important features with minimal redundancy from those feature clusters.
arXiv Detail & Related papers (2021-11-10T15:05:15Z) - Multivariate feature ranking of gene expression data [62.997667081978825]
We propose two new multivariate feature ranking methods based on pairwise correlation and pairwise consistency.
We statistically prove that the proposed methods outperform the state-of-the-art feature ranking methods Clustering Variation, Chi Squared, Correlation, Information Gain, ReliefF, and Significance.
arXiv Detail & Related papers (2021-11-03T17:19:53Z) - Supervised Feature Subset Selection and Feature Ranking for Multivariate
Time Series without Feature Extraction [78.84356269545157]
We introduce supervised feature ranking and feature subset selection algorithms for MTS classification.
Unlike most existing supervised/unsupervised feature selection algorithms for MTS, our techniques do not require a feature extraction step to generate a one-dimensional feature vector from the time series.
arXiv Detail & Related papers (2020-05-01T07:46:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.