Related papers: Elastic Net based Feature Ranking and Selection

Elastic Net based Feature Ranking and Selection

URL: http://arxiv.org/abs/2012.14982v1
Date: Wed, 30 Dec 2020 00:08:36 GMT
Title: Elastic Net based Feature Ranking and Selection
Authors: Shaode Yu, Haobo Chen, Hang Yu, Zhicheng Zhang, Xiaokun Liang, Wenjian Qin, Yaoqin Xie, Ping Shi
Abstract summary: An intuitive idea is put at the end of multiple times of data splitting and elastic net based feature selection. It concerns the frequency of selected features and uses the frequency as an indicator of feature importance. It achieves competitive or superior performance to elastic net and with consistent selection of fewer features.
Score: 9.289190508925875
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Feature selection is important in data representation and intelligent diagnosis. Elastic net is one of the most widely used feature selectors. However, the features selected are dependant on the training data, and their weights dedicated for regularized regression are irrelevant to their importance if used for feature ranking, that degrades the model interpretability and extension. In this study, an intuitive idea is put at the end of multiple times of data splitting and elastic net based feature selection. It concerns the frequency of selected features and uses the frequency as an indicator of feature importance. After features are sorted according to their frequency, linear support vector machine performs the classification in an incremental manner. At last, a compact subset of discriminative features is selected by comparing the prediction performance. Experimental results on breast cancer data sets (BCDR-F03, WDBC, GSE 10810, and GSE 15852) suggest that the proposed framework achieves competitive or superior performance to elastic net and with consistent selection of fewer features. How to further enhance its consistency on high-dimension small-sample-size data sets should be paid more attention in our future work. The proposed framework is accessible online (https://github.com/NicoYuCN/elasticnetFR).

Related papers

On the (In)Significance of Feature Selection in High-Dimensional Datasets [5.552060522805327]
Small random subsets of features (0.02-1%) match or outperform the predictive performance of both full feature sets and FS.<n>Results challenge the assumption that computationally selected features reliably capture meaningful signals.
arXiv Detail & Related papers (2025-08-05T15:58:31Z)
LLM-Select: Feature Selection with Large Language Models [64.5099482021597]
Large language models (LLMs) are capable of selecting the most predictive features, with performance rivaling the standard tools of data science. Our findings suggest that LLMs may be useful not only for selecting the best features for training but also for deciding which features to collect in the first place.
arXiv Detail & Related papers (2024-07-02T22:23:40Z)
A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning [131.2910403490434]
Data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones. Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance. We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers. We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems.
arXiv Detail & Related papers (2023-11-10T05:26:10Z)
Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data. We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures. We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z)
Online Sparse Streaming Feature Selection Using Adapted Classification [5.587715545506331]
Existing methods divide features into relevance or irrelevance without missing data. We propose online sparse streaming feature selection based on adapted classification (OS2FS-AC) Experimental results on ten real-world data sets demonstrate that OS2FS-AC performs better than state-of-the-art algo-rithms.
arXiv Detail & Related papers (2023-02-25T03:03:53Z)
Data Selection for Language Models via Importance Resampling [90.9263039747723]
We formalize the problem of selecting a subset of a large raw unlabeled dataset to match a desired target distribution. We extend the classic importance resampling approach used in low-dimensions for LM data selection. We instantiate the DSIR framework with hashed n-gram features for efficiency, enabling the selection of 100M documents in 4.5 hours.
arXiv Detail & Related papers (2023-02-06T23:57:56Z)
Parallel feature selection based on the trace ratio criterion [4.30274561163157]
This work presents a novel parallel feature selection approach for classification, namely Parallel Feature Selection using Trace criterion (PFST) Our method uses trace criterion, a measure of class separability used in Fisher's Discriminant Analysis, to evaluate feature usefulness. The experiments show that our method can produce a small set of features in a fraction of the amount of time by the other methods under comparison.
arXiv Detail & Related papers (2022-03-03T10:50:33Z)
Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named as, Compactness Score (CSUFS) to select desired features. Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
Dynamic Instance-Wise Classification in Correlated Feature Spaces [15.351282873821935]
In a typical machine learning setting, the predictions on all test instances are based on a common subset of features discovered during model training. A new method is proposed that sequentially selects the best feature to evaluate for each test instance individually, and stops the selection process to make a prediction once it determines that no further improvement can be achieved with respect to classification accuracy. The effectiveness, generalizability, and scalability of the proposed method is illustrated on a variety of real-world datasets from diverse application domains.
arXiv Detail & Related papers (2021-06-08T20:20:36Z)
A User-Guided Bayesian Framework for Ensemble Feature Selection in Life Science Applications (UBayFS) [0.0]
We propose UBayFS, an ensemble feature selection technique, embedded in a Bayesian statistical framework. Our approach enhances the feature selection process by considering two sources of information: data and domain knowledge. A comparison with standard feature selectors underlines that UBayFS achieves competitive performance, while providing additional flexibility to incorporate domain knowledge.
arXiv Detail & Related papers (2021-04-30T06:51:33Z)
Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models. Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings. We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
RENT -- Repeated Elastic Net Technique for Feature Selection [0.46180371154032895]
We present the Repeated Elastic Net Technique (RENT) for Feature Selection. RENT uses an ensemble of generalized linear models with elastic net regularization, each trained on distinct subsets of the training data. RENT provides valuable information for model interpretation concerning the identification of objects in the data that are difficult to predict during training.
arXiv Detail & Related papers (2020-09-27T07:55:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.