Graph Convolutional Network-based Feature Selection for High-dimensional
and Low-sample Size Data
- URL: http://arxiv.org/abs/2211.14144v1
- Date: Fri, 25 Nov 2022 14:46:36 GMT
- Title: Graph Convolutional Network-based Feature Selection for High-dimensional
and Low-sample Size Data
- Authors: Can Chen, Scott T. Weiss, Yang-Yu Liu
- Abstract summary: We present a deep learning-based method - GRAph Convolutional nEtwork feature Selector (GRACES) - to select important features for HDLSS data.
We demonstrate empirical evidence that GRACES outperforms other feature selection methods on both synthetic and real-world datasets.
- Score: 4.266990593059533
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature selection is a powerful dimension reduction technique which selects a
subset of relevant features for model construction. Numerous feature selection
methods have been proposed, but most of them fail under the high-dimensional
and low-sample size (HDLSS) setting due to the challenge of overfitting. In
this paper, we present a deep learning-based method - GRAph Convolutional
nEtwork feature Selector (GRACES) - to select important features for HDLSS
data. We demonstrate empirical evidence that GRACES outperforms other feature
selection methods on both synthetic and real-world datasets.
Related papers
- Unveiling the Power of Sparse Neural Networks for Feature Selection [60.50319755984697]
Sparse Neural Networks (SNNs) have emerged as powerful tools for efficient feature selection.
We show that SNNs trained with dynamic sparse training (DST) algorithms can achieve, on average, more than $50%$ memory and $55%$ FLOPs reduction.
Our findings show that feature selection with SNNs trained with DST algorithms can achieve, on average, more than $50%$ memory and $55%$ FLOPs reduction.
arXiv Detail & Related papers (2024-08-08T16:48:33Z) - Feature Selection as Deep Sequential Generative Learning [50.00973409680637]
We develop a deep variational transformer model over a joint of sequential reconstruction, variational, and performance evaluator losses.
Our model can distill feature selection knowledge and learn a continuous embedding space to map feature selection decision sequences into embedding vectors associated with utility scores.
arXiv Detail & Related papers (2024-03-06T16:31:56Z) - LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z) - A Contrast Based Feature Selection Algorithm for High-dimensional Data
set in Machine Learning [9.596923373834093]
We propose a novel filter feature selection method, ContrastFS, which selects discriminative features based on the discrepancies features shown between different classes.
We validate effectiveness and efficiency of our approach on several widely studied benchmark datasets, results show that the new method performs favorably with negligible computation.
arXiv Detail & Related papers (2024-01-15T05:32:35Z) - A Performance-Driven Benchmark for Feature Selection in Tabular Deep
Learning [131.2910403490434]
Data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones.
Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance.
We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers.
We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems.
arXiv Detail & Related papers (2023-11-10T05:26:10Z) - ShaRP: Shape-Regularized Multidimensional Projections [71.30697308446064]
We present a novel projection technique - ShaRP - that provides users explicit control over the visual signature of the created scatterplot.
ShaRP scales well with dimensionality and dataset size, and generically handles any quantitative dataset.
arXiv Detail & Related papers (2023-06-01T11:16:58Z) - Graph-based Extreme Feature Selection for Multi-class Classification
Tasks [7.863638253070439]
This work focuses on a graph-based, filter feature selection method that is suited for multi-class classifications tasks.
We aim to drastically reduce the number of selected features, in order to create a sketch of the original data that codes valuable information for the classification task.
arXiv Detail & Related papers (2023-03-03T09:06:35Z) - On Supervised Feature Selection from High Dimensional Feature Spaces [33.22006437399753]
We propose a novel supervised feature selection methodology for machine learning decisions.
Tests are called the discriminant feature test (DFT) and the relevant feature test (RFT) for the classification and regression problems.
It is shown that DFT and RFT can select a lower dimensional feature subspace distinctly and robustly while maintaining high decision performance.
arXiv Detail & Related papers (2022-03-22T17:52:18Z) - Compactness Score: A Fast Filter Method for Unsupervised Feature
Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named as, Compactness Score (CSUFS) to select desired features.
Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z) - Sparse Centroid-Encoder: A Nonlinear Model for Feature Selection [1.2487990897680423]
We develop a sparse implementation of the centroid-encoder for nonlinear data reduction and visualization called Centro Sparseid-Encoder.
We also provide a feature selection framework that first ranks each feature by its occurrence, and the optimal number of features is chosen using a validation set.
The algorithm is applied to a wide variety of data sets including, single-cell biological data, high dimensional infectious disease data, hyperspectral data, image data, and speech data.
arXiv Detail & Related papers (2022-01-30T20:46:24Z) - Feature Selection Based on Sparse Neural Network Layer with Normalizing
Constraints [0.0]
We propose new neural-network based feature selection approach that introduces two constrains, the satisfying of which leads to sparse FS layer.
The results confirm that proposed Feature Selection Based on Sparse Neural Network Layer with Normalizing Constraints (SNEL-FS) is able to select the important features and yields superior performance compared to other conventional FS methods.
arXiv Detail & Related papers (2020-12-11T14:14:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.