Related papers: GFSNetwork: Differentiable Feature Selection via Gumbel-Sigmoid Relaxation

GFSNetwork: Differentiable Feature Selection via Gumbel-Sigmoid Relaxation

URL: http://arxiv.org/abs/2503.13304v1
Date: Mon, 17 Mar 2025 15:47:26 GMT
Title: GFSNetwork: Differentiable Feature Selection via Gumbel-Sigmoid Relaxation
Authors: Witold Wydmański, Marek Śmieja,
Abstract summary: We present GFSNetwork, a novel neural architecture that performs differentiable feature selection through Gumbel-Sigmoid sampling.<n>We evaluate GFSNetwork on a series of classification and regression benchmarks, where it consistently outperforms recent methods.<n>We validate our approach on real-world metagenomic datasets, demonstrating its effectiveness in high-dimensional biological data.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Feature selection in deep learning remains a critical challenge, particularly for high-dimensional tabular data where interpretability and computational efficiency are paramount. We present GFSNetwork, a novel neural architecture that performs differentiable feature selection through temperature-controlled Gumbel-Sigmoid sampling. Unlike traditional methods, where the user has to define the requested number of features, GFSNetwork selects it automatically during an end-to-end process. Moreover, GFSNetwork maintains constant computational overhead regardless of the number of input features. We evaluate GFSNetwork on a series of classification and regression benchmarks, where it consistently outperforms recent methods including DeepLasso, attention maps, as well as traditional feature selectors, while using significantly fewer features. Furthermore, we validate our approach on real-world metagenomic datasets, demonstrating its effectiveness in high-dimensional biological data. Concluding, our method provides a scalable solution that bridges the gap between neural network flexibility and traditional feature selection interpretability. We share our python implementation of GFSNetwork at https://github.com/wwydmanski/GFSNetwork, as well as a PyPi package (gfs_network).

Related papers

Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization [66.03821840425539]
In this paper, we investigate the training dynamics of $L$-layer neural networks using the tensor gradient program (SGD) framework. We show that SGD enables these networks to learn linearly independent features that substantially deviate from their initial values. This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum.
arXiv Detail & Related papers (2025-03-12T17:33:13Z)
Unveiling the Power of Sparse Neural Networks for Feature Selection [60.50319755984697]
Sparse Neural Networks (SNNs) have emerged as powerful tools for efficient feature selection. We show that SNNs trained with dynamic sparse training (DST) algorithms can achieve, on average, more than $50%$ memory and $55%$ FLOPs reduction. Our findings show that feature selection with SNNs trained with DST algorithms can achieve, on average, more than $50%$ memory and $55%$ FLOPs reduction.
arXiv Detail & Related papers (2024-08-08T16:48:33Z)
GNN-LoFI: a Novel Graph Neural Network through Localized Feature-based Histogram Intersection [51.608147732998994]
Graph neural networks are increasingly becoming the framework of choice for graph-based machine learning. We propose a new graph neural network architecture that substitutes classical message passing with an analysis of the local distribution of node features.
arXiv Detail & Related papers (2024-01-17T13:04:23Z)
Joint Feature and Differentiable $ k $-NN Graph Learning using Dirichlet Energy [103.74640329539389]
We propose a deep FS method that simultaneously conducts feature selection and differentiable $ k $-NN graph learning. We employ Optimal Transport theory to address the non-differentiability issue of learning $ k $-NN graphs in neural networks. We validate the effectiveness of our model with extensive experiments on both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-05-21T08:15:55Z)
Supervised Feature Selection with Neuron Evolution in Sparse Neural Networks [17.12834153477201]
We propose a novel resource-efficient supervised feature selection method using sparse neural networks. By gradually pruning the uninformative features from the input layer of a sparse neural network trained from scratch, NeuroFS derives an informative subset of features efficiently. NeuroFS achieves the highest ranking-based score among the considered state-of-the-art supervised feature selection models.
arXiv Detail & Related papers (2023-03-10T17:09:55Z)
Simplifying approach to Node Classification in Graph Neural Networks [7.057970273958933]
We decouple the node feature aggregation step and depth of graph neural network, and empirically analyze how different aggregated features play a role in prediction performance. We show that not all features generated via aggregation steps are useful, and often using these less informative features can be detrimental to the performance of the GNN model. We present a simple and shallow model, Feature Selection Graph Neural Network (FSGNN), and show empirically that the proposed model achieves comparable or even higher accuracy than state-of-the-art GNN models.
arXiv Detail & Related papers (2021-11-12T14:53:22Z)
A Comprehensive Survey and Performance Analysis of Activation Functions in Deep Learning [23.83339228535986]
Various types of neural networks have been introduced to deal with different types of problems. The main goal of any neural network is to transform the non-linearly separable input data into more linearly separable abstract features. The most popular and common non-linearity layers are activation functions (AFs), such as Logistic Sigmoid, Tanh, ReLU, ELU, Swish and Mish.
arXiv Detail & Related papers (2021-09-29T16:41:19Z)
SALA: Soft Assignment Local Aggregation for Parameter Efficient 3D Semantic Segmentation [65.96170587706148]
We focus on designing a point local aggregation function that yields parameter efficient networks for 3D point cloud semantic segmentation. We explore the idea of using learnable neighbor-to-grid soft assignment in grid-based aggregation functions.
arXiv Detail & Related papers (2020-12-29T20:16:37Z)
Feature Selection Based on Sparse Neural Network Layer with Normalizing Constraints [0.0]
We propose new neural-network based feature selection approach that introduces two constrains, the satisfying of which leads to sparse FS layer. The results confirm that proposed Feature Selection Based on Sparse Neural Network Layer with Normalizing Constraints (SNEL-FS) is able to select the important features and yields superior performance compared to other conventional FS methods.
arXiv Detail & Related papers (2020-12-11T14:14:33Z)
Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) is a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths. Instead of using the same path of the network, DG-Net aggregates features dynamically in each node, which allows the network to have more representation ability.
arXiv Detail & Related papers (2020-10-02T16:50:26Z)
Self-Challenging Improves Cross-Domain Generalization [81.99554996975372]
Convolutional Neural Networks (CNN) conduct image classification by activating dominant features that correlated with labels. We introduce a simple training, Self-Challenging Representation (RSC), that significantly improves the generalization of CNN to the out-of-domain data. RSC iteratively challenges the dominant features activated on the training data, and forces the network to activate remaining features that correlates with labels.
arXiv Detail & Related papers (2020-07-05T21:42:26Z)
Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs. Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.