Related papers: Splines-Based Feature Importance in Kolmogorov-Arnold Networks: A Framework for Supervised Tabular Data Dimensionality Reduction

Splines-Based Feature Importance in Kolmogorov-Arnold Networks: A Framework for Supervised Tabular Data Dimensionality Reduction

URL: http://arxiv.org/abs/2509.23366v1
Date: Sat, 27 Sep 2025 15:24:17 GMT
Title: Splines-Based Feature Importance in Kolmogorov-Arnold Networks: A Framework for Supervised Tabular Data Dimensionality Reduction
Authors: Ange-Clément Akazan, Verlon Roel Mbingui,
Abstract summary: We introduce four KAN-based selectors ($textitKAN-L1$, $textitKAN-L2$, $textitKAN-SI$, $textitKAN-KO$) and compare them against classical baselines.<n>F1 scores and $R2$ score results reveal that KAN-based selectors, particularly $textitKAN-L2$, $textitKAN-L1$, $textitKAN-SI$, and $textitKAN
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: High-dimensional datasets require effective feature selection to improve predictive performance, interpretability, and robustness. We propose and evaluate feature selection methods for tabular datasets based on Kolmogorov-Arnold networks (KANs), which parameterize feature transformations through splines, enabling direct access to interpretable importance measures. We introduce four KAN-based selectors ($\textit{KAN-L1}$, $\textit{KAN-L2}$, $\textit{KAN-SI}$, $\textit{KAN-KO}$) and compare them against classical baselines (LASSO, Random Forest, Mutual Information, SVM-RFE) across multiple classification and regression tabular dataset benchmarks. Average (over three retention levels: 20\%, 40\%, and 60\%) F1 scores and $R^2$ score results reveal that KAN-based selectors, particularly $\textit{KAN-L2}$, $\textit{KAN-L1}$, $\textit{KAN-SI}$, and $\textit{KAN-KO}$, are competitive with and sometimes superior to classical baselines in structured and synthetic datasets. However, $\textit{KAN-L1}$ is often too aggressive in regression, removing useful features, while $\textit{KAN-L2}$ underperforms in classification, where simple coefficient shrinkage misses complex feature interactions. $\textit{KAN-L2}$ and $\textit{KAN-SI}$ provide robust performance on noisy regression datasets and heterogeneous datasets, aligning closely with ensemble predictors. In classification tasks, KAN selectors such as $\textit{KAN-L1}$, $\textit{KAN-KO}$, and $\textit{KAN-SI}$ sometimes surpass the other selectors by eliminating redundancy, particularly in high-dimensional multi-class data. Overall, our findings demonstrate that KAN-based feature selection provides a powerful and interpretable alternative to traditional methods, capable of uncovering nonlinear and multivariate feature relevance beyond sparsity or impurity-based measures.

Related papers

Diversity Augmentation of Dynamic User Preference Data for Boosting Personalized Text Summarizers [16.572159435616456]
$mathrmPerAugy$ is a novel cross-trajectory shuffling and summary-content perturbation technique.<n>We show that it significantly boosts the accuracy of four state-of-the-art baseline (SOTA) user-encoders commonly used in personalized summarization frameworks.<n>As a post-hoc analysis of the role of induced diversity in the augmented dataset by peraugy, we introduce three dataset diversity metrics.
arXiv Detail & Related papers (2025-10-11T07:40:20Z)
Beyond Softmax: A Natural Parameterization for Categorical Random Variables [61.709831225296305]
We introduce the $textitcatnat$ function, a function composed of a sequence of hierarchical binary splits.<n>A rich set of experiments show that the proposed function improves the learning efficiency and yields models characterized by consistently higher test performance.
arXiv Detail & Related papers (2025-09-29T12:55:50Z)
Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training [53.07879717463279]
textscDomain2Vec decomposes any dataset into a linear combination of several emphmeta-domains<n>textscDomain2Vec helps find the data mixture that enhances downstream task performance with minimal computational overhead.
arXiv Detail & Related papers (2025-06-12T17:53:51Z)
Data Selection for ERMs [67.57726352698933]
We study how well can $mathcalA$ perform when trained on at most $nll N$ data points selected from a population of $N$ points.<n>Our results include optimal data-selection bounds for mean estimation, linear classification, and linear regression.
arXiv Detail & Related papers (2025-04-20T11:26:01Z)
Enhancing Unsupervised Feature Selection via Double Sparsity Constrained Optimization [6.342485512772862]
Unsupervised single feature selection (UFS) is widely applied in machine learning and pattern recognition.<n>Most of the existing methods only consider sparsity, which makes it difficult to select subsets and discriminative from the original.<n>In this paper, we propose a new method called DSCOFS to consider a subset and discriminative from the original.
arXiv Detail & Related papers (2025-01-01T05:05:46Z)
Incorporating Arbitrary Matrix Group Equivariance into KANs [69.30866522377694]
Kolmogorov-Arnold Networks (KANs) have seen great success in scientific domains.<n>We propose Equivariant Kolmogorov-Arnold Networks (EKAN) to broaden their applicability to more fields.
arXiv Detail & Related papers (2024-10-01T06:34:58Z)
African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification [53.89380284760555]
textttFOCI (textbfFine-grained textbfObject textbfClasstextbfIfication) is a difficult multiple-choice benchmark for fine-grained object classification. textttFOCIxspace complements five popular classification datasets with four domain-specific subsets from ImageNet-21k.
arXiv Detail & Related papers (2024-06-20T16:59:39Z)
Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure. We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $Omega(dkstar/2)$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z)
Data-Efficient Learning via Clustering-Based Sensitivity Sampling: Foundation Models and Beyond [28.651041302245538]
We present a new data selection approach based on $k$-means clustering and sampling sensitivity. We show how it can be applied on linear regression, leading to a new sampling strategy that surprisingly matches the performances of leverage score sampling.
arXiv Detail & Related papers (2024-02-27T09:03:43Z)
Variance Alignment Score: A Simple But Tough-to-Beat Data Selection Method for Multimodal Contrastive Learning [17.40655778450583]
We propose a principled metric named Variance Alignment Score (VAS), which has the form $langle Sigma_texttest, Sigma_irangle$. We show that applying VAS and CLIP scores together can outperform baselines by a margin of $1.3%$ on 38 evaluation sets for noisy dataset DataComp and $2.5%$ on VTAB for high-quality dataset CC12M.
arXiv Detail & Related papers (2024-02-03T06:29:04Z)
A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning [131.2910403490434]
Data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones. Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance. We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers. We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems.
arXiv Detail & Related papers (2023-11-10T05:26:10Z)
Project and Probe: Sample-Efficient Domain Adaptation by Interpolating Orthogonal Features [119.22672589020394]
We propose a lightweight, sample-efficient approach that learns a diverse set of features and adapts to a target distribution by interpolating these features. Our experiments on four datasets, with multiple distribution shift settings for each, show that Pro$2$ improves performance by 5-15% when given limited target data.
arXiv Detail & Related papers (2023-02-10T18:58:03Z)
You can't pick your neighbors, or can you? When and how to rely on retrieval in the $k$NN-LM [65.74934004876914]
Retrieval-enhanced language models (LMs) condition their predictions on text retrieved from large external datastores. One such approach, the $k$NN-LM, interpolates any existing LM's predictions with the output of a $k$-nearest neighbors model. We empirically measure the effectiveness of our approach on two English language modeling datasets.
arXiv Detail & Related papers (2022-10-28T02:57:40Z)
Classification of high-dimensional data with spiked covariance matrix structure [0.5156484100374059]
We study the classification problem for high-dimensional data with $n$ observations on $p$ features.<n>We propose an adaptive classifier that first performs dimension reduction on the feature vectors prior to classification in the dimensionally reduced space.<n>We show that the resulting classifier is Bayes optimal whenever $n rightarrow infty$ and $s sqrtn-1 ln p rightarrow 0$.
arXiv Detail & Related papers (2021-10-05T11:26:53Z)
On the Adversarial Robustness of LASSO Based Feature Selection [72.54211869067979]
In the considered model, there is a malicious adversary who can observe the whole dataset, and then will carefully modify the response values or the feature matrix. We formulate the modification strategy of the adversary as a bi-level optimization problem. Numerical examples with synthetic and real data illustrate that our method is efficient and effective.
arXiv Detail & Related papers (2020-10-20T05:51:26Z)
Imbalance Learning for Variable Star Classification [0.0]
We develop a hierarchical machine learning classification scheme to overcome imbalanced learning problems. We use 'data-level' approaches to directly augment the training data so that they better describe under-represented classes. We find that a higher classification rate is obtained when using $texttGpFit$ in the hierarchical model.
arXiv Detail & Related papers (2020-02-27T19:01:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.