Related papers: Closing the gap on tabular data with Fourier and Implicit Categorical Features

URL: http://arxiv.org/abs/2602.23182v1
Date: Thu, 26 Feb 2026 16:40:23 GMT
Title: Closing the gap on tabular data with Fourier and Implicit Categorical Features
Authors: Marius Dragoi, Florin Gogianu, Elena Burceanu
Abstract summary: We show that our proposed feature preprocessing significantly boosts the performance of deep learning models. We show that our proposed feature preprocessing enables them to achieve a performance that closely matches or surpasses XGBoost.
Score: 3.071430103942477
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While Deep Learning has demonstrated impressive results in applications on various data types, it continues to lag behind tree-based methods when applied to tabular data, often referred to as the last "unconquered castle" for neural networks. We hypothesize that a significant advantage of tree-based methods lies in their intrinsic capability to model and exploit non-linear interactions induced by features with categorical characteristics. In contrast, neural-based methods exhibit biases toward uniform numerical processing of features and smooth solutions, making it challenging for them to effectively leverage such patterns. We address this performance gap by using statistical-based feature processing techniques to identify features that are strongly correlated with the target once discretized. We further mitigate the bias of deep models for overly-smooth solutions, a bias that does not align with the inherent properties of the data, using Learned Fourier. We show that our proposed feature preprocessing significantly boosts the performance of deep learning models and enables them to achieve a performance that closely matches or surpasses XGBoost on a comprehensive tabular data benchmark.
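The two ingredients named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes Cramér's V on quantile-binned values as a stand-in for the unspecified "statistical-based" test, and it uses fixed frequencies where the paper's Learned Fourier features would train them. The function names, bin count, and toy data are hypothetical.

```python
import numpy as np


def fourier_features(x, freqs):
    """Map a numeric column x of shape (n,) to sin/cos features, shape (n, 2k).

    Assumption: freqs are fixed here; a learned variant would treat them
    as trainable parameters of the network.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    freqs = np.asarray(freqs, dtype=float).reshape(1, -1)
    angles = 2.0 * np.pi * x * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


def discretized_association(x, y, n_bins=8):
    """Cramér's V between quantile-binned x and a class label y (0 to 1).

    Assumption: this particular statistic is an illustrative choice,
    not the test used in the paper.
    """
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y).ravel()
    # Interior quantile edges split x into (at most) n_bins groups.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    _, row = np.unique(np.digitize(x, edges), return_inverse=True)
    _, col = np.unique(y, return_inverse=True)
    # Contingency table of bin index versus class label.
    table = np.zeros((row.max() + 1, col.max() + 1))
    np.add.at(table, (row, col), 1.0)
    n = table.sum()
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k))) if k > 0 else 0.0


# Toy column whose label alternates with the integer part of x.
x = np.linspace(0.0, 8.0, 4000, endpoint=False)
y = np.floor(x).astype(int) % 2
print("Pearson correlation:", np.corrcoef(x, y)[0, 1])
print("Cramér's V after binning:", discretized_association(x, y))
print("Fourier feature shape:", fourier_features(x, [0.5, 1.0]).shape)
```

On this toy column the label is only weakly linearly related to x, but it becomes strongly associated once x is binned, which is the kind of feature the abstract describes as having categorical characteristics. A sinusoid with frequency 0.5 changes sign at each integer, so it tracks the same non-smooth pattern.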

Related papers

Nonparametric Data Attribution for Diffusion Models [57.820618036556084]
Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs. We propose a nonparametric attribution method that operates entirely on data, measuring influence via patch-level similarity between generated and training images.
arXiv Detail & Related papers (2025-10-16T03:37:16Z)
Mambular: A Sequential Model for Tabular Deep Learning [0.7184556517162347]
This paper investigates the use of autoregressive state-space models for tabular data. We compare their performance against established benchmark models. Our findings indicate that interpreting features as a sequence and processing them can lead to significant performance improvement.
arXiv Detail & Related papers (2024-08-12T16:57:57Z)
A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning [131.2910403490434]
Data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones. Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance. We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers. We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems.
arXiv Detail & Related papers (2023-11-10T05:26:10Z)
Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data. We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures. We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z)
Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data. We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations. Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z)
Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data. The main aim of the identified model is to predict new data from previous observations. We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z)
Feature Weaken: Vicinal Data Augmentation for Classification [1.7013938542585925]
We use Feature Weaken to construct the vicinal data distribution with the same cosine similarity for model training. This work can not only improve the classification performance and generalization of the model, but also stabilize the model training and accelerate the model convergence.
arXiv Detail & Related papers (2022-11-20T11:00:23Z)
Feature Space Particle Inference for Neural Network Ensembles [13.392254060510666]
Particle-based inference methods offer a promising approach from a Bayesian perspective. We propose optimizing particles in the feature space where the activation of a specific intermediate layer lies. Our method encourages each member to capture distinct features, which is expected to improve ensemble prediction robustness.
arXiv Detail & Related papers (2022-06-02T09:16:26Z)
Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning. Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.