TabNSA: Native Sparse Attention for Efficient Tabular Data Learning
- URL: http://arxiv.org/abs/2503.09850v1
- Date: Wed, 12 Mar 2025 21:13:41 GMT
- Title: TabNSA: Native Sparse Attention for Efficient Tabular Data Learning
- Authors: Ali Eslamian, Qiang Cheng
- Abstract summary: This paper introduces TabNSA, a novel deep learning architecture leveraging Native Sparse Attention (NSA). TabNSA incorporates a dynamic hierarchical sparse strategy, combining coarse-grained feature compression with fine-grained feature selection to preserve both global context awareness and local precision.
- Score: 13.110156202816112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tabular data poses unique challenges for deep learning due to its heterogeneous features and lack of inherent spatial structure. This paper introduces TabNSA, a novel deep learning architecture leveraging Native Sparse Attention (NSA) specifically for efficient tabular data processing. TabNSA incorporates a dynamic hierarchical sparse strategy, combining coarse-grained feature compression with fine-grained feature selection to preserve both global context awareness and local precision. By dynamically focusing on relevant subsets of features, TabNSA effectively captures intricate feature interactions. Extensive experiments demonstrate that TabNSA consistently outperforms existing methods, including both deep learning architectures and ensemble decision trees, achieving state-of-the-art performance across various benchmark datasets.
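The abstract describes the mechanism only at a high level; as a rough illustration of the hierarchical idea (coarse-grained compression of per-feature tokens plus fine-grained top-k feature selection), here is a minimal PyTorch sketch. Everything in it (the function name, mean-pooling as the compressor, block_size, top_k, and the 50/50 combination of the two branches) is our assumption, not the authors' code.

```python
import torch

def hierarchical_sparse_attention(q, k, v, block_size=4, top_k=8):
    """Toy rendering of a dynamic hierarchical sparse attention step:
    coarse-grained block compression plus fine-grained top-k selection.
    Shapes: q is (B, 1, D); k and v are (B, n_feat, D) feature tokens."""
    B, n_feat, D = k.shape
    scale = D ** -0.5

    # Coarse stage: mean-pool feature tokens into blocks for global context.
    n_blocks = n_feat // block_size
    k_blk = k[:, :n_blocks * block_size].reshape(B, n_blocks, block_size, D).mean(2)
    v_blk = v[:, :n_blocks * block_size].reshape(B, n_blocks, block_size, D).mean(2)
    coarse = torch.softmax(q @ k_blk.transpose(1, 2) * scale, dim=-1) @ v_blk

    # Fine stage: attend only to the top-k individually scored features.
    scores = (q @ k.transpose(1, 2)).squeeze(1) * scale   # (B, n_feat)
    idx = scores.topk(top_k, dim=-1).indices              # (B, top_k)
    gather = idx.unsqueeze(-1).expand(-1, -1, D)
    k_sel, v_sel = k.gather(1, gather), v.gather(1, gather)
    fine = torch.softmax(q @ k_sel.transpose(1, 2) * scale, dim=-1) @ v_sel

    # Combine global context awareness with local precision.
    return 0.5 * (coarse + fine)

out = hierarchical_sparse_attention(
    torch.randn(2, 1, 16), torch.randn(2, 32, 16), torch.randn(2, 32, 16))
```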
Related papers
- TabDeco: A Comprehensive Contrastive Framework for Decoupled Representations in Tabular Data [5.98480077860174]
We introduce TabDeco, a novel method that leverages attention-based encoding strategies across both rows and columns.
With its innovative feature decoupling hierarchies, TabDeco consistently surpasses existing deep learning methods.
arXiv Detail & Related papers (2024-11-17T18:42:46Z)
- TabSeq: A Framework for Deep Learning on Tabular Data via Sequential Ordering
This work introduces TabSeq, a novel framework for the sequential ordering of features.
Finding the optimum sequence order for such features could improve the deep learning models' learning process.
arXiv Detail & Related papers (2024-10-17T04:10:36Z)
- Revisiting Nearest Neighbor for Tabular Data: A Deep Tabular Baseline Two Decades Later [76.66498833720411]
We introduce a differentiable version of $K$-nearest neighbors (KNN), Neighbourhood Components Analysis (NCA), originally designed to learn a linear projection that captures semantic similarities between instances.
Surprisingly, our implementation of NCA using SGD and without dimensionality reduction already achieves decent performance on tabular data.
We conclude our paper by analyzing the factors behind these improvements, including loss functions, prediction strategies, and deep architectures.
arXiv Detail & Related papers (2024-07-03T16:38:57Z)
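For reference, below is a minimal sketch of the classic NCA objective mentioned above (a soft nearest-neighbor classifier under a learned linear projection), trained with plain SGD and, per the summary, without dimensionality reduction (a square projection). The data shapes and learning rate are illustrative assumptions.

```python
import torch

def nca_loss(x, y, proj):
    """Classic NCA objective: maximize the expected leave-one-out accuracy
    of a soft (softmax-weighted) nearest-neighbor classifier."""
    z = x @ proj                                  # learned linear projection
    dist2 = torch.cdist(z, z).pow(2)              # pairwise squared distances
    mask = torch.eye(len(z), dtype=torch.bool)
    logits = (-dist2).masked_fill(mask, float('-inf'))  # exclude self-matches
    p = torch.softmax(logits, dim=1)              # soft neighbor assignment
    same = (y[:, None] == y[None, :]).float()     # same-class indicator
    return -(p * same).sum(1).clamp_min(1e-12).log().mean()

# Toy usage: a square projection (no dimensionality reduction) fit with SGD.
x, y = torch.randn(256, 20), torch.randint(0, 3, (256,))
proj = torch.eye(20).requires_grad_()
opt = torch.optim.SGD([proj], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    nca_loss(x, y, proj).backward()
    opt.step()
```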
- A Closer Look at Deep Learning Methods on Tabular Datasets [52.50778536274327]
Tabular data is prevalent across diverse domains in machine learning.
Deep Neural Network (DNN)-based methods have recently demonstrated promising performance.
We compare 32 state-of-the-art deep and tree-based methods, evaluating their average performance across multiple criteria.
arXiv Detail & Related papers (2024-07-01T04:24:07Z)
- Binning as a Pretext Task: Improving Self-Supervised Learning in Tabular Domains [0.565395466029518]
We propose a novel pretext task based on the classical binning method.
The idea is straightforward: reconstructing the bin indices (either orders or classes) rather than the original values.
Our empirical investigations confirm several advantages of binning.
arXiv Detail & Related papers (2024-05-13T01:23:14Z)
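A minimal sketch of the binning pretext idea described above, assuming quantile bins and class-style reconstruction: the model predicts each feature's bin index rather than regressing its raw value. The bin count, architecture, and the use of the clean input are our assumptions, not the paper's configuration.

```python
import numpy as np
import torch
import torch.nn as nn

X = np.random.randn(512, 8).astype(np.float32)    # toy table, 8 features
n_bins = 10

# Pretext targets: per-feature quantile bin indices, not the raw values.
edges = np.quantile(X, np.linspace(0, 1, n_bins + 1)[1:-1], axis=0)  # (9, 8)
bins = np.stack([np.digitize(X[:, j], edges[:, j]) for j in range(8)], axis=1)

x, y = torch.from_numpy(X), torch.from_numpy(bins).long()
model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 8 * n_bins))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    logits = model(x).view(-1, n_bins)            # one classifier per feature
    loss = nn.functional.cross_entropy(logits, y.view(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```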
- Making Pre-trained Language Models Great on Tabular Prediction [50.70574370855663]
The transferability of deep neural networks (DNNs) has driven significant progress in image and language processing.
We present TP-BERTa, a specifically pre-trained LM for tabular data prediction.
A novel relative magnitude tokenization converts scalar numerical feature values to finely discrete, high-dimensional tokens, and an intra-feature attention approach integrates feature values with the corresponding feature names.
arXiv Detail & Related papers (2024-03-04T08:38:56Z)
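A loose sketch of the tokenization step described above: each scalar value is bucketized into a discrete magnitude bin and mapped to a high-dimensional embedding, with a feature-name vector added as a crude stand-in for the paper's intra-feature attention. The bin boundaries, dimensions, and additive combination are all assumptions.

```python
import torch
import torch.nn as nn

class MagnitudeTokenizer(nn.Module):
    """Map each scalar value to a discrete magnitude bin, then to a
    high-dimensional token embedding for that bin."""
    def __init__(self, boundaries, dim=64):
        super().__init__()
        self.register_buffer("boundaries", boundaries)   # sorted 1-D edges
        self.embed = nn.Embedding(len(boundaries) + 1, dim)

    def forward(self, values):                           # values: (B,)
        bins = torch.bucketize(values, self.boundaries)  # discrete bin ids
        return self.embed(bins)                          # (B, dim) tokens

tok = MagnitudeTokenizer(torch.linspace(-2.0, 2.0, 15))
name_vec = nn.Parameter(torch.randn(64))    # one feature's name embedding
tokens = tok(torch.randn(32)) + name_vec    # value token fused with its name
```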
- Rethinking Pre-Training in Tabular Data: A Neighborhood Embedding Perspective [71.45945607871715]
We propose Tabular data Pre-Training via Meta-representation (TabPTM).
The core idea is to embed data instances into a shared feature space, where each instance is represented by its distance to a fixed number of nearest neighbors and their labels.
Extensive experiments on 101 datasets confirm TabPTM's effectiveness in both classification and regression tasks, with and without fine-tuning.
arXiv Detail & Related papers (2023-10-31T18:03:54Z)
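The neighborhood embedding itself is easy to sketch: represent each instance by its distances to a fixed number of nearest training neighbors together with those neighbors' labels, yielding a shared feature space independent of the original number of columns. The choice of k and the Euclidean metric below are assumptions.

```python
import numpy as np

def meta_representation(X_train, y_train, X_query, k=5):
    """Embed each query instance as [distances to its k nearest training
    neighbors, those neighbors' labels] -- a fixed-width representation
    regardless of the original dimensionality."""
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]                 # k nearest neighbors
    dists = np.take_along_axis(d, idx, axis=1)         # (n_query, k)
    labels = y_train[idx]                              # (n_query, k)
    return np.concatenate([dists, labels], axis=1)     # (n_query, 2k)

X_tr, y_tr = np.random.randn(100, 7), np.random.randint(0, 2, 100)
z = meta_representation(X_tr, y_tr, np.random.randn(10, 7))  # (10, 10)
```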
- Local Contrastive Feature learning for Tabular Data [8.93957397187611]
We propose a new local contrastive feature learning framework (LoCL).
In order to create a niche for local learning, we use feature correlations to create a maximum-spanning tree, and break the tree into feature subsets.
Convolutional learning of the features is used to learn latent feature space, regulated by contrastive and reconstruction losses.
arXiv Detail & Related papers (2022-11-19T00:53:41Z)
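A sketch of the grouping step, assuming one concrete cut rule: build a maximum-spanning tree over absolute feature correlations (via a minimum spanning tree on 1 - |corr| distances), drop the weakest links, and take the connected components as feature subsets. LoCL's exact procedure may differ.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def correlation_mst_subsets(X, n_subsets=3):
    """Group features: maximum-spanning tree over |correlation|, then cut
    the weakest links and return connected components as feature subsets."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    dist = 1.0 - corr                        # low distance = high correlation
    np.fill_diagonal(dist, 0.0)              # zeros are treated as non-edges
    mst = minimum_spanning_tree(dist).toarray()
    edges = [(int(i), int(j), mst[i, j]) for i, j in zip(*np.nonzero(mst))]
    edges.sort(key=lambda e: -e[2])           # weakest correlations first
    kept = edges[n_subsets - 1:]              # cutting k-1 edges -> k subsets

    parent = list(range(corr.shape[0]))       # union-find over kept edges
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for i, j, _ in kept:
        parent[find(i)] = find(j)
    groups = {}
    for f in range(corr.shape[0]):
        groups.setdefault(find(f), []).append(f)
    return list(groups.values())

subsets = correlation_mst_subsets(np.random.randn(200, 10))  # 3 feature subsets
```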
- SubTab: Subsetting Features of Tabular Data for Self-Supervised Representation Learning [5.5616364225463055]
We introduce a new framework, Subsetting features of Tabular data (SubTab).
We argue that reconstructing the data from the subset of its features rather than its corrupted version in an autoencoder setting can better capture its underlying representation.
arXiv Detail & Related papers (2021-10-08T20:11:09Z)
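A minimal sketch of the SubTab reconstruction objective: encode a random subset of a row's features (here zero-masked to keep a fixed input width, a simplification of the paper's subsetting) and train the autoencoder to reconstruct the full row.

```python
import torch
import torch.nn as nn

n_feat = 12
encoder = nn.Sequential(nn.Linear(n_feat, 32), nn.ReLU())
decoder = nn.Linear(32, n_feat)
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)

x = torch.randn(256, n_feat)                 # toy table
for _ in range(100):
    # Keep a random ~half of the features, zeroing the rest, and ask the
    # autoencoder to reconstruct the FULL row from that subset.
    mask = (torch.rand(1, n_feat) < 0.5).float()
    x_hat = decoder(encoder(x * mask))
    loss = nn.functional.mse_loss(x_hat, x)
    opt.zero_grad(); loss.backward(); opt.step()
```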
- SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption [72.35532598131176]
We propose SCARF, a technique for contrastive learning, where views are formed by corrupting a random subset of features.
We show that SCARF complements existing strategies and outperforms alternatives like autoencoders.
arXiv Detail & Related papers (2021-06-29T08:08:33Z)
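SCARF's view generation is simple to sketch: for each row, replace a random subset of features with draws from those features' empirical marginals (sampled from other rows in the batch). The 60% corruption rate below is an assumption for illustration.

```python
import torch

def scarf_corrupt(x, corruption_rate=0.6):
    """Return a corrupted view: a random subset of each row's features is
    replaced with values drawn from that feature's empirical marginal,
    i.e., the same column read from a randomly chosen other row."""
    n, d = x.shape
    mask = torch.rand(n, d) < corruption_rate
    rand_rows = torch.randint(0, n, (n, d))            # one donor row per cell
    marginal = x[rand_rows, torch.arange(d).expand(n, d)]
    return torch.where(mask, marginal, x)

# The (original, corrupted) pair acts as a positive pair for a standard
# contrastive (InfoNCE) loss over the encoded batch.
x = torch.randn(128, 10)
view = scarf_corrupt(x)
```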
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.