Tabular Data: Is Attention All You Need?
- URL: http://arxiv.org/abs/2402.03970v1
- Date: Tue, 6 Feb 2024 12:59:02 GMT
- Title: Tabular Data: Is Attention All You Need?
- Authors: Guri Zabërgja, Arlind Kadra, Josif Grabocka
- Abstract summary: We introduce a large-scale empirical study comparing neural networks against gradient-boosted decision trees on structured data.
In contrast to prior work, our empirical findings indicate that neural networks are competitive against decision trees.
- Score: 23.787352248749382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Learning has revolutionized the field of AI and led to remarkable
achievements in applications involving image and text data. Unfortunately,
there is inconclusive evidence on the merits of neural networks for structured
tabular data. In this paper, we introduce a large-scale empirical study
comparing neural networks against gradient-boosted decision trees on tabular
data, as well as transformer-based architectures against traditional
multi-layer perceptrons (MLPs) with residual connections. In contrast to prior
work, our empirical findings indicate that neural networks are competitive
against decision trees. Furthermore, we find that transformer-based
architectures do not outperform simpler variants of traditional MLP
architectures on tabular datasets. As a result, this paper helps the research
and practitioner communities make informed choices when deploying neural
networks in future tabular data applications.
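As a rough illustration of the simpler baseline referenced in the abstract, the sketch below shows an MLP with residual connections for tabular inputs. It assumes PyTorch, and the layer width, depth, and dropout rate are illustrative placeholders rather than the configuration studied in the paper.

```python
# Minimal sketch of an MLP with residual connections for tabular data.
# Assumes PyTorch; sizes are illustrative, not the paper's configuration.
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    def __init__(self, dim: int, dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.net(x)  # residual (skip) connection


class ResidualMLP(nn.Module):
    """Plain MLP baseline with skip connections for tabular inputs."""

    def __init__(self, num_features: int, num_classes: int,
                 dim: int = 256, depth: int = 4):
        super().__init__()
        self.input_proj = nn.Linear(num_features, dim)
        self.blocks = nn.Sequential(*[ResidualBlock(dim) for _ in range(depth)])
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.blocks(self.input_proj(x)))


# Usage example on random data (shapes only; no real dataset).
model = ResidualMLP(num_features=20, num_classes=2)
logits = model(torch.randn(32, 20))  # -> shape (32, 2)
```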
Related papers
- Escaping the Forest: Sparse Interpretable Neural Networks for Tabular Data [0.0]
We show that our models, Sparse TABular NET (sTAB-Net) with attention mechanisms, are more effective than tree-based models.
They achieve better performance than post-hoc methods like SHAP.
arXiv Detail & Related papers (2024-10-23T10:50:07Z)
- A Survey on Deep Tabular Learning [0.0]
Tabular data presents unique challenges for deep learning due to its heterogeneous nature and lack of spatial structure.
This survey reviews the evolution of deep learning models for tabular data, from early fully connected networks (FCNs) to advanced architectures like TabNet, SAINT, TabTranSELU, and MambaNet.
arXiv Detail & Related papers (2024-10-15T20:08:08Z)
- Hidden Activations Are Not Enough: A General Approach to Neural Network Predictions [0.0]
We introduce a novel mathematical framework for analyzing neural networks using tools from quiver representation theory.
By leveraging the induced quiver representation of a data sample, we capture more information than traditional hidden layer outputs.
Results are architecture-agnostic and task-agnostic, making them broadly applicable.
arXiv Detail & Related papers (2024-09-20T02:35:13Z)
- Defining Neural Network Architecture through Polytope Structures of Dataset [53.512432492636236]
This paper defines upper and lower bounds for neural network widths, which are informed by the polytope structure of the dataset in question.
We develop an algorithm to investigate a converse situation where the polytope structure of a dataset can be inferred from its corresponding trained neural networks.
It is established that popular datasets such as MNIST, Fashion-MNIST, and CIFAR10 can be efficiently encapsulated using no more than two polytopes with a small number of faces.
arXiv Detail & Related papers (2024-02-04T08:57:42Z)
- Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective to promote superior weight sparsity.
Specifically, customized visual prompts are mounted to upgrade neural network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z)
- NCART: Neural Classification and Regression Tree for Tabular Data [0.5439020425819]
NCART is a modified version of Residual Networks that replaces fully-connected layers with multiple differentiable oblivious decision trees.
It maintains its interpretability while benefiting from the end-to-end capabilities of neural networks.
The simplicity of the NCART architecture makes it well-suited for datasets of varying sizes (a rough sketch of such a differentiable tree layer appears after this list).
arXiv Detail & Related papers (2023-07-23T01:27:26Z)
- Transfer Learning with Deep Tabular Models [66.67017691983182]
We show that upstream data gives tabular neural networks a decisive advantage over GBDT models.
We propose a realistic medical diagnosis benchmark for tabular transfer learning.
We propose a pseudo-feature method for cases where the upstream and downstream feature sets differ.
arXiv Detail & Related papers (2022-06-30T14:24:32Z)
- Dive into Layers: Neural Network Capacity Bounding using Algebraic Geometry [55.57953219617467]
We show that the learnability of a neural network is directly related to its size.
We use Betti numbers to measure the topological geometric complexity of input data and the neural network.
We perform experiments on the real-world MNIST dataset, and the results verify our analysis and conclusions.
arXiv Detail & Related papers (2021-09-03T11:45:51Z)
- TabularNet: A Neural Network Architecture for Understanding Semantic Structures of Tabular Data [30.479822289380255]
We propose a novel neural network architecture, TabularNet, to simultaneously extract spatial and relational information from tables.
For relational information, we design a new graph construction method based on the WordNet tree and adopt a Graph Convolutional Network (GCN) based encoder.
Our neural network architecture can serve as a unified neural backbone for different understanding tasks and can be utilized in multitask scenarios.
arXiv Detail & Related papers (2021-06-06T11:48:09Z)
- PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
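As a rough illustration of the differentiable oblivious decision trees mentioned in the NCART entry above, the sketch below implements a single soft oblivious tree layer. It assumes PyTorch; the soft feature selection, sigmoid routing, temperature, and depth are generic assumptions in the spirit of differentiable tree layers, not NCART's exact formulation.

```python
# Minimal sketch of a single differentiable (soft) oblivious decision tree,
# the kind of building block NCART-style models swap in for fully-connected
# layers. Details here are generic assumptions, not the paper's formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftObliviousTree(nn.Module):
    def __init__(self, num_features: int, depth: int = 3, out_dim: int = 1):
        super().__init__()
        self.depth = depth
        # One soft feature selector and one threshold per tree level
        # (the same split is applied at every node of a level -> "oblivious").
        self.feature_logits = nn.Parameter(torch.randn(depth, num_features))
        self.thresholds = nn.Parameter(torch.zeros(depth))
        self.temperature = 1.0
        # One learned response per leaf (2**depth leaves).
        self.leaf_values = nn.Parameter(torch.randn(2 ** depth, out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Soft feature selection: weighted sum of input features per level.
        weights = F.softmax(self.feature_logits, dim=-1)            # (depth, F)
        selected = x @ weights.t()                                   # (B, depth)
        # Soft routing: probability of going "right" at each level.
        right = torch.sigmoid((selected - self.thresholds) / self.temperature)
        left = 1.0 - right                                           # (B, depth)
        # Leaf probability = product over levels of the chosen side.
        probs = torch.ones(x.shape[0], 1, device=x.device)
        for level in range(self.depth):
            side = torch.stack([left[:, level], right[:, level]], dim=-1)  # (B, 2)
            probs = (probs.unsqueeze(-1) * side.unsqueeze(1)).flatten(1)
        return probs @ self.leaf_values                              # (B, out_dim)


# Usage example on random data (shapes only; no real dataset).
tree = SoftObliviousTree(num_features=20, depth=3, out_dim=1)
out = tree(torch.randn(32, 20))  # -> shape (32, 1)
```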