On Embeddings for Numerical Features in Tabular Deep Learning
- URL: http://arxiv.org/abs/2203.05556v4
- Date: Thu, 26 Oct 2023 12:11:02 GMT
- Title: On Embeddings for Numerical Features in Tabular Deep Learning
- Authors: Yury Gorishniy and Ivan Rubachev and Artem Babenko
- Abstract summary: Transformer-like deep architectures have shown strong performance on tabular data problems.
Unlike traditional models, these architectures map scalar values of numerical features to high-dimensional embeddings.
We show that embedding numerical features is beneficial for many backbones.
- Score: 35.26886042632547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Transformer-like deep architectures have shown strong performance
on tabular data problems. Unlike traditional models, e.g., MLP, these
architectures map scalar values of numerical features to high-dimensional
embeddings before mixing them in the main backbone. In this work, we argue that
embeddings for numerical features are an underexplored degree of freedom in
tabular DL, which allows constructing more powerful DL models and competing
with GBDT on some traditionally GBDT-friendly benchmarks. We start by
describing two conceptually different approaches to building embedding modules:
the first one is based on a piecewise linear encoding of scalar values, and the
second one utilizes periodic activations. Then, we empirically demonstrate that
these two approaches can lead to significant performance boosts compared to the
embeddings based on conventional blocks such as linear layers and ReLU
activations. Importantly, we also show that embedding numerical features is
beneficial for many backbones, not only for Transformers. Specifically, after
proper embeddings, simple MLP-like models can perform on par with the
attention-based architectures. Overall, we highlight embeddings for numerical
features as an important design aspect with good potential for further
improvements in tabular DL.
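To make the two embedding approaches concrete, below is a minimal PyTorch sketch of both modules. The quantile-style bin boundaries, the clamping at the edge bins, the number of frequencies, and the initialization scale sigma are illustrative assumptions on our part, not the paper's exact configuration.

```python
import math
import torch
import torch.nn as nn

class PiecewiseLinearEncoding(nn.Module):
    """Piecewise linear encoding of a scalar: bins fully below the value
    saturate to 1, the active bin holds the fractional position, and bins
    above stay 0. `boundaries` would typically come from feature quantiles."""
    def __init__(self, boundaries: torch.Tensor):
        super().__init__()
        self.register_buffer("lo", boundaries[:-1])  # left bin edges,  (T,)
        self.register_buffer("hi", boundaries[1:])   # right bin edges, (T,)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch,) scalar values of one numerical feature
        ratio = (x.unsqueeze(-1) - self.lo) / (self.hi - self.lo)
        # Clamping the edge bins is a simplification; they can be left
        # unclamped to preserve information about out-of-range values.
        return ratio.clamp(0.0, 1.0)                 # (batch, T)

class PeriodicEmbedding(nn.Module):
    """Map a scalar to [cos(2*pi*c*x), sin(2*pi*c*x)] with trainable
    frequencies c initialized from N(0, sigma^2); sigma is a hyperparameter."""
    def __init__(self, n_frequencies: int = 8, sigma: float = 1.0):
        super().__init__()
        self.c = nn.Parameter(sigma * torch.randn(n_frequencies))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        angles = 2 * math.pi * self.c * x.unsqueeze(-1)  # (batch, k)
        return torch.cat([torch.cos(angles), torch.sin(angles)], dim=-1)

# Usage: embed one feature, then feed the embeddings to any backbone.
x = torch.randn(32)
ple = PiecewiseLinearEncoding(torch.linspace(-3.0, 3.0, 9))  # 8 bins
print(ple(x).shape, PeriodicEmbedding(8)(x).shape)           # (32, 8) (32, 16)
```

In the paper's setup, such per-feature embeddings are then concatenated for MLP-like backbones or treated as feature tokens for Transformer backbones.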
Related papers
- Transformers with Stochastic Competition for Tabular Data Modelling [6.285325771390289]
We introduce a novel deep learning model specifically designed for tabular data.
The model is validated on a variety of widely-used, publicly available datasets.
We demonstrate that incorporating these stochastic competition mechanisms yields high performance.
arXiv Detail & Related papers (2024-07-18T07:48:48Z)
- Modern Neighborhood Components Analysis: A Deep Tabular Baseline Two Decades Later [59.88557193062348]
We revisit the classic Neighborhood Components Analysis (NCA), designed to learn a linear projection that captures semantic similarities between instances.
We find that minor modifications, such as adjustments to the learning objectives and the integration of deep learning architectures, significantly enhance NCA's performance.
We also introduce a neighbor sampling strategy that improves both the efficiency and predictive accuracy of our proposed ModernNCA.
arXiv Detail & Related papers (2024-07-03T16:38:57Z)
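As a rough illustration of an NCA-style objective combined with a neighbor sampling strategy, here is a hedged sketch; the linear stand-in encoder, the sample size, and the uniform sampling are our assumptions rather than ModernNCA's actual design.

```python
import torch
import torch.nn.functional as F

def nca_loss_with_sampling(encoder, x, y, x_cand, y_cand, n_samples=256):
    """NCA-style loss over a sampled candidate set: softmax over negative
    squared distances, maximizing probability mass on same-class neighbors."""
    idx = torch.randperm(x_cand.shape[0])[:n_samples]   # uniform neighbor sampling
    z, z_cand = encoder(x), encoder(x_cand[idx])        # (B, d), (S, d)
    p = F.softmax(-torch.cdist(z, z_cand) ** 2, dim=1)  # closer => more probable
    same_class = (y.unsqueeze(1) == y_cand[idx].unsqueeze(0)).float()
    return -torch.log((p * same_class).sum(dim=1) + 1e-12).mean()

encoder = torch.nn.Linear(10, 8)  # stands in for a deep architecture
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
x_cand, y_cand = torch.randn(1000, 10), torch.randint(0, 2, (1000,))
print(nca_loss_with_sampling(encoder, x, y, x_cand, y_cand))
```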
- Knowledge Composition using Task Vectors with Learned Anisotropic Scaling [51.4661186662329]
We introduce aTLAS, an algorithm that linearly combines parameter blocks with different learned coefficients, resulting in anisotropic scaling at the task vector level.
We show that such linear combinations explicitly exploit the low intrinsic dimensionality of pre-trained models, with only a few coefficients being the learnable parameters.
We demonstrate the effectiveness of our method in task arithmetic, few-shot recognition and test-time adaptation, with supervised or unsupervised objectives.
arXiv Detail & Related papers (2024-07-03T07:54:08Z)
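The following sketch illustrates linearly combining task vectors with per-block learned coefficients (anisotropic scaling); the block naming and granularity are hypothetical, not aTLAS's actual partitioning.

```python
import torch

def compose_with_task_vectors(pretrained, task_vectors, coeffs):
    """theta = theta_pre + sum_t coeffs[t][block] * tau_t[block]: a separate
    learned coefficient per (task vector, parameter block) pair is what
    makes the scaling anisotropic."""
    return {
        name: theta + sum(c[name] * tau[name] for c, tau in zip(coeffs, task_vectors))
        for name, theta in pretrained.items()
    }

# Only the per-block coefficients are learnable parameters here.
pretrained = {"layer1.weight": torch.zeros(4, 4)}     # hypothetical block name
task_vectors = [{"layer1.weight": torch.ones(4, 4)}]  # finetuned minus pretrained
coeffs = [{"layer1.weight": torch.nn.Parameter(torch.tensor(0.1))}]
print(compose_with_task_vectors(pretrained, task_vectors, coeffs)["layer1.weight"][0, 0])
```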
- Making Pre-trained Language Models Great on Tabular Prediction [50.70574370855663]
The transferability of deep neural networks (DNNs) has driven significant progress in image and language processing.
We present TP-BERTa, a specifically pre-trained LM for tabular data prediction.
A novel relative magnitude tokenization converts scalar numerical feature values to finely discrete, high-dimensional tokens, and an intra-feature attention approach integrates feature values with the corresponding feature names.
arXiv Detail & Related papers (2024-03-04T08:38:56Z)
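One plausible reading of relative magnitude tokenization is quantile binning of a scalar into a vocabulary of discrete magnitude tokens; the bin count and token offset below are our assumptions, not TP-BERTa's exact scheme.

```python
import numpy as np

def fit_magnitude_bins(train_values, n_bins=256):
    """Quantile edges computed on training data for one numerical feature."""
    return np.quantile(train_values, np.linspace(0, 1, n_bins + 1)[1:-1])

def to_magnitude_tokens(values, edges, offset=0):
    """Map scalars to discrete token ids; `offset` (hypothetical) would place
    the magnitude tokens after the LM's text vocabulary."""
    return np.searchsorted(edges, values) + offset

edges = fit_magnitude_bins(np.random.randn(10_000))
print(to_magnitude_tokens(np.array([-2.0, 0.0, 2.0]), edges))
```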
- Tree-Regularized Tabular Embeddings [22.095328171882223]
Tabular neural networks (NNs) have attracted remarkable attention, and their recent advances have gradually narrowed the performance gap with respect to tree-based models on many public datasets.
We emphasize the importance of homogeneous embeddings and alternately concentrate on regularizing inputs through supervised pretraining.
Specifically, we utilize the structure of pretrained tree ensembles to transform raw variables into a single vector (T2V), or an array of tokens (T2T).
arXiv Detail & Related papers (2024-03-01T20:26:33Z)
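The leaf-index trick is one concrete way to turn raw variables into a single vector with a pretrained tree ensemble; the sketch below uses scikit-learn's apply() and reflects our reading of T2V, not necessarily the paper's exact construction.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import OneHotEncoder

X, y = np.random.randn(1000, 10), np.random.randn(1000)
forest = RandomForestRegressor(n_estimators=50, max_depth=4).fit(X, y)

leaves = forest.apply(X)  # (n_samples, n_trees): the leaf id each tree assigns
t2v = OneHotEncoder(sparse_output=False).fit_transform(leaves)
print(t2v.shape)          # one binary tree-derived vector per sample
```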
- TabR: Tabular Deep Learning Meets Nearest Neighbors in 2023 [33.70333110327871]
We present TabR -- essentially, a feed-forward network with a custom k-Nearest-Neighbors-like component in the middle.
On a set of public benchmarks with datasets up to several million objects, TabR demonstrates the best average performance.
In addition to the much higher performance, TabR is simple and significantly more efficient.
arXiv Detail & Related papers (2023-07-26T17:58:07Z)
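In the spirit of a feed-forward network with a k-Nearest-Neighbors-like component in the middle, here is a simplified sketch; the similarity metric and the weighted-value aggregation are assumptions, not TabR's exact block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RetrievalBlock(nn.Module):
    """Encode the input, retrieve the k most similar candidate encodings,
    and add an aggregated 'value' computed from those neighbors."""
    def __init__(self, d: int, k: int = 16):
        super().__init__()
        self.k = k
        self.value = nn.Linear(d, d)

    def forward(self, z: torch.Tensor, z_cand: torch.Tensor) -> torch.Tensor:
        sim = -torch.cdist(z, z_cand)              # (B, N), closer => larger
        topk = sim.topk(self.k, dim=1)
        w = F.softmax(topk.values, dim=1)          # (B, k) neighbor weights
        neighbors = z_cand[topk.indices]           # (B, k, d)
        return z + (w.unsqueeze(-1) * self.value(neighbors)).sum(dim=1)

z, z_cand = torch.randn(8, 32), torch.randn(500, 32)
print(RetrievalBlock(32)(z, z_cand).shape)         # (8, 32)
```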
- Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
- The GatedTabTransformer. An enhanced deep learning architecture for tabular modeling [0.0]
We propose multiple modifications to the original TabTransformer performing better on binary classification tasks.
Inspired by gated MLPs, linear projections are implemented in the MLP block, and multiple activation functions are tested.
arXiv Detail & Related papers (2022-01-01T14:52:04Z)
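A generic example of a gated linear projection inside an MLP block is sketched below; this follows the common gMLP-style gating pattern and is not necessarily the exact GatedTabTransformer block.

```python
import torch
import torch.nn as nn

class GatedMLPBlock(nn.Module):
    """Project up, split into content and gate halves, and multiply:
    the elementwise product is the gated linear projection."""
    def __init__(self, d: int, d_hidden: int = 256):
        super().__init__()
        self.up = nn.Linear(d, 2 * d_hidden)
        self.down = nn.Linear(d_hidden, d)
        self.act = nn.GELU()  # one of several activations one could test

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        content, gate = self.up(x).chunk(2, dim=-1)
        return self.down(self.act(content) * gate)

print(GatedMLPBlock(32)(torch.randn(8, 32)).shape)  # (8, 32)
```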
- Revisiting Deep Learning Models for Tabular Data [40.67427600770095]
It is unclear to both researchers and practitioners which models perform best. We identify two simple and powerful architectures. The first is a ResNet-like architecture that turns out to be a strong baseline often missing in prior works. The second is our simple adaptation of the Transformer architecture for tabular data, which outperforms other solutions on most tasks.
arXiv Detail & Related papers (2021-06-22T17:58:10Z)
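The ResNet-like baseline from the entry above is built from simple residual blocks over feature vectors; the sketch below follows that general pattern (normalization, two linear layers, dropout, skip connection), with layer sizes chosen as assumptions.

```python
import torch
import torch.nn as nn

class TabularResNetBlock(nn.Module):
    """x + Dropout(Linear(Dropout(ReLU(Linear(Norm(x)))))) -- a plain
    residual block over feature vectors rather than images."""
    def __init__(self, d: int, d_hidden: int = 128, dropout: float = 0.1):
        super().__init__()
        self.norm = nn.BatchNorm1d(d)
        self.ff = nn.Sequential(
            nn.Linear(d, d_hidden), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(d_hidden, d), nn.Dropout(dropout),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.ff(self.norm(x))

print(TabularResNetBlock(32)(torch.randn(8, 32)).shape)  # (8, 32)
```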
- Dual-constrained Deep Semi-Supervised Coupled Factorization Network with Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.