On Embeddings for Numerical Features in Tabular Deep Learning
- URL: http://arxiv.org/abs/2203.05556v4
- Date: Thu, 26 Oct 2023 12:11:02 GMT
- Title: On Embeddings for Numerical Features in Tabular Deep Learning
- Authors: Yury Gorishniy and Ivan Rubachev and Artem Babenko
- Abstract summary: Transformer-like deep architectures have shown strong performance on tabular data problems.
Unlike traditional models, these architectures map scalar values of numerical features to high-dimensional embeddings.
We show that embedding numerical features is beneficial for many backbones.
- Score: 35.26886042632547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Transformer-like deep architectures have shown strong performance
on tabular data problems. Unlike traditional models, e.g., MLP, these
architectures map scalar values of numerical features to high-dimensional
embeddings before mixing them in the main backbone. In this work, we argue that
embeddings for numerical features are an underexplored degree of freedom in
tabular DL, which allows constructing more powerful DL models and competing
with GBDT on some traditionally GBDT-friendly benchmarks. We start by
describing two conceptually different approaches to building embedding modules:
the first one is based on a piecewise linear encoding of scalar values, and the
second one utilizes periodic activations. Then, we empirically demonstrate that
these two approaches can lead to significant performance boosts compared to the
embeddings based on conventional blocks such as linear layers and ReLU
activations. Importantly, we also show that embedding numerical features is
beneficial for many backbones, not only for Transformers. Specifically, after
proper embeddings, simple MLP-like models can perform on par with the
attention-based architectures. Overall, we highlight embeddings for numerical
features as an important design aspect with good potential for further
improvements in tabular DL.
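To make the two embedding approaches concrete, below is a minimal PyTorch sketch of both modules. The quantile-style bin boundaries, the clamping at the edge bins, the number of frequencies, and the initialization scale sigma are illustrative assumptions on our part, not the paper's exact configuration.

```python
import math
import torch
import torch.nn as nn

class PiecewiseLinearEncoding(nn.Module):
    """Piecewise linear encoding of a scalar: bins fully below the value
    saturate to 1, the active bin holds the fractional position, and bins
    above stay 0. `boundaries` would typically come from feature quantiles."""
    def __init__(self, boundaries: torch.Tensor):
        super().__init__()
        self.register_buffer("lo", boundaries[:-1])  # left bin edges,  (T,)
        self.register_buffer("hi", boundaries[1:])   # right bin edges, (T,)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch,) scalar values of one numerical feature
        ratio = (x.unsqueeze(-1) - self.lo) / (self.hi - self.lo)
        # Clamping the edge bins is a simplification; they can be left
        # unclamped to preserve information about out-of-range values.
        return ratio.clamp(0.0, 1.0)                 # (batch, T)

class PeriodicEmbedding(nn.Module):
    """Map a scalar to [cos(2*pi*c*x), sin(2*pi*c*x)] with trainable
    frequencies c initialized from N(0, sigma^2); sigma is a hyperparameter."""
    def __init__(self, n_frequencies: int = 8, sigma: float = 1.0):
        super().__init__()
        self.c = nn.Parameter(sigma * torch.randn(n_frequencies))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        angles = 2 * math.pi * self.c * x.unsqueeze(-1)  # (batch, k)
        return torch.cat([torch.cos(angles), torch.sin(angles)], dim=-1)

# Usage: embed one feature, then feed the embeddings to any backbone.
x = torch.randn(32)
ple = PiecewiseLinearEncoding(torch.linspace(-3.0, 3.0, 9))  # 8 bins
print(ple(x).shape, PeriodicEmbedding(8)(x).shape)           # (32, 8) (32, 16)
```

In the paper's setup, such per-feature embeddings are then concatenated for MLP-like backbones or treated as feature tokens for Transformer backbones.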
Related papers
- Transformers with Stochastic Competition for Tabular Data Modelling [6.285325771390289]
We introduce a novel deep learning model specifically designed for tabular data.
The model is validated on a variety of widely-used, publicly available datasets.
We demonstrate that incorporating these stochastic competition mechanisms yields high performance.
arXiv Detail & Related papers (2024-07-18T07:48:48Z)
- Modern Neighborhood Components Analysis: A Deep Tabular Baseline Two Decades Later [59.88557193062348]
We revisit the classic Neighborhood Components Analysis (NCA), designed to learn a linear projection that captures semantic similarities between instances.
We find that minor modifications, such as adjustments to the learning objectives and the integration of deep learning architectures, significantly enhance NCA's performance.
We also introduce a neighbor sampling strategy that improves both the efficiency and predictive accuracy of our proposed ModernNCA.
arXiv Detail & Related papers (2024-07-03T16:38:57Z)
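As a rough illustration of an NCA-style objective combined with a neighbor sampling strategy, here is a hedged sketch; the linear stand-in encoder, the sample size, and the uniform sampling are our assumptions rather than ModernNCA's actual design.

```python
import torch
import torch.nn.functional as F

def nca_loss_with_sampling(encoder, x, y, x_cand, y_cand, n_samples=256):
    """NCA-style loss over a sampled candidate set: softmax over negative
    squared distances, maximizing probability mass on same-class neighbors."""
    idx = torch.randperm(x_cand.shape[0])[:n_samples]   # uniform neighbor sampling
    z, z_cand = encoder(x), encoder(x_cand[idx])        # (B, d), (S, d)
    p = F.softmax(-torch.cdist(z, z_cand) ** 2, dim=1)  # closer => more probable
    same_class = (y.unsqueeze(1) == y_cand[idx].unsqueeze(0)).float()
    return -torch.log((p * same_class).sum(dim=1) + 1e-12).mean()

encoder = torch.nn.Linear(10, 8)  # stands in for a deep architecture
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
x_cand, y_cand = torch.randn(1000, 10), torch.randint(0, 2, (1000,))
print(nca_loss_with_sampling(encoder, x, y, x_cand, y_cand))
```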
- Knowledge Composition using Task Vectors with Learned Anisotropic Scaling [51.4661186662329]
We introduce aTLAS, an algorithm that linearly combines parameter blocks with different learned coefficients, resulting in anisotropic scaling at the task vector level.
We show that such linear combinations explicitly exploit the low intrinsic dimensionality of pre-trained models, with only a few coefficients being the learnable parameters.
We demonstrate the effectiveness of our method in task arithmetic, few-shot recognition and test-time adaptation, with supervised or unsupervised objectives.
arXiv Detail & Related papers (2024-07-03T07:54:08Z)
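The following sketch illustrates linearly combining task vectors with per-block learned coefficients (anisotropic scaling); the block naming and granularity are hypothetical, not aTLAS's actual partitioning.

```python
import torch

def compose_with_task_vectors(pretrained, task_vectors, coeffs):
    """theta = theta_pre + sum_t coeffs[t][block] * tau_t[block]: a separate
    learned coefficient per (task vector, parameter block) pair is what
    makes the scaling anisotropic."""
    return {
        name: theta + sum(c[name] * tau[name] for c, tau in zip(coeffs, task_vectors))
        for name, theta in pretrained.items()
    }

# Only the per-block coefficients are learnable parameters here.
pretrained = {"layer1.weight": torch.zeros(4, 4)}     # hypothetical block name
task_vectors = [{"layer1.weight": torch.ones(4, 4)}]  # finetuned minus pretrained
coeffs = [{"layer1.weight": torch.nn.Parameter(torch.tensor(0.1))}]
print(compose_with_task_vectors(pretrained, task_vectors, coeffs)["layer1.weight"][0, 0])
```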
- Making Pre-trained Language Models Great on Tabular Prediction [50.70574370855663]
The transferability of deep neural networks (DNNs) has driven significant progress in image and language processing.
We present TP-BERTa, a specifically pre-trained LM for tabular data prediction.
A novel relative magnitude tokenization converts scalar numerical feature values to finely discrete, high-dimensional tokens, and an intra-feature attention approach integrates feature values with the corresponding feature names.
arXiv Detail & Related papers (2024-03-04T08:38:56Z)
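One plausible reading of relative magnitude tokenization is quantile binning of a scalar into a vocabulary of discrete magnitude tokens; the bin count and token offset below are our assumptions, not TP-BERTa's exact scheme.

```python
import numpy as np

def fit_magnitude_bins(train_values, n_bins=256):
    """Quantile edges computed on training data for one numerical feature."""
    return np.quantile(train_values, np.linspace(0, 1, n_bins + 1)[1:-1])

def to_magnitude_tokens(values, edges, offset=0):
    """Map scalars to discrete token ids; `offset` (hypothetical) would place
    the magnitude tokens after the LM's text vocabulary."""
    return np.searchsorted(edges, values) + offset

edges = fit_magnitude_bins(np.random.randn(10_000))
print(to_magnitude_tokens(np.array([-2.0, 0.0, 2.0]), edges))
```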
- Tree-Regularized Tabular Embeddings [22.095328171882223]
Tabular neural networks (NNs) have attracted remarkable attention, and their recent advances have gradually narrowed the performance gap with respect to tree-based models on many public datasets.
We emphasize the importance of homogeneous embeddings and alternately concentrate on regularizing inputs through supervised pretraining.
Specifically, we utilize the structure of pretrained tree ensembles to transform raw variables into a single vector (T2V), or an array of tokens (T2T).
arXiv Detail & Related papers (2024-03-01T20:26:33Z)
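The leaf-index trick is one concrete way to turn raw variables into a single vector with a pretrained tree ensemble; the sketch below uses scikit-learn's apply() and reflects our reading of T2V, not necessarily the paper's exact construction.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import OneHotEncoder

X, y = np.random.randn(1000, 10), np.random.randn(1000)
forest = RandomForestRegressor(n_estimators=50, max_depth=4).fit(X, y)

leaves = forest.apply(X)  # (n_samples, n_trees): the leaf id each tree assigns
t2v = OneHotEncoder(sparse_output=False).fit_transform(leaves)
print(t2v.shape)          # one binary tree-derived vector per sample
```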
- TabR: Tabular Deep Learning Meets Nearest Neighbors in 2023 [33.70333110327871]
We present TabR -- essentially, a feed-forward network with a custom k-Nearest-Neighbors-like component in the middle.
On a set of public benchmarks with datasets up to several million objects, TabR demonstrates the best average performance.
In addition to the much higher performance, TabR is simple and significantly more efficient.
arXiv Detail & Related papers (2023-07-26T17:58:07Z)
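In the spirit of a feed-forward network with a k-Nearest-Neighbors-like component in the middle, here is a simplified sketch; the similarity metric and the weighted-value aggregation are assumptions, not TabR's exact block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RetrievalBlock(nn.Module):
    """Encode the input, retrieve the k most similar candidate encodings,
    and add an aggregated 'value' computed from those neighbors."""
    def __init__(self, d: int, k: int = 16):
        super().__init__()
        self.k = k
        self.value = nn.Linear(d, d)

    def forward(self, z: torch.Tensor, z_cand: torch.Tensor) -> torch.Tensor:
        sim = -torch.cdist(z, z_cand)              # (B, N), closer => larger
        topk = sim.topk(self.k, dim=1)
        w = F.softmax(topk.values, dim=1)          # (B, k) neighbor weights
        neighbors = z_cand[topk.indices]           # (B, k, d)
        return z + (w.unsqueeze(-1) * self.value(neighbors)).sum(dim=1)

z, z_cand = torch.randn(8, 32), torch.randn(500, 32)
print(RetrievalBlock(32)(z, z_cand).shape)         # (8, 32)
```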
- Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
- The GatedTabTransformer. An enhanced deep learning architecture for tabular modeling [0.0]
We propose multiple modifications to the original TabTransformer performing better on binary classification tasks.
Inspired by gated MLPs, linear projections are implemented in the MLP block, and multiple activation functions are tested.
arXiv Detail & Related papers (2022-01-01T14:52:04Z)
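A generic example of a gated linear projection inside an MLP block is sketched below; this follows the common gMLP-style gating pattern and is not necessarily the exact GatedTabTransformer block.

```python
import torch
import torch.nn as nn

class GatedMLPBlock(nn.Module):
    """Project up, split into content and gate halves, and multiply:
    the elementwise product is the gated linear projection."""
    def __init__(self, d: int, d_hidden: int = 256):
        super().__init__()
        self.up = nn.Linear(d, 2 * d_hidden)
        self.down = nn.Linear(d_hidden, d)
        self.act = nn.GELU()  # one of several activations one could test

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        content, gate = self.up(x).chunk(2, dim=-1)
        return self.down(self.act(content) * gate)

print(GatedMLPBlock(32)(torch.randn(8, 32)).shape)  # (8, 32)
```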
- Revisiting Deep Learning Models for Tabular Data [40.67427600770095]
It is unclear to both researchers and practitioners which models perform best. We identify two simple and powerful architectures. The first is a ResNet-like architecture that turns out to be a strong baseline often missing in prior works. The second is our simple adaptation of the Transformer architecture for tabular data, which outperforms other solutions on most tasks.
arXiv Detail & Related papers (2021-06-22T17:58:10Z)
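The ResNet-like baseline from the entry above is built from simple residual blocks over feature vectors; the sketch below follows that general pattern (normalization, two linear layers, dropout, skip connection), with layer sizes chosen as assumptions.

```python
import torch
import torch.nn as nn

class TabularResNetBlock(nn.Module):
    """x + Dropout(Linear(Dropout(ReLU(Linear(Norm(x)))))) -- a plain
    residual block over feature vectors rather than images."""
    def __init__(self, d: int, d_hidden: int = 128, dropout: float = 0.1):
        super().__init__()
        self.norm = nn.BatchNorm1d(d)
        self.ff = nn.Sequential(
            nn.Linear(d, d_hidden), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(d_hidden, d), nn.Dropout(dropout),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.ff(self.norm(x))

print(TabularResNetBlock(32)(torch.randn(8, 32)).shape)  # (8, 32)
```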
- Dual-constrained Deep Semi-Supervised Coupled Factorization Network with Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.