TabTransformer: Tabular Data Modeling Using Contextual Embeddings
- URL: http://arxiv.org/abs/2012.06678v1
- Date: Fri, 11 Dec 2020 23:31:23 GMT
- Title: TabTransformer: Tabular Data Modeling Using Contextual Embeddings
- Authors: Xin Huang, Ashish Khetan, Milan Cvitkovic, Zohar Karnin
- Abstract summary: We propose TabTransformer, a novel deep tabular data modeling architecture for supervised and semi-supervised learning.
The Transformer layers transform the embeddings of categorical features into robust contextual embeddings to achieve higher prediction accuracy.
For the semi-supervised setting, we develop an unsupervised pre-training procedure to learn data-driven contextual embeddings, resulting in an average 2.1% AUC lift over the state-of-the-art methods.
- Score: 23.509063910635692
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We propose TabTransformer, a novel deep tabular data modeling architecture
for supervised and semi-supervised learning. The TabTransformer is built upon
self-attention based Transformers. The Transformer layers transform the
embeddings of categorical features into robust contextual embeddings to achieve
higher prediction accuracy. Through extensive experiments on fifteen publicly
available datasets, we show that the TabTransformer outperforms the
state-of-the-art deep learning methods for tabular data by at least 1.0% on
mean AUC, and matches the performance of tree-based ensemble models.
Furthermore, we demonstrate that the contextual embeddings learned from
TabTransformer are highly robust against both missing and noisy data features,
and provide better interpretability. Lastly, for the semi-supervised setting we
develop an unsupervised pre-training procedure to learn data-driven contextual
embeddings, resulting in an average 2.1% AUC lift over the state-of-the-art
methods.
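To make the architecture concrete, here is a minimal PyTorch sketch of the idea described in the abstract (not the authors' reference implementation): each categorical column gets its own embedding table, the Transformer layers turn those column embeddings into contextual embeddings, and the result is concatenated with layer-normalized continuous features before an MLP head. The `TabTransformerSketch` class name, the layer sizes, and the example cardinalities are illustrative assumptions.

```python
# Minimal sketch of a TabTransformer-style model, assuming PyTorch.
import torch
import torch.nn as nn


class TabTransformerSketch(nn.Module):
    def __init__(self, cat_cardinalities, num_continuous, d_model=32,
                 n_heads=8, n_layers=6, mlp_hidden=128, n_classes=2):
        super().__init__()
        # One embedding table per categorical column.
        self.embeddings = nn.ModuleList(
            [nn.Embedding(card, d_model) for card in cat_cardinalities]
        )
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, n_layers)
        self.cont_norm = nn.LayerNorm(num_continuous)
        in_dim = d_model * len(cat_cardinalities) + num_continuous
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, mlp_hidden), nn.ReLU(),
            nn.Linear(mlp_hidden, n_classes),
        )

    def forward(self, x_cat, x_cont):
        # x_cat: (batch, n_cat) integer codes; x_cont: (batch, n_cont) floats.
        tokens = torch.stack(
            [emb(x_cat[:, i]) for i, emb in enumerate(self.embeddings)], dim=1
        )                                         # (batch, n_cat, d_model)
        contextual = self.transformer(tokens)     # contextual embeddings
        flat = contextual.flatten(start_dim=1)    # (batch, n_cat * d_model)
        features = torch.cat([flat, self.cont_norm(x_cont)], dim=1)
        return self.mlp(features)


# Example usage with made-up column cardinalities.
model = TabTransformerSketch(cat_cardinalities=[10, 4, 7], num_continuous=5)
x_cat = torch.randint(0, 4, (16, 3))   # codes valid for every column above
x_cont = torch.randn(16, 5)
logits = model(x_cat, x_cont)          # (16, n_classes)
```

The unsupervised pre-training mentioned in the abstract would train the embedding and Transformer layers on unlabeled rows before fitting the MLP head on labeled data; the exact pre-training objective is described in the paper rather than sketched here.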
Related papers
- TabDiff: a Multi-Modal Diffusion Model for Tabular Data Generation [91.50296404732902]
We introduce TabDiff, a joint diffusion framework that models all multi-modal distributions of tabular data in one model.
Our key innovation is the development of a joint continuous-time diffusion process for numerical and categorical data.
TabDiff achieves superior average performance over existing competitive baselines, with up to 22.5% improvement over the state-of-the-art model on pair-wise column correlation estimations.
arXiv Detail & Related papers (2024-10-27T22:58:47Z) - A Survey on Deep Tabular Learning [0.0]
Tabular data presents unique challenges for deep learning due to its heterogeneous nature and lack of spatial structure.
This survey reviews the evolution of deep learning models for tabular data, from early fully connected networks (FCNs) to advanced architectures like TabNet, SAINT, TabTranSELU, and MambaNet.
arXiv Detail & Related papers (2024-10-15T20:08:08Z) - Deep Learning with Tabular Data: A Self-supervised Approach [0.0]
This study uses a self-supervised learning approach.
The aim is to find the most effective TabTransformer representation of categorical and numerical features.
The research presents a novel approach by creating various variants of the TabTransformer model.
arXiv Detail & Related papers (2024-01-26T23:12:41Z) - Unifying Structured Data as Graph for Data-to-Text Pre-Training [69.96195162337793]
Data-to-text (D2T) generation aims to transform structured data into natural language text.
Data-to-text pre-training has proved to be powerful in enhancing D2T generation.
We propose a structure-enhanced pre-training method for D2T generation by designing a structure-enhanced Transformer.
arXiv Detail & Related papers (2024-01-02T12:23:49Z) - Training-Free Generalization on Heterogeneous Tabular Data via
Meta-Representation [67.30538142519067]
We propose Tabular data Pre-Training via Meta-representation (TabPTM).
A deep neural network is then trained to associate these meta-representations with dataset-specific classification confidences.
Experiments validate that TabPTM achieves promising performance in new datasets, even under few-shot scenarios.
arXiv Detail & Related papers (2023-10-31T18:03:54Z) - Exploring the Benefits of Differentially Private Pre-training and
Parameter-Efficient Fine-tuning for Table Transformers [56.00476706550681]
Table Transformer (TabTransformer) is a state-of-the-art neural network model, while Differential Privacy (DP) is an essential component to ensure data privacy.
In this paper, we explore the benefits of combining these two aspects together in the scenario of transfer learning.
arXiv Detail & Related papers (2023-09-12T19:08:26Z) - Generative Table Pre-training Empowers Models for Tabular Prediction [71.76829961276032]
We propose TapTap, the first attempt to leverage table pre-training to empower models for tabular prediction.
TapTap can generate high-quality synthetic tables to support various applications, including privacy protection, low resource regime, missing value imputation, and imbalanced classification.
It can be easily combined with various backbone models, including LightGBM, Multilayer Perceptron (MLP) and Transformer.
arXiv Detail & Related papers (2023-05-16T06:37:38Z) - XTab: Cross-table Pretraining for Tabular Transformers [29.419276738753968]
XTab is a framework for cross-table pretraining of tabular transformers on datasets from various domains.
We show that XTab consistently boosts the generalizability, learning speed, and performance of multiple tabular transformers.
We achieve superior performance to other state-of-the-art tabular deep learning models on various tasks such as regression, binary, and multiclass classification.
arXiv Detail & Related papers (2023-05-10T12:17:52Z) - PTab: Using the Pre-trained Language Model for Modeling Tabular Data [5.791972449406902]
Recent studies show that neural-based models are effective in learning contextual representations for tabular data.
We propose PTab, a novel framework that uses a pre-trained language model to model tabular data.
Our method has achieved a better average AUC score in supervised settings compared to the state-of-the-art baselines.
arXiv Detail & Related papers (2022-09-15T08:58:42Z) - The GatedTabTransformer. An enhanced deep learning architecture for
tabular modeling [0.0]
We propose multiple modifications to the original TabTransformer that perform better on binary classification tasks.
Inspired by gated MLP designs, linear projections are implemented in the MLP block and multiple activation functions are tested (a minimal sketch of such a gated block follows this list).
arXiv Detail & Related papers (2022-01-01T14:52:04Z) - Towards Faithful Neural Table-to-Text Generation with Content-Matching
Constraints [63.84063384518667]
We propose a novel Transformer-based generation framework to achieve the goal.
Core techniques in our method to enforce faithfulness include a new table-text optimal-transport matching loss.
To evaluate faithfulness, we propose a new automatic metric specialized to the table-to-text generation problem.
arXiv Detail & Related papers (2020-05-03T02:54:26Z)
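As referenced in the GatedTabTransformer entry above, here is a minimal sketch of a gated MLP-style block with linear projections and element-wise gating. The `GatedBlock` class, the split-gate design, and the projection sizes are illustrative assumptions rather than the paper's exact block.

```python
# Minimal sketch of a gated MLP-style block, assuming PyTorch.
import torch
import torch.nn as nn


class GatedBlock(nn.Module):
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.norm = nn.LayerNorm(d_in)
        self.proj_in = nn.Linear(d_in, 2 * d_hidden)  # value and gate halves
        self.proj_out = nn.Linear(d_hidden, d_in)
        self.act = nn.GELU()                          # one of several activations to test

    def forward(self, x):
        h = self.act(self.proj_in(self.norm(x)))
        value, gate = h.chunk(2, dim=-1)              # element-wise gating
        return x + self.proj_out(value * gate)        # residual connection


block = GatedBlock(d_in=64, d_hidden=128)
out = block(torch.randn(16, 64))                      # (batch, d_in)
```

The residual connection and the normalization placement here are common design choices for such blocks; swapping the activation (e.g., ReLU, GELU, SELU) is the kind of variation the entry above refers to.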
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.