The GatedTabTransformer. An enhanced deep learning architecture for
tabular modeling
- URL: http://arxiv.org/abs/2201.00199v1
- Date: Sat, 1 Jan 2022 14:52:04 GMT
- Title: The GatedTabTransformer. An enhanced deep learning architecture for
tabular modeling
- Authors: Radostin Cholakov and Todor Kolev
- Abstract summary: We propose multiple modifications to the original TabTransformer that perform better on binary classification tasks.
Inspired by gated MLP, linear projections are implemented in the MLP block and multiple activation functions are tested.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: There is an increasing interest in the application of deep learning
architectures to tabular data. One of the state-of-the-art solutions is
TabTransformer which incorporates an attention mechanism to better track
relationships between categorical features and then makes use of a standard MLP
to output its final logits. In this paper we propose multiple modifications to
the original TabTransformer that perform better on binary classification tasks
for three separate datasets, with more than 1% AUROC gains. Inspired by gated
MLP, linear projections are implemented in the MLP block and multiple
activation functions are tested. We also evaluate the importance of specific
hyperparameters during training.
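As a rough illustration of the kind of change the abstract describes, the sketch below shows a gated feed-forward block in which one linear projection is element-wise gated by a second, activated projection, with the activation left configurable. This is a minimal, assumed sketch of a gated-MLP-inspired block, not the authors' implementation; all names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class GatedMLPBlock(nn.Module):
    """GLU-style gated feed-forward block: one linear projection carries the
    signal and is element-wise gated by a second, activated projection.
    Illustrative sketch only; names, sizes and the default activation are assumptions."""

    def __init__(self, dim: int, hidden: int, activation=nn.GELU):
        super().__init__()
        self.value_proj = nn.Linear(dim, hidden)  # linear projection: signal path
        self.gate_proj = nn.Linear(dim, hidden)   # linear projection: gate path
        self.act = activation()                   # the paper reports testing several activations
        self.out_proj = nn.Linear(hidden, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.out_proj(self.value_proj(x) * self.act(self.gate_proj(x)))

# e.g. a gated block followed by a linear layer as a binary-classification head
head = nn.Sequential(GatedMLPBlock(dim=64, hidden=128), nn.Linear(64, 1))
logit = head(torch.randn(32, 64))  # (batch, features) -> (batch, 1) logits
```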
Related papers
- Mixture of Attention Yields Accurate Results for Tabular Data [21.410818837489973]
We propose MAYA, an encoder-decoder transformer-based framework.
In the encoder, we design a Mixture of Attention (MOA) that constructs multiple parallel attention branches (a generic sketch of parallel attention branches appears after the list below).
We employ collaborative learning with a dynamic consistency weight constraint to produce more robust representations.
arXiv Detail & Related papers (2025-02-18T03:43:42Z) - ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts.
Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z) - TabM: Advancing Tabular Deep Learning with Parameter-Efficient Ensembling [28.37672139176765]
The new model TabM relies on ensembling, where one TabM efficiently imitates an ensemble of MLPs and produces multiple predictions per object.
In TabM, the underlying implicit MLPs are trained simultaneously and (by default) share most of their parameters, which results in significantly better performance and efficiency (a generic sketch of this style of weight-shared ensembling appears after the list below).
arXiv Detail & Related papers (2024-10-31T17:58:41Z) - Tokenize features, enhancing tables: the FT-TABPFN model for tabular classification [13.481699494376809]
FT-TabPFN is an enhanced version of TabPFN that includes a novel Feature Tokenization layer to better handle classification features.
Our full source code is available for community use and development.
arXiv Detail & Related papers (2024-06-11T02:13:46Z) - Deep Learning with Tabular Data: A Self-supervised Approach [0.0]
We have used a self-supervised learning approach in this study.
The aim is to find the most effective TabTransformer model representation of categorical and numerical features.
The research presents a novel approach by creating various variants of the TabTransformer model.
arXiv Detail & Related papers (2024-01-26T23:12:41Z) - Tuning Pre-trained Model via Moment Probing [62.445281364055795]
We propose a novel Moment Probing (MP) method to explore the potential of linear probing (LP).
MP performs a linear classification head based on the mean of final features.
Our MP significantly outperforms LP and is competitive with counterparts at a lower training cost.
arXiv Detail & Related papers (2023-07-21T04:15:02Z) - Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers [71.32827362323205]
We propose a new class of linear Transformers called FourierLearner-Transformers (Learners).
They incorporate a wide range of relative positional encoding mechanisms (RPEs).
These include regular RPE techniques applied for sequential data, as well as novel RPEs operating on geometric data embedded in higher-dimensional Euclidean spaces.
arXiv Detail & Related papers (2023-02-03T18:57:17Z) - The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in
Transformers [59.87030906486969]
This paper studies the curious phenomenon that the activation maps of machine learning models with Transformer architectures are sparse.
We show that sparsity is a prevalent phenomenon that occurs for both natural language processing and vision tasks.
We discuss how sparsity immediately implies a way to significantly reduce the FLOP count and improve efficiency for Transformers (a rough numerical illustration appears after the list below).
arXiv Detail & Related papers (2022-10-12T15:25:19Z) - Learning Implicit Feature Alignment Function for Semantic Segmentation [51.36809814890326]
Implicit Feature Alignment function (IFA) is inspired by the rapidly expanding topic of implicit neural representations.
We show that IFA implicitly aligns the feature maps at different levels and is capable of producing segmentation maps in arbitrary resolutions.
Our method can be combined with various architectures to improve them, and it achieves a state-of-the-art accuracy trade-off on common benchmarks.
arXiv Detail & Related papers (2022-06-17T09:40:14Z) - FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization [73.41395947275473]
We propose a novel frequency-aware architecture, in which the domain-specific features are filtered out in the transformed frequency domain.
Experiments on three benchmarks demonstrate significant performance gains, outperforming the state-of-the-art methods by margins of 3%, 4%, and 9%, respectively.
arXiv Detail & Related papers (2022-03-24T07:26:29Z) - TabTransformer: Tabular Data Modeling Using Contextual Embeddings [23.509063910635692]
We propose TabTransformer, a novel deep data modeling architecture for supervised and semi-supervised learning.
The Transformer layers transform the embeddings of categorical features into robust contextual embeddings to achieve higher prediction accuracy.
For the semi-supervised setting, we develop an unsupervised pre-training procedure to learn data-driven contextual embeddings, resulting in an average 2.1% AUC lift over the state-of-the-art methods.
arXiv Detail & Related papers (2020-12-11T23:31:23Z)