Tabular Transformers for Modeling Multivariate Time Series
- URL: http://arxiv.org/abs/2011.01843v2
- Date: Thu, 11 Feb 2021 22:11:40 GMT
- Title: Tabular Transformers for Modeling Multivariate Time Series
- Authors: Inkit Padhi, Yair Schiff, Igor Melnyk, Mattia Rigotti, Youssef Mroueh,
Pierre Dognin, Jerret Ross, Ravi Nair, Erik Altman
- Abstract summary: Tabular datasets are ubiquitous in data science applications. Given their importance, it seems natural to apply state-of-the-art deep learning algorithms in order to fully unlock their potential.
Here we propose neural network models for tabular time series that can leverage their hierarchical structure.
We demonstrate our models on two datasets: a synthetic credit card transaction dataset, where the learned representations are used for fraud detection and synthetic data generation, and on a real pollution dataset, where the learned encodings are used to predict atmospheric pollutant concentrations.
- Score: 30.717890753132824
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tabular datasets are ubiquitous in data science applications. Given their
importance, it seems natural to apply state-of-the-art deep learning algorithms
in order to fully unlock their potential. Here we propose neural network models
for tabular time series that can optionally leverage their
hierarchical structure. This results in two architectures for tabular time
series: one for learning representations that is analogous to BERT and can be
pre-trained end-to-end and used in downstream tasks, and one that is akin to
GPT and can be used for generation of realistic synthetic tabular sequences. We
demonstrate our models on two datasets: a synthetic credit card transaction
dataset, where the learned representations are used for fraud detection and
synthetic data generation, and on a real pollution dataset, where the learned
encodings are used to predict atmospheric pollutant concentrations. Code and
data are available at https://github.com/IBM/TabFormer.
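To make the BERT-style variant concrete, here is a minimal, self-contained sketch (not the TabFormer implementation; see the repository above for the authors' code): each table row is tokenized field by field against per-column vocabularies, the rows are concatenated into one sequence per account, and a transformer encoder is trained to recover masked field values. The column names, vocabulary sizes, and hyperparameters below are hypothetical.

import torch
import torch.nn as nn

# Hypothetical columns with per-column vocabulary sizes; id 0 is [MASK].
FIELDS = {"merchant": 100, "amount_bin": 20, "hour": 24}
MASK_ID = 0

# Offset each column's value ids into one shared vocabulary.
offsets, vocab_size = {}, 1
for field, size in FIELDS.items():
    offsets[field] = vocab_size
    vocab_size += size

class TabularMLM(nn.Module):
    """BERT-style masked modeling over field-tokenized rows (a sketch)."""
    def __init__(self, vocab, d_model=64, nhead=4, nlayers=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=nlayers)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, ids):  # ids: (batch, seq_len) shared-vocab token ids
        return self.head(self.enc(self.emb(ids)))

# Toy batch: 8 accounts, 10 rows each; every row contributes one token
# per field, so seq_len = 10 * len(FIELDS).
B, R = 8, 10
cols = [offsets[f] + torch.randint(0, s, (B, R)) for f, s in FIELDS.items()]
ids = torch.stack(cols, dim=-1).reshape(B, R * len(FIELDS))

# Mask 15% of field tokens and train the encoder to recover them.
mask = torch.rand(ids.shape) < 0.15
inputs = ids.masked_fill(mask, MASK_ID)
model = TabularMLM(vocab_size)
logits = model(inputs)
loss = nn.functional.cross_entropy(logits[mask], ids[mask])
loss.backward()
print(f"masked-field loss: {loss.item():.3f}")

The GPT-like variant for synthetic-sequence generation would instead train a causal decoder over the same field-token stream and sample rows autoregressively; note also that the paper exploits the row/field hierarchy (e.g., a field-level encoder feeding a sequence-level model), which this flat sketch omits.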
Related papers
- Generating Realistic Tabular Data with Large Language Models [49.03536886067729]
Large language models (LLMs) have been used for diverse tasks, but they do not capture the correct correlation between features and the target variable.
We propose an LLM-based method with three important improvements to correctly capture the ground-truth feature-class correlation in real data.
Our experiments show that our method significantly outperforms 10 SOTA baselines on 20 datasets in downstream tasks.
arXiv Detail & Related papers (2024-10-29T04:14:32Z)
- A Survey on Deep Tabular Learning [0.0]
Tabular data presents unique challenges for deep learning due to its heterogeneous nature and lack of spatial structure.
This survey reviews the evolution of deep learning models for tabular data, from early fully connected networks (FCNs) to advanced architectures like TabNet, SAINT, TabTranSELU, and MambaNet.
arXiv Detail & Related papers (2024-10-15T20:08:08Z)
- Why In-Context Learning Transformers are Tabular Data Classifiers [22.33649426762373]
We show that ICL-transformers acquire the ability to create complex decision boundaries during pretraining.
We create TabForestPFN, the ICL-transformer pretrained on both the original TabPFN synthetic dataset generator and our forest dataset generator.
arXiv Detail & Related papers (2024-05-22T07:13:55Z)
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to perform well only on similar data, while underperforming on real-world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- Training-Free Generalization on Heterogeneous Tabular Data via Meta-Representation [67.30538142519067]
We propose Tabular data Pre-Training via Meta-representation (TabPTM), which standardizes heterogeneous tabular datasets through meta-representations of their instances.
A deep neural network is then trained to associate these meta-representations with dataset-specific classification confidences.
Experiments validate that TabPTM achieves promising performance on new datasets, even in few-shot scenarios.
arXiv Detail & Related papers (2023-10-31T18:03:54Z)
- AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing [12.06889830487286]
Diffusion models have become a main paradigm for synthetic data generation in modern machine learning.
In this paper, we leverage the power of diffusion models to generate synthetic tabular data.
The resulting synthetic tables show high statistical fidelity to the real data and perform well in downstream machine learning tasks.
arXiv Detail & Related papers (2023-10-24T03:15:19Z)
- TabuLa: Harnessing Language Models for Tabular Data Synthesis [5.102332247789348]
We develop Tabula, a new type of data synthesizer based on the language model structure.
We show that Tabula reduces training time per epoch by 46.2% on average compared to the current state-of-the-art LLM-based algorithm.
We also propose a token sequence compression strategy to significantly reduce training time while preserving the quality of synthetic data.
arXiv Detail & Related papers (2023-10-19T13:50:56Z)
- Generative Table Pre-training Empowers Models for Tabular Prediction [71.76829961276032]
We propose TapTap, the first attempt to leverage table pre-training to empower models for tabular prediction.
TapTap can generate high-quality synthetic tables to support various applications, including privacy protection, low-resource regimes, missing value imputation, and imbalanced classification.
It can be easily combined with various backbone models, including LightGBM, Multilayer Perceptron (MLP) and Transformer.
arXiv Detail & Related papers (2023-05-16T06:37:38Z)
- Generating Realistic Synthetic Relational Data through Graph Variational Autoencoders [47.89542334125886]
We combine the variational autoencoder framework with graph neural networks to generate realistic synthetic relational databases.
The results indicate that real databases' structures are accurately preserved in the resulting synthetic datasets.
arXiv Detail & Related papers (2022-11-30T10:40:44Z)
- Transfer Learning with Deep Tabular Models [66.67017691983182]
We show that upstream data gives tabular neural networks a decisive advantage over GBDT models.
We propose a realistic medical diagnosis benchmark for tabular transfer learning.
We propose a pseudo-feature method for cases where the upstream and downstream feature sets differ.
arXiv Detail & Related papers (2022-06-30T14:24:32Z)