Scaling Experiments in Self-Supervised Cross-Table Representation
Learning
- URL: http://arxiv.org/abs/2309.17339v1
- Date: Fri, 29 Sep 2023 15:48:38 GMT
- Title: Scaling Experiments in Self-Supervised Cross-Table Representation
Learning
- Authors: Maximilian Schambach, Dominique Paul, Johannes S. Otterbach
- Abstract summary: We introduce a novel Transformer-based architecture specifically tailored to tabular data and cross-table representation learning.
Our training approach encompasses both single-table and cross-table models, trained via missing value imputation.
To understand the scaling behavior of our method, we train models of varying sizes, ranging from approximately $10^4$ to $10^7$ parameters.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To analyze the scaling potential of deep tabular representation learning
models, we introduce a novel Transformer-based architecture specifically
tailored to tabular data and cross-table representation learning by utilizing
table-specific tokenizers and a shared Transformer backbone. Our training
approach encompasses both single-table and cross-table models, trained via
missing value imputation through a self-supervised masked cell recovery
objective. To understand the scaling behavior of our method, we train models of
varying sizes, ranging from approximately $10^4$ to $10^7$ parameters. These
models are trained on a carefully curated pretraining dataset, consisting of
135M training tokens sourced from 76 diverse datasets. We assess the scaling of
our architecture in both single-table and cross-table pretraining setups by
evaluating the pretrained models using linear probing on a curated set of
benchmark datasets and comparing the results with conventional baselines.
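As a rough illustration of the setup described in the abstract (not the authors' code; all module and parameter names are assumptions), the following PyTorch sketch embeds each numeric cell with a table-specific tokenizer, encodes the resulting cell tokens with a shared Transformer backbone, and trains via a masked cell recovery loss, i.e., mean squared error computed on the masked cells only:

import torch
import torch.nn as nn


class TableTokenizer(nn.Module):
    # Table-specific tokenizer: maps each (numeric) cell to a d_model token,
    # adds a learned column embedding, and replaces masked cells with a [MASK] token.
    def __init__(self, n_columns, d_model):
        super().__init__()
        self.cell_proj = nn.ModuleList([nn.Linear(1, d_model) for _ in range(n_columns)])
        self.column_emb = nn.Embedding(n_columns, d_model)
        self.mask_token = nn.Parameter(torch.zeros(d_model))

    def forward(self, x, mask):
        # x: (batch, n_columns) float cells, mask: (batch, n_columns) bool
        tokens = torch.stack(
            [proj(x[:, j:j + 1]) for j, proj in enumerate(self.cell_proj)], dim=1
        )  # (batch, n_columns, d_model)
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(tokens), tokens)
        cols = torch.arange(x.shape[1], device=x.device)
        return tokens + self.column_emb(cols)


class SharedBackbone(nn.Module):
    # Transformer encoder shared across tables; a linear head recovers masked cells.
    def __init__(self, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.recover = nn.Linear(d_model, 1)

    def forward(self, tokens):
        return self.recover(self.encoder(tokens)).squeeze(-1)  # (batch, n_columns)


# One self-supervised pretraining step on a toy table with 5 numeric columns.
tokenizer, backbone = TableTokenizer(5, 64), SharedBackbone(64)
x = torch.randn(32, 5)                 # batch of rows
mask = torch.rand(32, 5) < 0.15        # mask roughly 15% of cells
pred = backbone(tokenizer(x, mask))
loss = ((pred - x) ** 2)[mask].mean()  # masked cell recovery objective
loss.backward()

For the downstream evaluation by linear probing mentioned above, one would freeze the pretrained backbone, pool the per-cell token representations of each row into a single row embedding, and fit only a linear classifier on top of these frozen embeddings.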
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network).
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
arXiv Detail & Related papers (2024-10-28T13:48:43Z) - Distributionally robust self-supervised learning for tabular data [2.942619386779508]
Learning robust representations in the presence of error slices is challenging due to high-cardinality features and the complexity of constructing error sets.
Traditional robust representation learning methods largely focus on improving worst-group performance in supervised settings in computer vision.
Our approach utilizes an encoder-decoder model trained with Masked Language Modeling (MLM) loss to learn robust latent representations.
arXiv Detail & Related papers (2024-10-11T04:23:56Z) - Dynamic Post-Hoc Neural Ensemblers [55.15643209328513]
In this study, we explore employing neural networks as ensemble methods.
Motivated by the risk of learning low-diversity ensembles, we propose regularizing the model by randomly dropping base model predictions.
We demonstrate that this approach lower-bounds the diversity within the ensemble, reducing overfitting and improving generalization.
arXiv Detail & Related papers (2024-10-06T15:25:39Z) - Distilled Datamodel with Reverse Gradient Matching [74.75248610868685]
We introduce an efficient framework for assessing data impact, comprising offline training and online evaluation stages.
Our proposed method achieves comparable model behavior evaluation while significantly speeding up the process compared to the direct retraining method.
arXiv Detail & Related papers (2024-04-22T09:16:14Z) - TabRepo: A Large Scale Repository of Tabular Model Evaluations and its AutoML Applications [9.457938949410583]
TabRepo is a new dataset of model evaluations and predictions.
It contains the predictions and metrics of 1310 models evaluated on 200 datasets.
arXiv Detail & Related papers (2023-11-06T09:17:18Z) - Training-Free Generalization on Heterogeneous Tabular Data via
Meta-Representation [67.30538142519067]
We propose Tabular data Pre-Training via Meta-representation (TabPTM).
A deep neural network is then trained to associate these meta-representations with dataset-specific classification confidences.
Experiments validate that TabPTM achieves promising performance in new datasets, even under few-shot scenarios.
arXiv Detail & Related papers (2023-10-31T18:03:54Z) - Retrieval-Based Transformer for Table Augmentation [14.460363647772745]
We introduce a novel approach toward automatic data wrangling.
We aim to address table augmentation tasks, including row/column population and data imputation.
Our model consistently and substantially outperforms both supervised statistical methods and the current state-of-the-art transformer-based models.
arXiv Detail & Related papers (2023-06-20T18:51:21Z) - TRAK: Attributing Model Behavior at Scale [79.56020040993947]
We present TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models.
arXiv Detail & Related papers (2023-03-24T17:56:22Z) - Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.