Zero-shot Meta-learning for Tabular Prediction Tasks with Adversarially Pre-trained Transformer
- URL: http://arxiv.org/abs/2502.04573v1
- Date: Thu, 06 Feb 2025 23:58:11 GMT
- Title: Zero-shot Meta-learning for Tabular Prediction Tasks with Adversarially Pre-trained Transformer
- Authors: Yulun Wu, Doron L. Bergman
- Abstract summary: We present an Adversarially Pre-trained Transformer (APT) that is able to perform zero-shot meta-learning on tabular prediction tasks without pre-training on any real-world dataset.
APT is pre-trained with adversarial synthetic data agents that deliberately challenge the model with different synthetic datasets.
We show that our framework matches state-of-the-art performance on small classification tasks without filtering on dataset characteristics.
- Score: 2.1677183904102257
- Abstract: We present an Adversarially Pre-trained Transformer (APT) that is able to perform zero-shot meta-learning on tabular prediction tasks without pre-training on any real-world dataset, building on the recent development of Prior-Data Fitted Networks (PFNs) and TabPFN. Specifically, APT is pre-trained with adversarial synthetic data agents that continually shift their underlying data-generating distributions and deliberately challenge the model with different synthetic datasets. In addition, we propose a mixture block architecture that is able to handle classification tasks with an arbitrary number of classes, addressing the class size limitation -- a crucial weakness of prior deep tabular zero-shot learners. In experiments, we show that our framework matches state-of-the-art performance on small classification tasks without filtering on dataset characteristics such as the number of classes and the number of missing values, while maintaining an average runtime under one second. On common benchmark dataset suites in both classification and regression, we show that adversarial pre-training was able to enhance TabPFN's performance. In our analysis, we demonstrate that the adversarial synthetic data agents were able to generate a more diverse collection of data than the ordinary random generator in TabPFN. In addition, we demonstrate that our mixture block neural design has improved generalizability and greatly accelerated pre-training.
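To make the pre-training idea concrete, below is a minimal, illustrative PyTorch sketch of the adversarial setup the abstract describes: a synthetic-data agent with learnable generator parameters keeps shifting its data-generating distribution to maximize the loss of a PFN-style in-context learner, while the learner is trained to minimize it. All class names, dimensions, and the simple random-linear generator here are assumptions made for illustration, not the authors' implementation; the paper's actual agents, priors, and mixture block architecture are not specified in this summary.

```python
# Illustrative sketch only -- hypothetical names and a toy generator, not APT's actual code.
import torch
import torch.nn as nn

N_FEATURES, N_CLASSES, N_CONTEXT, N_QUERY = 8, 3, 64, 32


class SyntheticAgent(nn.Module):
    """Toy synthetic-data agent: a learnable affine map shapes the feature
    distribution; labels come from a fixed random linear rule applied to x."""

    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(N_FEATURES))
        self.shift = nn.Parameter(torch.zeros(N_FEATURES))
        self.register_buffer("label_rule", torch.randn(N_FEATURES, N_CLASSES))

    def sample(self, n_rows):
        x = torch.randn(n_rows, N_FEATURES) * self.scale + self.shift
        y = torch.distributions.Categorical(logits=x @ self.label_rule).sample()
        return x, y


class InContextLearner(nn.Module):
    """PFN-style transformer: context rows carry (x, y) embeddings, query rows
    carry x plus an "unknown label" embedding; the head predicts query labels."""

    def __init__(self, d_model=64):
        super().__init__()
        self.embed_x = nn.Linear(N_FEATURES, d_model)
        self.embed_y = nn.Embedding(N_CLASSES + 1, d_model)  # index N_CLASSES = unknown
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, N_CLASSES)

    def forward(self, x_ctx, y_ctx, x_qry):
        unknown = torch.full((x_qry.size(0),), N_CLASSES, dtype=torch.long)
        tokens = torch.cat([
            self.embed_x(x_ctx) + self.embed_y(y_ctx),
            self.embed_x(x_qry) + self.embed_y(unknown),
        ]).unsqueeze(0)                       # (1, n_ctx + n_qry, d_model)
        out = self.encoder(tokens).squeeze(0)[x_ctx.size(0):]
        return self.head(out)                 # logits for the query rows


agent, learner = SyntheticAgent(), InContextLearner()
opt_agent = torch.optim.Adam(agent.parameters(), lr=1e-3)
opt_learner = torch.optim.Adam(learner.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    # Learner step: minimize prediction loss on a freshly sampled synthetic task.
    x, y = agent.sample(N_CONTEXT + N_QUERY)
    loss = loss_fn(learner(x[:N_CONTEXT], y[:N_CONTEXT], x[N_CONTEXT:]), y[N_CONTEXT:])
    opt_learner.zero_grad()
    loss.backward()
    opt_learner.step()

    # Agent step: shift the data-generating distribution to *maximize* that loss.
    x, y = agent.sample(N_CONTEXT + N_QUERY)
    adv_loss = -loss_fn(learner(x[:N_CONTEXT], y[:N_CONTEXT], x[N_CONTEXT:]), y[N_CONTEXT:])
    opt_agent.zero_grad()
    adv_loss.backward()
    opt_agent.step()
```

The sketch omits details such as the attention masking that keeps query rows from attending to each other and the mixture block that lets the model handle an arbitrary number of classes; it only shows the min-max structure of agent and learner updates.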
Related papers
- Generating Realistic Tabular Data with Large Language Models [49.03536886067729]
Large language models (LLMs) have been used for diverse tasks, but they do not capture the correct correlation between the features and the target variable.
We propose an LLM-based method with three important improvements to correctly capture the ground-truth feature-class correlation in the real data.
Our experiments show that our method significantly outperforms 10 SOTA baselines on 20 datasets in downstream tasks.
arXiv Detail & Related papers (2024-10-29T04:14:32Z)
- Federated Class-Incremental Learning with Hierarchical Generative Prototypes [10.532838477096055]
Federated Learning (FL) aims at unburdening the training of deep models by distributing computation across multiple devices (clients).
Our proposal constrains both biases in the last layer by efficiently finetuning a pre-trained backbone using learnable prompts.
Our method significantly improves on the current state of the art, providing an average increase of +7.8% in accuracy.
arXiv Detail & Related papers (2024-06-04T16:12:27Z)
- Rethinking Pre-Training in Tabular Data: A Neighborhood Embedding Perspective [71.45945607871715]
We propose Tabular data Pre-Training via Meta-representation (TabPTM).
The core idea is to embed data instances into a shared feature space, where each instance is represented by its distance to a fixed number of nearest neighbors and their labels.
Extensive experiments on 101 datasets confirm TabPTM's effectiveness in both classification and regression tasks, with and without fine-tuning.
arXiv Detail & Related papers (2023-10-31T18:03:54Z)
- On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training [72.8087629914444]
We study the impact of the trade-off between the intra-class diversity (the number of samples per class) and the inter-class diversity (the number of classes) of a supervised pre-training dataset.
With the size of the pre-training dataset fixed, the best downstream performance comes with a balance between intra- and inter-class diversity.
arXiv Detail & Related papers (2023-05-20T16:23:50Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- SEPT: Towards Scalable and Efficient Visual Pre-Training [11.345844145289524]
Self-supervised pre-training has shown great potential in leveraging large-scale unlabeled data to improve downstream task performance.
We build a task-specific self-supervised pre-training framework based on a simple hypothesis that pre-training on the unlabeled samples with similar distribution to the target task can bring substantial performance gains.
arXiv Detail & Related papers (2022-12-11T11:02:11Z)
- On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets [74.11825654535895]
Pre-training language models (LMs) on large-scale unlabeled text data makes it much easier for the model to achieve exceptional downstream performance.
We study what specific traits in the pre-training data, other than the semantics, make a pre-trained LM superior to its counterparts trained from scratch on downstream tasks.
arXiv Detail & Related papers (2021-09-08T10:39:57Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model, but training on it brings extra computational cost.
To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)