Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data
- URL: http://arxiv.org/abs/2411.10634v1
- Date: Fri, 15 Nov 2024 23:49:23 GMT
- Title: Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data
- Authors: Kai Helli, David Schnurr, Noah Hollmann, Samuel Müller, Frank Hutter,
- Abstract summary: We present Drift-Resilient TabPFN, a fresh approach based on In-Context Learning with a Prior-Data Fitted Network.
It learns to approximate Bayesian inference on synthetic datasets drawn from a prior.
It improves accuracy from 0.688 to 0.744 and ROC AUC from 0.786 to 0.832 while maintaining stronger calibration.
- Score: 39.40116554523575
- License:
- Abstract: While most ML models expect independent and identically distributed data, this assumption is often violated in real-world scenarios due to distribution shifts, resulting in the degradation of machine learning model performance. Until now, no tabular method has consistently outperformed classical supervised learning, which ignores these shifts. To address temporal distribution shifts, we present Drift-Resilient TabPFN, a fresh approach based on In-Context Learning with a Prior-Data Fitted Network that learns the learning algorithm itself: it accepts the entire training dataset as input and makes predictions on the test set in a single forward pass. Specifically, it learns to approximate Bayesian inference on synthetic datasets drawn from a prior that specifies the model's inductive bias. This prior is based on structural causal models (SCM), which gradually shift over time. To model shifts of these causal models, we use a secondary SCM, that specifies changes in the primary model parameters. The resulting Drift-Resilient TabPFN can be applied to unseen data, runs in seconds on small to moderately sized datasets and needs no hyperparameter tuning. Comprehensive evaluations across 18 synthetic and real-world datasets demonstrate large performance improvements over a wide range of baselines, such as XGB, CatBoost, TabPFN, and applicable methods featured in the Wild-Time benchmark. Compared to the strongest baselines, it improves accuracy from 0.688 to 0.744 and ROC AUC from 0.786 to 0.832 while maintaining stronger calibration. This approach could serve as significant groundwork for further research on out-of-distribution prediction.
Related papers
- Distributionally robust self-supervised learning for tabular data [2.942619386779508]
Learning robust representation in presence of error slices is challenging, due to high cardinality features and the complexity of constructing error sets.
Traditional robust representation learning methods are largely focused on improving worst group performance in supervised setting in computer vision.
Our approach utilizes an encoder-decoder model trained with Masked Language Modeling (MLM) loss to learn robust latent representations.
arXiv Detail & Related papers (2024-10-11T04:23:56Z) - Test-Time Adaptation Induces Stronger Accuracy and Agreement-on-the-Line [65.14099135546594]
Recent test-time adaptation (TTA) methods drastically strengthen the ACL and AGL trends in models, even in shifts where models showed very weak correlations before.
Our results show that by combining TTA with AGL-based estimation methods, we can estimate the OOD performance of models with high precision for a broader set of distribution shifts.
arXiv Detail & Related papers (2023-10-07T23:21:25Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - DeepVARwT: Deep Learning for a VAR Model with Trend [1.9862987223379664]
We propose a new approach that employs deep learning methodology for maximum likelihood estimation of the trend and the dependence structure.
A Long Short-Term Memory (LSTM) network is used for this purpose.
We provide a simulation study and an application to real data.
arXiv Detail & Related papers (2022-09-21T18:23:03Z) - Hyperparameter-free Continuous Learning for Domain Classification in
Natural Language Understanding [60.226644697970116]
Domain classification is the fundamental task in natural language understanding (NLU)
Most existing continual learning approaches suffer from low accuracy and performance fluctuation.
We propose a hyper parameter-free continual learning model for text data that can stably produce high performance under various environments.
arXiv Detail & Related papers (2022-01-05T02:46:16Z) - Evaluating Predictive Uncertainty and Robustness to Distributional Shift
Using Real World Data [0.0]
We propose metrics for general regression tasks using the Shifts Weather Prediction dataset.
We also present an evaluation of the baseline methods using these metrics.
arXiv Detail & Related papers (2021-11-08T17:32:10Z) - Learning Neural Models for Natural Language Processing in the Face of
Distributional Shift [10.990447273771592]
The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications.
It builds upon the assumption that the data distribution is stationary, ie. that the data is sampled from a fixed distribution both at training and test time.
This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information.
It is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime
arXiv Detail & Related papers (2021-09-03T14:29:20Z) - Evaluating Prediction-Time Batch Normalization for Robustness under
Covariate Shift [81.74795324629712]
We call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z) - Post-Estimation Smoothing: A Simple Baseline for Learning with Side
Information [102.18616819054368]
We propose a post-estimation smoothing operator as a fast and effective method for incorporating structural index data into prediction.
Because the smoothing step is separate from the original predictor, it applies to a broad class of machine learning tasks.
Our experiments on large scale spatial and temporal datasets highlight the speed and accuracy of post-estimation smoothing in practice.
arXiv Detail & Related papers (2020-03-12T18:04:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.