Test-Time Training Provably Improves Transformers as In-context Learners
- URL: http://arxiv.org/abs/2503.11842v1
- Date: Fri, 14 Mar 2025 20:06:37 GMT
- Title: Test-Time Training Provably Improves Transformers as In-context Learners
- Authors: Halil Alperen Gozeten, M. Emrullah Ildiz, Xuechen Zhang, Mahdi Soltanolkotabi, Marco Mondelli, Samet Oymak,
- Abstract summary: We investigate a gradient-based TTT algorithm for in-context learning.<n>We train a transformer model on the in-context demonstrations provided in the test prompt.<n>As our empirical contribution, we study the benefits of TTT for TabPFN.
- Score: 49.09821664572445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Test-time training (TTT) methods explicitly update the weights of a model to adapt to the specific test instance, and they have found success in a variety of settings, including most recently language modeling and reasoning. To demystify this success, we investigate a gradient-based TTT algorithm for in-context learning, where we train a transformer model on the in-context demonstrations provided in the test prompt. Specifically, we provide a comprehensive theoretical characterization of linear transformers when the update rule is a single gradient step. Our theory (i) delineates the role of alignment between pretraining distribution and target task, (ii) demystifies how TTT can alleviate distribution shift, and (iii) quantifies the sample complexity of TTT including how it can significantly reduce the eventual sample size required for in-context learning. As our empirical contribution, we study the benefits of TTT for TabPFN, a tabular foundation model. In line with our theory, we demonstrate that TTT significantly reduces the required sample size for tabular classification (3 to 5 times fewer) unlocking substantial inference efficiency with a negligible training cost.
Related papers
- BoTTA: Benchmarking on-device Test Time Adaptation [0.7278033100480175]
Test-time adaptation (TTA) addresses this by adapting models during inference without requiring labeled test data or access to the original training set.
We propose BoTTA, a benchmark designed to evaluate TTA methods under practical constraints on mobile and edge devices.
We assess state-of-the-art TTA methods under these scenarios using benchmark datasets and report system-level metrics on a real testbed.
arXiv Detail & Related papers (2025-04-14T12:00:00Z) - BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping [64.8477128397529]
We propose a training-required and training-free test-time adaptation framework.
We maintain a light-weight key-value memory for feature retrieval from instance-agnostic historical samples and instance-aware boosting samples.
We theoretically justify the rationality behind our method and empirically verify its effectiveness on both the out-of-distribution and the cross-domain datasets.
arXiv Detail & Related papers (2024-10-20T15:58:43Z) - Test-Time Training on Graphs with Large Language Models (LLMs) [68.375487369596]
Test-Time Training (TTT) has been proposed as a promising approach to train Graph Neural Networks (GNNs)
Inspired by the great annotation ability of Large Language Models (LLMs) on Text-Attributed Graphs (TAGs), we propose to enhance the test-time training on graphs with LLMs as annotators.
A two-stage training strategy is designed to tailor the test-time model with the limited and noisy labels.
arXiv Detail & Related papers (2024-04-21T08:20:02Z) - Active Test-Time Adaptation: Theoretical Analyses and An Algorithm [51.84691955495693]
Test-time adaptation (TTA) addresses distribution shifts for streaming test data in unsupervised settings.
We propose the novel problem setting of active test-time adaptation (ATTA) that integrates active learning within the fully TTA setting.
arXiv Detail & Related papers (2024-04-07T22:31:34Z) - Test-Time Adaptation with Perturbation Consistency Learning [32.58879780726279]
We propose a simple test-time adaptation method to promote the model to make stable predictions for samples with distribution shifts.
Our method can achieve higher or comparable performance with less inference time over strong PLM backbones.
arXiv Detail & Related papers (2023-04-25T12:29:22Z) - Improved Test-Time Adaptation for Domain Generalization [48.239665441875374]
Test-time training (TTT) adapts the learned model with test data.
This work addresses two main factors: selecting an appropriate auxiliary TTT task for updating and identifying reliable parameters to update during the test phase.
We introduce additional adaptive parameters for the trained model, and we suggest only updating the adaptive parameters during the test phase.
arXiv Detail & Related papers (2023-04-10T10:12:38Z) - A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts [117.72709110877939]
Test-time adaptation (TTA) has the potential to adapt a pre-trained model to unlabeled data during testing, before making predictions.
We categorize TTA into several distinct groups based on the form of test data, namely, test-time domain adaptation, test-time batch adaptation, and online test-time adaptation.
arXiv Detail & Related papers (2023-03-27T16:32:21Z) - Revisiting Realistic Test-Time Training: Sequential Inference and
Adaptation by Anchored Clustering Regularized Self-Training [37.75537703971045]
We develop a test-time anchored clustering (TTAC) approach to enable stronger test-time feature learning.
Self-training(ST) has demonstrated great success in learning from unlabeled data.
TTAC++ consistently outperforms the state-of-the-art methods on five TTT datasets.
arXiv Detail & Related papers (2023-03-20T04:30:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.