Related papers: Test time training enhances in-context learning of nonlinear functions

Test time training enhances in-context learning of nonlinear functions

URL: http://arxiv.org/abs/2509.25741v1
Date: Tue, 30 Sep 2025 03:56:44 GMT
Title: Test time training enhances in-context learning of nonlinear functions
Authors: Kento Kuwataka, Taiji Suzuki,
Abstract summary: Test-time training (TTT) enhances model performance by explicitly updating designated parameters prior to each prediction.<n>We investigate the combination of TTT with in-context learning (ICL), where the model is given a few examples from the target distribution at inference time.
Score: 51.56484100374058
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Test-time training (TTT) enhances model performance by explicitly updating designated parameters prior to each prediction to adapt to the test data. While TTT has demonstrated considerable empirical success, its theoretical underpinnings remain limited, particularly for nonlinear models. In this paper, we investigate the combination of TTT with in-context learning (ICL), where the model is given a few examples from the target distribution at inference time. We analyze this framework in the setting of single-index models $y=\sigma_*(\langle \beta, \mathbf{x} \rangle)$, where the feature vector $\beta$ is drawn from a hidden low-dimensional subspace. For single-layer transformers trained with gradient-based algorithms and adopting TTT, we establish an upper bound on the prediction risk. Our theory reveals that TTT enables the single-layer transformers to adapt to both the feature vector $\beta$ and the link function $\sigma_*$, which vary across tasks. This creates a sharp contrast with ICL alone, which is theoretically difficult to adapt to shifts in the link function. Moreover, we provide the convergence rate with respect to the data length, showing the predictive error can be driven arbitrarily close to the noise level as the context size and the network width grow.

Related papers

Understanding the Role of Training Data in Test-Time Scaling [56.12341509545198]
We study the performance of test-time scaling for transformers trained on an in-context weight prediction task for linear regression.<n>We show that training on a diverse, relevant, and hard set of tasks results in best performance for test-time scaling.
arXiv Detail & Related papers (2025-10-04T01:38:48Z)
Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models [64.02612380298228]
Recent studies have explored the idea of continuing to train a model at test-time for a given task, known as test-time training (TTT)<n>We propose a model in which TTT achieves a substantially smaller in-distribution test error than global training.<n>We empirically validate our model's key assumptions by training a sparse autoencoder on ImageNet.
arXiv Detail & Related papers (2025-09-29T09:24:52Z)
BayesTTA: Continual-Temporal Test-Time Adaptation for Vision-Language Models via Gaussian Discriminant Analysis [41.09181390655176]
Vision-language models (VLMs) such as CLIP achieve strong zero-shot recognition but degrade significantly under textittemporally evolving distribution shifts common in real-world scenarios.<n>We formalize this practical problem as textitContinual-Temporal Test-Time Adaptation (CT-TTA), where test distributions evolve gradually over time.<n>We propose textitBayesTTA, a Bayesian adaptation framework that enforces temporally consistent predictions and dynamically aligns visual representations.
arXiv Detail & Related papers (2025-07-11T14:02:54Z)
Test-Time Adaptation with Binary Feedback [50.20923012663613]
BiTTA is a novel dual-path optimization framework that balances binary feedback-guided adaptation on uncertain samples with agreement-based self-adaptation on confident predictions.<n> Experiments show BiTTA achieves 13.3%p accuracy improvements over state-of-the-art baselines.
arXiv Detail & Related papers (2025-05-24T05:24:10Z)
BoTTA: Benchmarking on-device Test Time Adaptation [0.7278033100480175]
Test-time adaptation (TTA) addresses this by adapting models during inference without requiring labeled test data or access to the original training set.<n>We propose BoTTA, a benchmark designed to evaluate TTA methods under practical constraints on mobile and edge devices.<n>We assess state-of-the-art TTA methods under these scenarios using benchmark datasets and report system-level metrics on a real testbed.
arXiv Detail & Related papers (2025-04-14T12:00:00Z)
Test-Time Training Provably Improves Transformers as In-context Learners [49.09821664572445]
We investigate a gradient-based TTT algorithm for in-context learning.<n>We train a transformer model on the in-context demonstrations provided in the test prompt.<n>As our empirical contribution, we study the benefits of TTT for TabPFN.
arXiv Detail & Related papers (2025-03-14T20:06:37Z)
Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Vision-Language Models [4.655740975414312]
This paper introduces Test-Time Low-rank adaptation (TTL) as an alternative to prompt tuning for zero-shot generalizations of large-scale vision-language models (VLMs) TTL offers a test-time-efficient adaptation approach that updates the attention weights of the transformer by maximizing prediction confidence.
arXiv Detail & Related papers (2024-07-22T17:59:19Z)
Test-Time Training on Graphs with Large Language Models (LLMs) [68.375487369596]
Test-Time Training (TTT) has been proposed as a promising approach to train Graph Neural Networks (GNNs) Inspired by the great annotation ability of Large Language Models (LLMs) on Text-Attributed Graphs (TAGs), we propose to enhance the test-time training on graphs with LLMs as annotators. A two-stage training strategy is designed to tailor the test-time model with the limited and noisy labels.
arXiv Detail & Related papers (2024-04-21T08:20:02Z)
Optimization-Free Test-Time Adaptation for Cross-Person Activity Recognition [30.350005654271868]
Test-Time Adaptation aims to utilize the test stream to adjust predictions in real-time inference. High computational cost makes it intractable to run on resource-constrained edge devices. We propose an Optimization-Free Test-Time Adaptation framework for sensor-based HAR.
arXiv Detail & Related papers (2023-10-28T02:20:33Z)
Improved Test-Time Adaptation for Domain Generalization [48.239665441875374]
Test-time training (TTT) adapts the learned model with test data. This work addresses two main factors: selecting an appropriate auxiliary TTT task for updating and identifying reliable parameters to update during the test phase. We introduce additional adaptive parameters for the trained model, and we suggest only updating the adaptive parameters during the test phase.
arXiv Detail & Related papers (2023-04-10T10:12:38Z)
Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models [107.05966685291067]
We propose test-time prompt tuning (TPT) to learn adaptive prompts on the fly with a single test sample. TPT improves the zero-shot top-1 accuracy of CLIP by 3.6% on average. In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.
arXiv Detail & Related papers (2022-09-15T17:55:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.