ETTA: Efficient Test-Time Adaptation for Vision-Language Models through Dynamic Embedding Updates
- URL: http://arxiv.org/abs/2508.05898v1
- Date: Thu, 07 Aug 2025 23:11:33 GMT
- Title: ETTA: Efficient Test-Time Adaptation for Vision-Language Models through Dynamic Embedding Updates
- Authors: Hamidreza Dastmalchi, Aijun An, Ali Cheraghian
- Abstract summary: Test-Time Adaptation adapts vision-language models to unlabeled test data in new domains. Current cache-based TTA models store only a limited set of high-confidence samples. We propose a Recursive Updating module that integrates all incoming test samples.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained vision-language models (VLMs) like CLIP show strong zero-shot performance but struggle with generalization under distribution shifts. Test-Time Adaptation (TTA) addresses this by adapting VLMs to unlabeled test data in new domains. While some TTA methods rely on prompt-tuning, training-free cache-based approaches are preferred for efficiency. However, current cache-based TTA models store only a limited set of high-confidence samples, restricting the decision boundary to these samples and ignoring the influence of other incoming test data. To address this, we propose Efficient Test-Time Adaptation (ETTA), introducing a Recursive Updating module that integrates all incoming test samples, progressively refining the decision boundary. This strategy mimics an unbounded cache, dynamically updating contextual embeddings for improved accuracy with minimal memory and computational overhead. ETTA also includes an Adaptive Ensemble module to reduce prompt dependency in image-to-text scores by dynamically selecting optimal prompts for each class. Furthermore, ETTA adaptively combines scores from both modules based on confidence levels, leveraging their complementary strengths. Extensive experiments on two benchmarks confirm that ETTA surpasses state-of-the-art TTA models in both accuracy and computational efficiency, setting a new standard for effective, efficient test-time adaptation. The code has been released at https://github.com/hamidreza-dastmalchi/ETTA.
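The "unbounded cache" idea in the abstract — fold every incoming test sample into a running class embedding instead of storing a bounded set of high-confidence samples — can be sketched as a confidence-weighted incremental mean. This is an illustrative sketch only, not the authors' released code; the class and method names are hypothetical:

```python
import numpy as np

class RecursiveEmbeddingBank:
    """Illustrative sketch: one running embedding per class, updated by
    every incoming test sample weighted by prediction confidence.
    Equivalent to averaging over an unbounded cache, with O(1) memory
    per class instead of storing the samples themselves."""

    def __init__(self, num_classes: int, dim: int):
        self.embeddings = np.zeros((num_classes, dim))
        self.weights = np.zeros(num_classes)  # accumulated confidence mass

    def update(self, feature: np.ndarray, pred_class: int, confidence: float):
        # Recursive weighted-mean update: mean += (w_new / w_total) * (x - mean)
        w = self.weights[pred_class] + confidence
        self.embeddings[pred_class] += (confidence / w) * (feature - self.embeddings[pred_class])
        self.weights[pred_class] = w

    def scores(self, feature: np.ndarray) -> np.ndarray:
        # Cosine similarity of a test feature to each class embedding.
        norms = np.linalg.norm(self.embeddings, axis=1) * np.linalg.norm(feature)
        return self.embeddings @ feature / np.maximum(norms, 1e-8)
```

Two equal-confidence updates to a class leave its embedding at the plain mean of the two features, matching what an actual unbounded cache would average to.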
Related papers
- Unsupervised Layer-Wise Dynamic Test Time Adaptation for LLMs [12.428201810981149]
Test-time adaptation (TTA) for large language models (LLMs) updates model parameters at inference time using signals available at deployment. This paper focuses on a common yet under-explored regime: unsupervised, sample-specific TTA. We propose layer-wise dynamic test-time adaptation, a framework which explicitly modulates TTA strength as a function of prompt representation, LLM structure, and adaptation step.
arXiv Detail & Related papers (2026-02-10T12:22:14Z) - Test-Time Adaptation with Binary Feedback [50.20923012663613]
BiTTA is a novel dual-path optimization framework that balances binary feedback-guided adaptation on uncertain samples with agreement-based self-adaptation on confident predictions. Experiments show BiTTA achieves 13.3%p accuracy improvements over state-of-the-art baselines.
arXiv Detail & Related papers (2025-05-24T05:24:10Z) - SCAP: Transductive Test-Time Adaptation via Supportive Clique-based Attribute Prompting [39.00953148964911]
Vision-language models (VLMs) encounter challenges when adapting to domain shifts stemming from changes in data distribution. Test-time adaptation (TTA) has emerged as a promising approach to enhance VLM performance under such conditions. We propose Supportive Clique-based Attribute Prompting (SCAP) to enhance adaptation by generating fine-grained attribute prompts across test batches.
arXiv Detail & Related papers (2025-03-17T06:50:57Z) - Efficient Open Set Single Image Test Time Adaptation of Vision Language Models [15.621092104244003]
Adapting models to dynamic, real-world environments is a critical challenge in deep learning. We propose ROSITA, a novel framework that leverages dynamically updated feature banks to identify reliable test samples. Our approach effectively adapts models to domain shifts for known classes while rejecting unfamiliar samples.
arXiv Detail & Related papers (2024-06-01T16:21:42Z) - CLIPArTT: Adaptation of CLIP to New Domains at Test Time [19.0284321951354]
We introduce CLIP Adaptation duRing Test-Time (CLIPArTT), a fully test-time adaptation (TTA) approach for pre-trained vision-language models (VLMs). Our method employs a unique, minimally invasive text prompt tuning process, wherein multiple predicted classes are aggregated into a single new text prompt, used as a pseudo-label to re-classify inputs. Our findings demonstrate that, without requiring additional transformations or new trainable modules, CLIPArTT enhances performance dynamically across non-corrupted datasets.
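The aggregation step described in this summary — joining multiple predicted class names into one text prompt that acts as a pseudo-label — can be sketched as follows. The function name, prompt template, and top-k selection here are illustrative assumptions, not CLIPArTT's exact implementation:

```python
def aggregate_pseudo_prompt(class_names, logits, top_k=3):
    """Sketch of pseudo-label prompt aggregation: rank classes by their
    logits, take the top-k names, and join them into a single new text
    prompt used to re-classify the input."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    picked = [class_names[i] for i in ranked[:top_k]]
    return "a photo of a " + " or ".join(picked)
```

With class names `["cat", "dog", "car"]` and logits `[0.1, 0.9, 0.5]`, the top-2 aggregate prompt is `"a photo of a dog or car"`.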
arXiv Detail & Related papers (2024-05-01T07:24:30Z) - Active Test-Time Adaptation: Theoretical Analyses and An Algorithm [51.84691955495693]
Test-time adaptation (TTA) addresses distribution shifts for streaming test data in unsupervised settings.
We propose the novel problem setting of active test-time adaptation (ATTA) that integrates active learning within the fully TTA setting.
arXiv Detail & Related papers (2024-04-07T22:31:34Z) - Test-Time Model Adaptation with Only Forward Passes [68.11784295706995]
Test-time adaptation has proven effective in adapting a given trained model to unseen test samples with potential distribution shifts.
We propose a test-time Forward-Optimization Adaptation (FOA) method.
FOA runs on a quantized 8-bit ViT, outperforms gradient-based TENT on a full-precision 32-bit ViT, and achieves up to a 24-fold memory reduction on ImageNet-C.
arXiv Detail & Related papers (2024-04-02T05:34:33Z) - Few Clicks Suffice: Active Test-Time Adaptation for Semantic
Segmentation [14.112999441288615]
Test-time adaptation (TTA) adapts pre-trained models during inference using unlabeled test data.
There is still a significant performance gap between the TTA approaches and their supervised counterparts.
We propose the ATASeg framework, which consists of two parts: a model adapter and a label annotator.
arXiv Detail & Related papers (2023-12-04T12:16:02Z) - AR-TTA: A Simple Method for Real-World Continual Test-Time Adaptation [1.4530711901349282]
We propose to validate test-time adaptation methods using datasets for autonomous driving, namely CLAD-C and SHIFT.
We observe that current test-time adaptation methods struggle to effectively handle varying degrees of domain shift.
We enhance the well-established self-training framework by incorporating a small memory buffer to increase model stability.
arXiv Detail & Related papers (2023-09-18T19:34:23Z) - Robust Continual Test-time Adaptation: Instance-aware BN and
Prediction-balanced Memory [58.72445309519892]
We present a new test-time adaptation scheme that is robust against non-i.i.d. test data streams.
Our novelty is mainly two-fold: (a) Instance-Aware Batch Normalization (IABN), which corrects normalization for out-of-distribution samples, and (b) Prediction-balanced Reservoir Sampling (PBRS), which simulates an i.i.d. data stream from a non-i.i.d. stream in a class-balanced manner.
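The PBRS idea — keep a fixed-size memory that stays class-balanced even when the incoming stream is not — can be sketched with a class-balanced variant of reservoir sampling. This is a simplified illustration of the general technique, not the paper's exact algorithm; the eviction rule and class bookkeeping below are assumptions:

```python
import random

class BalancedReservoir:
    """Sketch of prediction-balanced reservoir sampling: a fixed-capacity
    memory that evicts from the currently most frequent class, so a
    non-i.i.d. stream is replayed as roughly class-balanced."""

    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.memory = []   # list of (sample, predicted_class)
        self.seen = {}     # samples seen so far, per predicted class
        self.rng = random.Random(seed)

    def add(self, sample, pred_class):
        self.seen[pred_class] = self.seen.get(pred_class, 0) + 1
        if len(self.memory) < self.capacity:
            self.memory.append((sample, pred_class))
            return
        # Find the class currently occupying the most memory slots.
        counts = {}
        for _, c in self.memory:
            counts[c] = counts.get(c, 0) + 1
        majority = max(counts, key=counts.get)
        slots = [i for i, (_, c) in enumerate(self.memory) if c == majority]
        if pred_class != majority:
            # Minority-class samples always replace a random majority slot.
            self.memory[self.rng.choice(slots)] = (sample, pred_class)
        elif self.rng.random() < counts[majority] / self.seen[pred_class]:
            # Within the majority class, reservoir-style acceptance keeps
            # the retained samples roughly uniform over the stream.
            self.memory[self.rng.choice(slots)] = (sample, pred_class)
```

After a long run of one class, samples from a new class immediately claim slots, pushing the memory back toward class balance.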
arXiv Detail & Related papers (2022-08-10T03:05:46Z) - CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address this challenge by adapting a model to unlabeled data at test time.
We propose a simple yet effective feature alignment loss, termed as Class-Aware Feature Alignment (CAFA), which simultaneously encourages a model to learn target representations in a class-discriminative manner.
arXiv Detail & Related papers (2022-06-01T03:02:07Z) - Listen, Adapt, Better WER: Source-free Single-utterance Test-time
Adaptation for Automatic Speech Recognition [65.84978547406753]
Test-time Adaptation aims to adapt the model trained on source domains to yield better predictions for test samples.
Single-Utterance Test-time Adaptation (SUTA) is, to the best of our knowledge, the first TTA study in the speech area.
arXiv Detail & Related papers (2022-03-27T06:38:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers or information and is not responsible for any consequences.