TAPS : Frustratingly Simple Test Time Active Learning for VLMs
- URL: http://arxiv.org/abs/2507.20028v1
- Date: Sat, 26 Jul 2025 18:04:49 GMT
- Title: TAPS : Frustratingly Simple Test Time Active Learning for VLMs
- Authors: Dhruv Sarkar, Aprameyo Chakrabartty, Bibhudatta Bhanja,
- Abstract summary: Test-Time Optimization enables models to adapt to new data during inference by updating parameters on-the-fly.<n>We propose a novel Test-Time Active Learning framework that adaptively queries uncertain samples and updates prompts dynamically.<n>Our framework provides a practical and effective solution for real-world deployment in safety-critical applications such as autonomous systems and medical diagnostics.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Test-Time Optimization enables models to adapt to new data during inference by updating parameters on-the-fly. Recent advances in Vision-Language Models (VLMs) have explored learning prompts at test time to improve performance in downstream tasks. In this work, we extend this idea by addressing a more general and practical challenge: Can we effectively utilize an oracle in a continuous data stream where only one sample is available at a time, requiring an immediate query decision while respecting latency and memory constraints? To tackle this, we propose a novel Test-Time Active Learning (TTAL) framework that adaptively queries uncertain samples and updates prompts dynamically. Unlike prior methods that assume batched data or multiple gradient updates, our approach operates in a real-time streaming scenario with a single test sample per step. We introduce a dynamically adjusted entropy threshold for active querying, a class-balanced replacement strategy for memory efficiency, and a class-aware distribution alignment technique to enhance adaptation. The design choices are justified using careful theoretical analysis. Extensive experiments across 10 cross-dataset transfer benchmarks and 4 domain generalization datasets demonstrate consistent improvements over state-of-the-art methods while maintaining reasonable latency and memory overhead. Our framework provides a practical and effective solution for real-world deployment in safety-critical applications such as autonomous systems and medical diagnostics.
Related papers
- Adapting Vision-Language Models Without Labels: A Comprehensive Survey [74.17944178027015]
Vision-Language Models (VLMs) have demonstrated remarkable generalization capabilities across a wide range of tasks.<n>Recent research has increasingly focused on unsupervised adaptation methods that do not rely on labeled data.<n>We propose a taxonomy based on the availability and nature of unlabeled visual data, categorizing existing approaches into four key paradigms.
arXiv Detail & Related papers (2025-08-07T16:27:37Z) - Space Rotation with Basis Transformation for Training-free Test-Time Adaptation [25.408849667998993]
We propose a training-free feature space rotation with basis transformation for test-time adaptation.<n>By leveraging the inherent distinctions among classes, we reconstruct the original feature space and map it to a new representation.<n>Our method outperforms state-of-the-art techniques in terms of both performance and efficiency.
arXiv Detail & Related papers (2025-02-27T10:15:34Z) - IT$^3$: Idempotent Test-Time Training [95.78053599609044]
Deep learning models often struggle when deployed in real-world settings due to distribution shifts between training and test data.<n>We present Idempotent Test-Time Training (IT$3$), a novel approach that enables on-the-fly adaptation to distribution shifts using only the current test instance.<n>Our results suggest that idempotence provides a universal principle for test-time adaptation that generalizes across domains and architectures.
arXiv Detail & Related papers (2024-10-05T15:39:51Z) - Learn from the Learnt: Source-Free Active Domain Adaptation via Contrastive Sampling and Visual Persistence [60.37934652213881]
Domain Adaptation (DA) facilitates knowledge transfer from a source domain to a related target domain.
This paper investigates a practical DA paradigm, namely Source data-Free Active Domain Adaptation (SFADA), where source data becomes inaccessible during adaptation.
We present learn from the learnt (LFTL), a novel paradigm for SFADA to leverage the learnt knowledge from the source pretrained model and actively iterated models without extra overhead.
arXiv Detail & Related papers (2024-07-26T17:51:58Z) - Efficient Open Set Single Image Test Time Adaptation of Vision Language Models [15.621092104244003]
Adapting models to dynamic, real-world environments is a critical challenge in deep learning.<n>We propose ROSITA, a novel framework that leverages dynamically updated feature banks to identify reliable test samples.<n>Our approach effectively adapts models to domain shifts for known classes while rejecting unfamiliar samples.
arXiv Detail & Related papers (2024-06-01T16:21:42Z) - Adaptive Retention & Correction: Test-Time Training for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.<n>We name our approach Adaptive Retention & Correction (ARC)<n>ARC achieves an average performance increase of 2.7% and 2.6% on the CIFAR-100 and Imagenet-R datasets.
arXiv Detail & Related papers (2024-05-23T08:43:09Z) - AR-TTA: A Simple Method for Real-World Continual Test-Time Adaptation [1.4530711901349282]
We propose to validate test-time adaptation methods using datasets for autonomous driving, namely CLAD-C and SHIFT.
We observe that current test-time adaptation methods struggle to effectively handle varying degrees of domain shift.
We enhance the well-established self-training framework by incorporating a small memory buffer to increase model stability.
arXiv Detail & Related papers (2023-09-18T19:34:23Z) - Streaming LifeLong Learning With Any-Time Inference [36.3326483579511]
We propose a novel lifelong learning approach, which is streaming, i.e., a single input sample arrives in each time step, single pass, class-incremental, and subject to be evaluated at any moment.
We additionally propose an implicit regularizer in the form of snap-shot self-distillation, which effectively minimizes the forgetting further.
Our empirical evaluations and ablations demonstrate that the proposed method outperforms the prior works by large margins.
arXiv Detail & Related papers (2023-01-27T18:09:19Z) - Learning to Continuously Optimize Wireless Resource in a Dynamic
Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z) - Learning to Continuously Optimize Wireless Resource In Episodically
Dynamic Environment [55.91291559442884]
This work develops a methodology that enables data-driven methods to continuously learn and optimize in a dynamic environment.
We propose to build the notion of continual learning into the modeling process of learning wireless systems.
Our design is based on a novel min-max formulation which ensures certain fairness" across different data samples.
arXiv Detail & Related papers (2020-11-16T08:24:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.