Small Aid, Big Leap: Efficient Test-Time Adaptation for Vision-Language Models with AdaptNet
- URL: http://arxiv.org/abs/2506.02671v1
- Date: Tue, 03 Jun 2025 09:16:51 GMT
- Title: Small Aid, Big Leap: Efficient Test-Time Adaptation for Vision-Language Models with AdaptNet
- Authors: Xiao Chen, Jiazhen Huang, Qinting Jiang, Fanding Huang, Xianghua Fu, Jingyan Jiang, Zhi Wang,
- Abstract summary: Test-time adaptation (TTA) has emerged as a critical technique for enhancing the generalization capability of vision-language models (VLMs) during inference.<n>We introduce SAIL, a novel adapter-based TTA framework that leverages a lightweight, learnable AdaptNet to enable efficient and scalable model adaptation.
- Score: 5.977269026037707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Test-time adaptation (TTA) has emerged as a critical technique for enhancing the generalization capability of vision-language models (VLMs) during inference. However, existing approaches often incur substantial computational costs and exhibit poor scalability, primarily due to sample-wise adaptation granularity and reliance on costly auxiliary designs such as data augmentation. To address these limitations, we introduce SAIL (Small Aid, Big Leap), a novel adapter-based TTA framework that leverages a lightweight, learnable AdaptNet to enable efficient and scalable model adaptation. As SAIL's core, a frozen pre-trained VLM collaborates with AdaptNet through a confidence-based interpolation weight, generating robust predictions during inference. These predictions serve as self-supervised targets to align AdaptNet's outputs through efficient batch-wise processing, dramatically reducing computational costs without modifying the VLM or requiring memory caches. To mitigate catastrophic forgetting during continual adaptation, we propose a gradient-aware reset strategy driven by a gradient drift indicator (GDI), which dynamically detects domain transitions and strategically resets AdaptNet for stable adaptation. Extensive experiments across diverse benchmarks on two scenarios demonstrate that SAIL achieves state-of-the-art performance while maintaining low computational costs. These results highlight SAIL's effectiveness, efficiency and scalability for real-world deployment. The code will be released upon acceptance.
Related papers
- Optimal Batch-Size Control for Low-Latency Federated Learning with Device Heterogeneity [24.47280082248569]
Federated learning (FL) has emerged as a popular approach for collaborative machine learning in sixth-generation (6G) networks.<n>The deployment of FL algorithms is expected to empower a wide range of Internet-of-Things (IoT) applications, e.g., autonomous driving, augmented reality, and healthcare.<n>We propose a novel C$2$-aware framework for optimal batch-size control that minimizes end-to-end (E2E) learning latency while ensuring convergence.
arXiv Detail & Related papers (2025-07-21T13:24:38Z) - Memory Efficient Transformer Adapter for Dense Predictions [42.413108132475855]
We propose META, a memory-efficient ViT adapter that can improve the model's memory efficiency and decrease memory time consumption.<n>Within the proposed block, the cross-shaped self-attention is employed to reduce the model's frequent reshaping operations.<n> META substantially enhances the predicted quality, while achieving a new state-of-the-art accuracy-efficiency trade-off.
arXiv Detail & Related papers (2025-02-04T03:19:33Z) - Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves [123.07450481623124]
We propose Skip Tuning as a novel paradigm for adapting vision-language models to downstream tasks.<n>Unlike existing PT or adapter-based methods, Skip Tuning applies Layer-wise Skipping (LSkip) and Class-wise Skipping (CSkip) upon the FT baseline without introducing extra context vectors or adapter modules.
arXiv Detail & Related papers (2024-12-16T07:33:23Z) - Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - EUDA: An Efficient Unsupervised Domain Adaptation via Self-Supervised Vision Transformer [21.59850502993888]
Unsupervised domain adaptation (UDA) aims to mitigate the domain shift issue, where the distribution of training (source) data differs from that of testing (target) data.
Many models have been developed to tackle this problem, and recently vision transformers (ViTs) have shown promising results.
This paper introduces an efficient model that reduces trainable parameters and allows for adjustable complexity.
arXiv Detail & Related papers (2024-07-31T03:29:28Z) - Test-Time Model Adaptation with Only Forward Passes [68.11784295706995]
Test-time adaptation has proven effective in adapting a given trained model to unseen test samples with potential distribution shifts.
We propose a test-time Forward-Optimization Adaptation (FOA) method.
FOA runs on quantized 8-bit ViT, outperforms gradient-based TENT on full-precision 32-bit ViT, and achieves an up to 24-fold memory reduction on ImageNet-C.
arXiv Detail & Related papers (2024-04-02T05:34:33Z) - Efficient Adaptation of Large Vision Transformer via Adapter
Re-Composing [8.88477151877883]
High-capacity pre-trained models have revolutionized problem-solving in computer vision.
We propose a novel Adapter Re-Composing (ARC) strategy that addresses efficient pre-trained model adaptation.
Our approach considers the reusability of adaptation parameters and introduces a parameter-sharing scheme.
arXiv Detail & Related papers (2023-10-10T01:04:15Z) - Visual Prompt Tuning for Test-time Domain Adaptation [48.16620171809511]
We propose a simple recipe called data-efficient prompt tuning (DePT) with two key ingredients.
We find such parameter-efficient finetuning can efficiently adapt the model representation to the target domain without overfitting to the noise in the learning objective.
With much fewer parameters, DePT demonstrates not only state-of-the-art performance on major adaptation benchmarks, but also superior data efficiency.
arXiv Detail & Related papers (2022-10-10T16:45:13Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - Deep Adaptive Inference Networks for Single Image Super-Resolution [72.7304455761067]
Single image super-resolution (SISR) has witnessed tremendous progress in recent years owing to the deployment of deep convolutional neural networks (CNNs)
In this paper, we take a step forward to address this issue by leveraging the adaptive inference networks for deep SISR (AdaDSR)
Our AdaDSR involves an SISR model as backbone and a lightweight adapter module which takes image features and resource constraint as input and predicts a map of local network depth.
arXiv Detail & Related papers (2020-04-08T10:08:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.