Learning to Adapt Frozen CLIP for Few-Shot Test-Time Domain Adaptation
- URL: http://arxiv.org/abs/2506.17307v1
- Date: Wed, 18 Jun 2025 03:49:22 GMT
- Title: Learning to Adapt Frozen CLIP for Few-Shot Test-Time Domain Adaptation
- Authors: Zhixiang Chi, Li Gu, Huan Liu, Ziqiang Wang, Yanan Wu, Yang Wang, Konstantinos N. Plataniotis
- Abstract summary: Few-shot Test-Time Domain Adaptation focuses on adapting a model at test time to a specific domain using only a few unlabeled examples. This work introduces learning directly in the input space to complement frozen CLIP with dataset-specific knowledge.
- Score: 37.93085430960873
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot Test-Time Domain Adaptation focuses on adapting a model at test time to a specific domain using only a few unlabeled examples, addressing domain shift. Prior methods leverage CLIP's strong out-of-distribution (OOD) abilities by generating domain-specific prompts to guide its generalized, frozen features. However, since downstream datasets are not explicitly seen by CLIP, relying solely on feature-space knowledge is constrained by CLIP's prior knowledge. Notably, when a less robust backbone such as ViT-B/16 is used, performance drops significantly on challenging real-world benchmarks. Departing from state-of-the-art methods that inherit CLIP's intrinsic OOD capability, this work introduces learning directly in the input space to complement frozen CLIP with dataset-specific knowledge. Specifically, an independent side branch is attached in parallel with CLIP and trained to learn exclusive knowledge via revert attention. To better capture the dataset-specific label semantics for downstream adaptation, we propose to enhance the inter-dispersion among text features via greedy text ensemble and refinement. The text and visual features are then progressively fused in a domain-aware manner by a generated domain prompt to adapt toward a specific domain. Extensive experiments show our method's superiority on 5 large-scale benchmarks (WILDS and DomainNet), notably improving over smaller backbones like ViT-B/16, with gains of +5.1 in F1 for iWildCam and +3.1% in WC Acc for FMoW.
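The mechanism described in the abstract can be sketched in code. The snippet below is a minimal illustration, not the authors' implementation: the side-branch architecture, the use of CLIP's CLS-to-patch attention as the signal to invert, the dispersion measure for the greedy text ensemble, and all shapes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SideBranch(nn.Module):
    """Hypothetical side branch run in parallel with a frozen CLIP encoder.
    It re-weights patch tokens by the *complement* of CLIP's CLS attention,
    pushing the branch toward evidence CLIP de-emphasizes (revert attention)."""

    def __init__(self, dim=512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, patch_tokens, cls_attn):
        # patch_tokens: (B, N, D) frozen CLIP patch tokens
        # cls_attn:     (B, N)   CLS-to-patch attention from CLIP's last block
        w = 1.0 - cls_attn / cls_attn.amax(dim=1, keepdim=True)  # invert CLIP's focus
        w = w / w.sum(dim=1, keepdim=True).clamp_min(1e-6)
        return self.mlp((w.unsqueeze(-1) * patch_tokens).sum(dim=1))  # (B, D)

def dispersion(class_feats):
    # Mean pairwise cosine distance among class text features (C, D).
    f = F.normalize(class_feats, dim=-1)
    sim = f @ f.t()
    c = sim.size(0)
    return (1.0 - sim).sum() / (c * (c - 1))

def greedy_template_ensemble(template_feats):
    # template_feats: (T, C, D) text features for T prompt templates x C classes.
    # Greedily keep a template only if it increases inter-class dispersion.
    chosen, best = [0], dispersion(template_feats[0])
    for t in range(1, template_feats.size(0)):
        cand_score = dispersion(template_feats[chosen + [t]].mean(dim=0))
        if cand_score > best:
            chosen.append(t)
            best = cand_score
    return chosen
```

A generated domain prompt could then gate the fusion, e.g. `fused = clip_feat + torch.sigmoid(gate(domain_prompt)) * side_feat`, mixing side-branch knowledge into the frozen CLIP feature in a domain-aware way; the gate here is likewise a hypothetical stand-in for the paper's progressive fusion.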
Related papers
- UMFC: Unsupervised Multi-Domain Feature Calibration for Vision-Language Models [75.77651291095565]
We leverage unlabeled data that naturally spans multiple domains to enhance the transferability of vision-language models.
Under this unsupervised multi-domain setting, we have identified inherent model bias within CLIP.
We propose Unsupervised Multi-domain Feature Calibration (UMFC) to mitigate this model bias.
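A minimal sketch of the calibration idea, assuming image features and unlabeled domain assignments are given; the paper's actual procedure differs in detail.

```python
import torch
import torch.nn.functional as F

def calibrate_features(feats, domain_ids):
    # feats: (N, D) CLIP image features; domain_ids: (N,) unlabeled-domain index.
    # Remove each domain's mean shift so all domains share a common center.
    out = feats.clone()
    for d in domain_ids.unique():
        mask = domain_ids == d
        out[mask] = feats[mask] - feats[mask].mean(dim=0, keepdim=True)
    return F.normalize(out, dim=-1)
```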
arXiv Detail & Related papers (2024-11-11T12:25:02Z)
- Rethinking Domain Adaptation and Generalization in the Era of CLIP [27.12334798260904]
We show that a simple domain prior boosts CLIP's zero-shot recognition in a specific domain.
We also create a benchmark for zero-shot adaptation and pseudo-labeling based self-training with CLIP.
We propose to improve the task generalization ability of CLIP from multiple unlabeled domains.
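A "domain prior" can be as simple as naming the domain inside the prompt template. A hedged illustration (the template wording is an assumption, not the paper's):

```python
def domain_prior_prompts(classnames, domain):
    # e.g. domain="sketch" -> "a sketch of a dog", "a sketch of a cat", ...
    return [f"a {domain} of a {c}" for c in classnames]

prompts = domain_prior_prompts(["dog", "cat"], "sketch")
```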
arXiv Detail & Related papers (2024-07-21T14:09:14Z)
- CLIPArTT: Adaptation of CLIP to New Domains at Test Time [19.0284321951354]
We introduce CLIP Adaptation duRing Test-Time (CLIPArTT), a fully test-time adaptation (TTA) approach for pre-trained vision-language models (VLMs). Our method employs a unique, minimally invasive text prompt tuning process, wherein multiple predicted classes are aggregated into a single new text prompt, used as a pseudo-label to re-classify inputs. Our findings demonstrate that, without requiring additional transformations or new trainable modules, CLIPArTT enhances performance dynamically across non-corrupted datasets.
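The aggregation step can be sketched as follows; the prompt template and the joining scheme are assumptions for illustration.

```python
import torch

@torch.no_grad()
def pseudo_prompt(image_feat, text_feats, classnames, k=3):
    # image_feat: (D,) normalized image feature; text_feats: (C, D) normalized
    # class text features. Aggregate the top-k predicted classes into a single
    # new text prompt, then use it as a pseudo-label to re-classify the input.
    topk = (text_feats @ image_feat).topk(k).indices.tolist()
    return "a photo of a " + " or ".join(classnames[i] for i in topk)
```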
arXiv Detail & Related papers (2024-05-01T07:24:30Z)
- Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization [64.62570402941387]
We use a single test sample to adapt multi-modal prompts at test time by minimizing the feature distribution shift to bridge the gap in the test domain.
Our method improves zero-shot top-1 accuracy beyond existing prompt-learning techniques, with a 3.08% improvement over the baseline MaPLe.
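The alignment objective can be sketched as moment matching between test-time features and offline source statistics, minimized with respect to the prompt parameters; the L1 form below is an assumption.

```python
def distribution_alignment_loss(test_tokens, src_mean, src_var):
    # test_tokens: (N, D) features from augmented views of one test sample;
    # src_mean, src_var: (D,) precomputed source-domain statistics.
    mu, var = test_tokens.mean(dim=0), test_tokens.var(dim=0)
    return (mu - src_mean).abs().mean() + (var - src_var).abs().mean()
```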
arXiv Detail & Related papers (2023-11-02T17:59:32Z)
- AD-CLIP: Adapting Domains in Prompt Space Using CLIP [11.836764044083257]
We introduce AD-CLIP, a domain-agnostic prompt learning strategy for CLIP.
Our prompts are designed to be domain-invariant and class-generalizable, by conditioning prompt learning on image style and content features simultaneously.
Our experiments on three benchmark DA datasets demonstrate the effectiveness of AD-CLIP compared to existing literature.
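One plausible way to condition prompts on style and content simultaneously is sketched below; the style descriptor (channel-wise mean/std of patch tokens), the pooling, and all sizes are assumptions, not AD-CLIP's architecture.

```python
import torch
import torch.nn as nn

class StyleContentPromptGen(nn.Module):
    """Illustrative prompt generator conditioned on image style and content."""

    def __init__(self, dim=512, n_ctx=4):
        super().__init__()
        self.n_ctx, self.dim = n_ctx, dim
        self.proj = nn.Linear(3 * dim, n_ctx * dim)

    def forward(self, patch_tokens):                          # (B, N, D)
        style = torch.cat([patch_tokens.mean(dim=1),
                           patch_tokens.std(dim=1)], dim=-1)  # (B, 2D) AdaIN-like
        content = patch_tokens.amax(dim=1)                    # crude content pooling
        ctx = self.proj(torch.cat([style, content], dim=-1))
        return ctx.view(-1, self.n_ctx, self.dim)             # prompt context tokens
```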
arXiv Detail & Related papers (2023-08-10T15:58:28Z)
- PØDA: Prompt-driven Zero-shot Domain Adaptation [27.524962843495366]
We adapt a model trained on a source domain using only a general natural-language description of the target domain, i.e., a prompt.
We show that these prompt-driven augmentations can be used to perform zero-shot domain adaptation for semantic segmentation.
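A heavily simplified sketch of the idea: optimize an affine (AdaIN-like) shift of source features toward the CLIP embedding of the target-domain description, then reuse it as a zero-shot augmentation. The single-shift formulation and the cosine objective are assumptions.

```python
import torch
import torch.nn.functional as F

def mine_target_style(src_feats, target_text_emb, steps=100, lr=0.1):
    # src_feats: (B, D) source features; target_text_emb: (D,) CLIP text
    # embedding of the target-domain prompt.
    mu = torch.zeros(src_feats.size(1), requires_grad=True)
    sigma = torch.ones(src_feats.size(1), requires_grad=True)
    opt = torch.optim.SGD([mu, sigma], lr=lr)
    for _ in range(steps):
        stylized = sigma * F.normalize(src_feats, dim=-1) + mu
        loss = (1 - F.cosine_similarity(stylized,
                                        target_text_emb.unsqueeze(0))).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return mu.detach(), sigma.detach()
```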
arXiv Detail & Related papers (2022-12-06T18:59:58Z)
- Cross-domain Contrastive Learning for Unsupervised Domain Adaptation [108.63914324182984]
Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a fully-labeled source domain to a different unlabeled target domain.
We build upon contrastive self-supervised learning to align features so as to reduce the domain discrepancy between training and testing sets.
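A cross-domain contrastive objective of this kind can be sketched as an InfoNCE loss over source-class prototypes; the prototype-based form below is a simplification of the paper's formulation.

```python
import torch.nn.functional as F

def cross_domain_contrastive_loss(tgt_feats, src_prototypes, pseudo_labels, tau=0.07):
    # Pull each target feature toward the source-class prototype of its
    # pseudo-label and away from other prototypes.
    logits = (F.normalize(tgt_feats, dim=-1) @
              F.normalize(src_prototypes, dim=-1).t())
    return F.cross_entropy(logits / tau, pseudo_labels)
```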
arXiv Detail & Related papers (2021-06-10T06:32:30Z)
- Prototypical Cross-domain Self-supervised Learning for Few-shot Unsupervised Domain Adaptation [91.58443042554903]
We propose an end-to-end Prototypical Cross-domain Self-Supervised Learning (PCS) framework for Few-shot Unsupervised Domain Adaptation (FUDA).
PCS not only performs cross-domain low-level feature alignment, but it also encodes and aligns semantic structures in the shared embedding space across domains.
Compared with state-of-the-art methods, PCS improves the mean classification accuracy over different domain pairs on FUDA by 10.5%, 3.5%, 9.0%, and 13.2% on Office, Office-Home, VisDA-2017, and DomainNet, respectively.
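The cross-domain alignment of semantic structure can be sketched as pulling matched class prototypes together in the shared embedding space; assuming index-aligned prototypes, this is a simplified stand-in for PCS's objective.

```python
import torch.nn.functional as F

def prototype_alignment_loss(src_protos, tgt_protos):
    # src_protos, tgt_protos: (C, D), index-aligned by (pseudo-)class.
    s = F.normalize(src_protos, dim=-1)
    t = F.normalize(tgt_protos, dim=-1)
    return (1.0 - (s * t).sum(dim=-1)).mean()  # mean cosine distance per class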
arXiv Detail & Related papers (2021-03-31T02:07:42Z)
- Self-Challenging Improves Cross-Domain Generalization [81.99554996975372]
Convolutional Neural Networks (CNNs) conduct image classification by activating dominant features that correlate with labels.
We introduce a simple training heuristic, Representation Self-Challenging (RSC), that significantly improves the generalization of CNNs to out-of-domain data.
RSC iteratively challenges the dominant features activated on the training data, and forces the network to activate remaining features that correlate with labels.
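The self-challenging step can be sketched as follows: rank feature dimensions by the gradient of the true-class score, mute the top fraction, and train on what remains. This is a per-sample sketch; for simplicity it detaches the features, whereas in full training gradients would also flow into the backbone.

```python
import torch
import torch.nn.functional as F

def rsc_loss(feats, classifier, labels, drop_frac=0.33):
    # feats: (B, D) penultimate features; classifier: nn.Module mapping D -> C.
    feats = feats.detach().requires_grad_(True)   # sketch: backbone grads omitted
    score = classifier(feats).gather(1, labels.unsqueeze(1)).sum()
    g, = torch.autograd.grad(score, feats)        # sensitivity of true-class score
    k = max(1, int(drop_frac * feats.size(1)))
    cutoff = g.topk(k, dim=1).values[:, -1:]      # per-sample threshold
    muted = feats * (g < cutoff).float()          # zero out dominant features
    return F.cross_entropy(classifier(muted), labels)
```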
arXiv Detail & Related papers (2020-07-05T21:42:26Z)