COLA: Context-aware Language-driven Test-time Adaptation
- URL: http://arxiv.org/abs/2509.17598v1
- Date: Mon, 22 Sep 2025 11:19:17 GMT
- Title: COLA: Context-aware Language-driven Test-time Adaptation
- Authors: Aiming Zhang, Tianyuan Yu, Liang Bai, Jun Tang, Yanming Guo, Yirun Ruan, Yun Zhou, Zhihe Lu
- Abstract summary: We investigate a more general source model capable of adaptation to multiple target domains without needing shared labels. This is achieved by using a pre-trained vision-language model (VLM), e.g., CLIP, that can recognize images through matching with class descriptions. We propose a novel method, Context-aware Language-driven TTA (COLA).
- Score: 20.919416740369975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Test-time adaptation (TTA) has gained increasing popularity due to its efficacy in addressing the "distribution shift" issue while simultaneously protecting data privacy. However, most prior methods assume that a source-domain model and a target domain sharing the same label space coexist as a pair, which heavily limits their applicability. In this paper, we investigate a more general source model capable of adapting to multiple target domains without needing shared labels. This is achieved by using a pre-trained vision-language model (VLM), e.g., CLIP, that can recognize images by matching them with class descriptions. While the zero-shot performance of VLMs is impressive, they struggle to effectively capture the distinctive attributes of a target domain. To that end, we propose a novel method, Context-aware Language-driven TTA (COLA). The proposed method incorporates a lightweight context-aware module that consists of three key components: a task-aware adapter, a context-aware unit, and a residual connection unit, for exploring task-specific knowledge, domain-specific knowledge, and the prior knowledge of the VLM, respectively. Notably, the context-aware module can be seamlessly integrated into a frozen VLM, ensuring both minimal effort and parameter efficiency. Additionally, we introduce a Class-Balanced Pseudo-labeling (CBPL) strategy to mitigate the adverse effects caused by class imbalance. We demonstrate the effectiveness of our method not only in TTA scenarios but also in class generalisation tasks. The source code is available at https://github.com/NUDT-Bai-Group/COLA-TTA.
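To ground the description, here is a minimal PyTorch sketch of what a context-aware module of this kind could look like, plus a toy class-balanced pseudo-labeling filter. The module names, dimensions, and mixing weight are illustrative guesses from the abstract, not the authors' implementation; consult the linked repository for the real one.

```python
import torch
import torch.nn as nn

class ContextAwareModule(nn.Module):
    """Illustrative stand-in for COLA's lightweight context-aware module.

    Refines frozen VLM text features with a task-aware adapter and a
    context-aware unit, then blends the result with the original
    (prior-knowledge) features through a residual connection.
    """

    def __init__(self, dim: int = 512, bottleneck: int = 64, alpha: float = 0.2):
        super().__init__()
        # Task-aware adapter: a small bottleneck MLP (parameter-efficient).
        self.adapter = nn.Sequential(
            nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim)
        )
        # Context-aware unit: here, a learnable map over a summary of
        # target-domain features (an assumption, not the paper's design).
        self.context = nn.Linear(dim, dim)
        self.alpha = alpha  # residual mixing weight (illustrative)

    def forward(self, text_feat: torch.Tensor, domain_ctx: torch.Tensor) -> torch.Tensor:
        task = self.adapter(text_feat)    # task-specific knowledge
        ctx = self.context(domain_ctx)    # domain-specific knowledge
        # Residual connection keeps the VLM's prior (zero-shot) knowledge.
        refined = text_feat + self.alpha * (task + ctx)
        return refined / refined.norm(dim=-1, keepdim=True)

def class_balanced_pseudo_labels(probs: torch.Tensor, per_class: int) -> dict:
    """Toy sketch of class-balanced pseudo-labeling: keep only the top-k
    most confident samples per predicted class to curb class imbalance."""
    conf, pred = probs.max(dim=1)
    keep = {}
    for c in pred.unique().tolist():
        idx = (pred == c).nonzero(as_tuple=True)[0]
        keep[c] = idx[conf[idx].argsort(descending=True)][:per_class].tolist()
    return keep

# Only the module's parameters are trained; the VLM stays frozen.
module = ContextAwareModule()
text_feat = torch.randn(10, 512)                 # frozen CLIP class embeddings
domain_ctx = torch.randn(1, 512).expand(10, -1)  # e.g., mean target image feature
print(module(text_feat, domain_ctx).shape)       # torch.Size([10, 512])
```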
Related papers
- MI-Fuse: Label Fusion for Unsupervised Domain Adaptation with Closed-Source Large-Audio Language Model [49.59690207400984]
Large audio-language models (LALMs) show strong zero-shot ability on speech tasks, suggesting promise for speech emotion recognition (SER). We ask: given only unlabeled target-domain audio and an API-only LALM, can a student model be adapted to outperform the LALM in the target domain? We propose MI-Fuse, a denoised label fusion framework that supplements the LALM with an SER model trained on the source domain as an auxiliary teacher.
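The summary does not spell out the fusion rule, but a label-fusion baseline in its spirit fits in a few lines: average the posteriors of the API-only LALM and the source-trained teacher, then keep only confident fusions as the denoising step. The weights, threshold, and class count below are illustrative, not MI-Fuse's actual procedure.

```python
import numpy as np

def fuse_labels(p_lalm: np.ndarray, p_teacher: np.ndarray,
                w: float = 0.5, conf_thresh: float = 0.6):
    """Hypothetical fusion rule: convex-combine the two posteriors and keep
    only samples whose fused confidence clears a threshold.
    Returns (pseudo_labels, mask of retained samples)."""
    fused = w * p_lalm + (1.0 - w) * p_teacher
    labels = fused.argmax(axis=1)
    mask = fused.max(axis=1) >= conf_thresh  # drop low-confidence fusions
    return labels, mask

# Posteriors from the API-only LALM and the source-trained SER teacher
# over four emotion classes (illustrative numbers only).
p_lalm = np.array([[0.7, 0.1, 0.1, 0.1], [0.3, 0.3, 0.2, 0.2]])
p_teacher = np.array([[0.6, 0.2, 0.1, 0.1], [0.2, 0.4, 0.2, 0.2]])
labels, mask = fuse_labels(p_lalm, p_teacher)
print(labels, mask)  # [0 1] [ True False]
```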
arXiv Detail & Related papers (2025-09-25T03:16:32Z)
- BANER: Boundary-Aware LLMs for Few-Shot Named Entity Recognition [12.57768435856206]
We propose an approach called Boundary-Aware LLMs for Few-Shot Named Entity Recognition. We introduce a boundary-aware contrastive learning strategy to enhance the LLM's ability to perceive entity boundaries for generalized entity spans. We utilize LoRAHub to align information from the target domain to the source domain, thereby enhancing adaptive cross-domain classification capabilities.
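A boundary-aware contrastive objective of the kind described might look like the following sketch, which pulls an anchor boundary-token representation toward other boundary tokens and away from non-boundary tokens. The loss form and all names are our assumptions, not BANER's actual formulation.

```python
import torch
import torch.nn.functional as F

def boundary_contrastive_loss(token_reps, pos_idx, neg_idx, anchor_idx, tau=0.1):
    """Illustrative boundary-aware contrastive loss: the anchor is a true
    boundary token; positives are other boundary tokens, negatives are
    non-boundary tokens (indices assumed given by the labels)."""
    anchor = F.normalize(token_reps[anchor_idx], dim=-1)
    pos = F.normalize(token_reps[pos_idx], dim=-1)
    neg = F.normalize(token_reps[neg_idx], dim=-1)
    logits = torch.cat([pos @ anchor, neg @ anchor]) / tau
    target = torch.zeros(len(logits))
    target[: len(pos_idx)] = 1.0 / len(pos_idx)  # uniform mass on positives
    return F.cross_entropy(logits.unsqueeze(0), target.unsqueeze(0))

reps = torch.randn(16, 128)  # token representations from the LLM
loss = boundary_contrastive_loss(reps, pos_idx=[2, 7], neg_idx=[3, 4, 5], anchor_idx=0)
print(loss.item())
```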
arXiv Detail & Related papers (2024-12-03T07:51:14Z)
- Teaching VLMs to Localize Specific Objects from In-context Examples [56.797110842152]
We find that present-day Vision-Language Models (VLMs) lack a fundamental cognitive ability: learning to localize specific objects in a scene by taking into account the context. This work is the first to explore and benchmark personalized few-shot localization for VLMs.
arXiv Detail & Related papers (2024-11-20T13:34:22Z)
- ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting [24.56720920528011]
Vision-language models (VLMs) have excelled in multimodal tasks, but adapting them to embodied decision-making in open-world environments presents challenges. One critical issue is bridging the gap between discrete entities in low-level observations and the abstract concepts required for effective planning. We propose visual-temporal context, a novel communication protocol between VLMs and policy models.
arXiv Detail & Related papers (2024-10-23T13:26:59Z)
- E2MPL: An Enduring and Efficient Meta Prompt Learning Framework for Few-shot Unsupervised Domain Adaptation [24.34819770490212]
Few-shot unsupervised domain adaptation (FS-UDA) leverages a limited amount of labeled data from a source domain to enable accurate classification in an unlabeled target domain. We propose a novel framework called Enduring and Efficient Meta-Prompt Learning (E2MPL) for FS-UDA. Within this framework, we utilize the pre-trained CLIP model as the backbone of feature learning.
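E2MPL's meta-learning stage is not detailed in this summary, but the prompt-learning backbone it builds on (learnable context vectors fed to a frozen CLIP text encoder, in the style of CoOp) can be sketched as follows; dimensions and initialization are illustrative.

```python
import torch
import torch.nn as nn

class LearnablePrompt(nn.Module):
    """Minimal CoOp-style prompt learner (a simplification; E2MPL adds a
    meta-learning stage on top of prompts like these)."""

    def __init__(self, n_ctx: int = 4, dim: int = 512, n_classes: int = 10):
        super().__init__()
        # Shared learnable context vectors, prepended to each class token.
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)
        self.class_emb = nn.Parameter(torch.randn(n_classes, 1, dim) * 0.02)

    def forward(self) -> torch.Tensor:
        n_classes = self.class_emb.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_classes, -1, -1)
        # [n_classes, n_ctx + 1, dim]: fed into the frozen CLIP text encoder.
        return torch.cat([ctx, self.class_emb], dim=1)

prompts = LearnablePrompt()()
print(prompts.shape)  # torch.Size([10, 5, 512])
```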
arXiv Detail & Related papers (2024-07-04T17:13:06Z)
- OpenDAS: Open-Vocabulary Domain Adaptation for 2D and 3D Segmentation [54.98688607911399]
We propose the task of open-vocabulary domain adaptation to infuse domain-specific knowledge into Vision-Language Models (VLMs).
Existing VLM adaptation methods improve performance on base (training) queries, but fail to preserve the open-set capabilities of VLMs on novel queries.
Our approach is the only parameter-efficient method that consistently surpasses the original VLM on novel classes.
arXiv Detail & Related papers (2024-05-30T15:16:06Z)
- Empowering Source-Free Domain Adaptation via MLLM-Guided Reliability-Based Curriculum Learning [7.2523602603838535]
Source-Free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to a target domain using only unlabeled target data. We propose Reliability-based Curriculum Learning (RCL), a novel framework that integrates multiple MLLMs for knowledge exploitation via pseudo-labeling in SFDA. RCL achieves state-of-the-art (SOTA) performance on multiple SFDA benchmarks, e.g., +9.4% on DomainNet, demonstrating its effectiveness in enhancing adaptability and robustness without requiring access to source data.
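One plausible reading of reliability-based curriculum learning is: pseudo-label each target sample by majority vote across the MLLMs, score reliability by vote agreement, and schedule training from the most to the least reliable samples. The sketch below implements that reading; RCL's actual scoring may differ.

```python
import numpy as np

def reliability_curriculum(votes: np.ndarray):
    """Illustrative reliability scoring: majority-vote pseudo-labels over
    several MLLMs; reliability is the fraction of models that agree.
    The curriculum orders samples from most to least reliable."""
    labels = np.array([np.bincount(v).argmax() for v in votes])
    agreement = (votes == labels[:, None]).mean(axis=1)
    order = np.argsort(-agreement)  # easy-to-hard training order
    return labels, agreement, order

# Class votes from 3 MLLMs on 4 target samples (illustrative only).
votes = np.array([[0, 0, 0], [1, 1, 2], [2, 0, 1], [3, 3, 3]])
labels, agreement, order = reliability_curriculum(votes)
print(labels, agreement, order)  # e.g., unanimous samples come first
```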
arXiv Detail & Related papers (2024-05-28T17:18:17Z)
- Task Residual for Tuning Vision-Language Models [69.22958802711017]
We propose a new efficient tuning approach for vision-language models (VLMs) named Task Residual Tuning (TaskRes).
TaskRes explicitly decouples the prior knowledge of the pre-trained models and new knowledge regarding a target task.
The proposed TaskRes is simple yet effective, significantly outperforming previous methods on 11 benchmark datasets.
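The decoupling admits a compact sketch: keep the frozen text-embedding classifier as the prior knowledge and learn only an additive task residual, t' = t + alpha * x. The shapes and the value of alpha below are illustrative.

```python
import torch
import torch.nn as nn

class TaskRes(nn.Module):
    """Task residual head: frozen text embeddings (prior knowledge) plus a
    learnable residual (new, task-specific knowledge)."""

    def __init__(self, text_emb: torch.Tensor, alpha: float = 0.5):
        super().__init__()
        self.register_buffer("base", text_emb)  # frozen prior knowledge
        self.residual = nn.Parameter(torch.zeros_like(text_emb))  # new knowledge
        self.alpha = alpha

    def forward(self, image_feat: torch.Tensor) -> torch.Tensor:
        w = self.base + self.alpha * self.residual
        w = w / w.norm(dim=-1, keepdim=True)
        return image_feat @ w.t()  # class logits

clip_text_emb = torch.randn(10, 512)  # stand-in for CLIP class embeddings
head = TaskRes(clip_text_emb)
print(head(torch.randn(4, 512)).shape)  # torch.Size([4, 10])
```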
arXiv Detail & Related papers (2022-11-18T15:09:03Z)
- Prior Knowledge Guided Unsupervised Domain Adaptation [82.9977759320565]
We propose a Knowledge-guided Unsupervised Domain Adaptation (KUDA) setting where prior knowledge about the target class distribution is available.
In particular, we consider two specific types of prior knowledge about the class distribution in the target domain: Unary Bound and Binary Relationship.
We propose a rectification module that uses such prior knowledge to refine model-generated pseudo-labels.
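As a concrete toy instance of rectification under a unary bound: if pseudo-labels assign a class more mass than the prior allows, demote the least confident offenders to their runner-up class. KUDA's actual procedure differs in detail; this only illustrates the idea.

```python
import numpy as np

def rectify_with_unary_bound(probs, cls, upper):
    """Toy rectification: enforce that class `cls` covers at most an `upper`
    fraction of pseudo-labels by reassigning the least confident offenders
    to their runner-up class."""
    labels = probs.argmax(axis=1)
    idx = np.where(labels == cls)[0]
    excess = len(idx) - int(upper * len(labels))
    if excess > 0:
        demote = idx[np.argsort(probs[idx, cls])[:excess]]  # least confident
        runner_up = probs[demote].copy()
        runner_up[:, cls] = -np.inf  # exclude the bounded class
        labels[demote] = runner_up.argmax(axis=1)
    return labels

probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.55, 0.45], [0.2, 0.8]])
print(rectify_with_unary_bound(probs, cls=0, upper=0.5))  # [0 0 1 1]
```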
arXiv Detail & Related papers (2022-07-18T18:41:36Z)
- On Universal Black-Box Domain Adaptation [53.7611757926922]
We study an arguably least restrictive setting of domain adaptation, in the sense of practical deployment.
Only the interface of the source model is available to the target domain, and the label-space relations between the two domains are allowed to be different and unknown.
We propose a unified self-training framework, regularized by consistency of predictions in local neighborhoods of target samples.
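The neighborhood-consistency idea can be sketched directly: smooth each target sample's (possibly noisy) black-box prediction by averaging over its k nearest neighbors in feature space and train the student against these targets. The value of k and the distance metric below are illustrative choices.

```python
import numpy as np

def neighborhood_targets(feats, probs, k=3):
    """Sketch of neighborhood-consistency regularization for black-box DA:
    each target sample's training target is the average prediction over its
    k nearest neighbors in feature space."""
    # Pairwise squared Euclidean distances between target features.
    d = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)            # exclude self
    nn_idx = np.argsort(d, axis=1)[:, :k]  # k nearest neighbors
    return probs[nn_idx].mean(axis=1)      # consistency-smoothed targets

feats = np.random.randn(8, 16)              # target-sample features
probs = np.random.dirichlet(np.ones(5), 8)  # black-box source-model outputs
print(neighborhood_targets(feats, probs).shape)  # (8, 5)
```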
arXiv Detail & Related papers (2021-04-10T02:21:09Z)
- OVANet: One-vs-All Network for Universal Domain Adaptation [78.86047802107025]
Existing methods manually set a threshold to reject unknown samples based on validation or a pre-defined ratio of unknown samples.
We propose a method to learn the threshold using source samples and to adapt it to the target domain.
Our idea is that the minimum inter-class distance in the source domain should be a good threshold for deciding between known and unknown in the target domain.
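Taking that idea literally (OVANet itself learns the threshold through one-vs-all classifiers rather than computing it in closed form), a distance-based version looks like this sketch; features, labels, and the centroid representation are illustrative.

```python
import numpy as np

def unknown_threshold(src_feats, src_labels):
    """Use the minimum distance between source class centroids as the
    threshold for declaring a target sample unknown (illustrative)."""
    classes = np.unique(src_labels)
    centroids = np.stack([src_feats[src_labels == c].mean(0) for c in classes])
    d = np.linalg.norm(centroids[:, None] - centroids[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # ignore zero self-distances
    return centroids, d.min()

def predict(x, centroids, thresh):
    dists = np.linalg.norm(centroids - x, axis=1)
    return int(dists.argmin()) if dists.min() < thresh else -1  # -1 = unknown

src_feats = np.random.randn(100, 32)
src_labels = np.random.randint(0, 5, 100)
centroids, thresh = unknown_threshold(src_feats, src_labels)
print(predict(np.random.randn(32), centroids, thresh))
```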
arXiv Detail & Related papers (2021-04-07T18:36:31Z)