Domain-Controlled Prompt Learning
- URL: http://arxiv.org/abs/2310.07730v2
- Date: Tue, 12 Dec 2023 08:56:17 GMT
- Title: Domain-Controlled Prompt Learning
- Authors: Qinglong Cao, Zhengqin Xu, Yuntian Chen, Chao Ma, Xiaokang Yang
- Abstract summary: Existing prompt learning methods often lack domain-awareness or domain-transfer mechanisms.
We propose Domain-Controlled Prompt Learning for specific domains.
Our method achieves state-of-the-art performance on specific-domain image recognition datasets.
- Score: 49.45309818782329
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large pre-trained vision-language models, such as CLIP, have shown remarkable
generalization capabilities across various tasks when appropriate text prompts
are provided. However, adapting these models to specific domains, such as
remote sensing images (RSIs) and medical images, remains unexplored and
challenging.
Existing prompt learning methods often lack domain awareness or domain-transfer
mechanisms, leading to suboptimal performance because specific-domain images
are interpreted through natural-image patterns. To tackle this dilemma, we
propose Domain-Controlled Prompt Learning (DCPL) for specific domains.
Specifically, the large-scale specific domain foundation model (LSDM) is first
introduced to provide essential domain-specific knowledge. Using lightweight
neural networks, we transfer this knowledge into domain biases, which control
both the visual and language branches and are directly incorporated to yield
domain-adaptive prompts. Simultaneously, to overcome overfitting, we propose a
novel noise-adding strategy with no extra trainable parameters, which helps the
model escape suboptimal solutions through global domain oscillation.
Experimental results show that our method achieves state-of-the-art performance
on specific-domain image recognition datasets. Our
code is available at https://github.com/caoql98/DCPL.
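As a rough illustration of the abstract's core mechanism, the following is a minimal, hypothetical PyTorch sketch of how LSDM-derived domain features might be mapped by lightweight networks into biases that control both prompt branches. All names, dimensions, and the two-layer bias networks are assumptions for illustration, not the authors' released implementation (see the repository above for the real code).

```python
import torch
import torch.nn as nn

class DomainBias(nn.Module):
    """Lightweight network mapping a domain feature to an additive prompt bias."""
    def __init__(self, domain_dim: int, prompt_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(domain_dim, prompt_dim),
            nn.ReLU(),
            nn.Linear(prompt_dim, prompt_dim),
        )

    def forward(self, domain_feat: torch.Tensor) -> torch.Tensor:
        return self.net(domain_feat)

class DomainControlledPrompts(nn.Module):
    """Learnable context prompts for both CLIP branches, shifted by domain biases."""
    def __init__(self, n_ctx: int, txt_dim: int, vis_dim: int, domain_dim: int):
        super().__init__()
        # CoOp-style learnable context tokens for the text and vision branches.
        self.text_ctx = nn.Parameter(torch.randn(n_ctx, txt_dim) * 0.02)
        self.vis_ctx = nn.Parameter(torch.randn(n_ctx, vis_dim) * 0.02)
        self.text_bias = DomainBias(domain_dim, txt_dim)
        self.vis_bias = DomainBias(domain_dim, vis_dim)

    def forward(self, domain_feat: torch.Tensor):
        # domain_feat: a pooled feature vector of shape (domain_dim,) from the
        # frozen domain foundation model (LSDM). Each bias broadcasts over the
        # context tokens, directly incorporating domain knowledge into prompts.
        text_prompts = self.text_ctx + self.text_bias(domain_feat)
        vis_prompts = self.vis_ctx + self.vis_bias(domain_feat)
        return text_prompts, vis_prompts
```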
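The noise-adding strategy is described only at a high level in the abstract. One plausible parameter-free reading, continuing the sketch above under that assumption, is to perturb the domain-controlled prompts with zero-mean Gaussian noise during training so optimization oscillates out of overfitted, suboptimal minima; the noise target and magnitude here are guesses, not the paper's specification.

```python
def add_prompt_noise(prompts: torch.Tensor, std: float = 0.1) -> torch.Tensor:
    # Zero-mean Gaussian perturbation: no extra trainable parameters, only a
    # training-time nudge that helps the model escape suboptimal solutions.
    return prompts + torch.randn_like(prompts) * std

# Training-time usage (inference leaves the prompts untouched):
# text_prompts = add_prompt_noise(text_prompts) if model.training else text_prompts
```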
Related papers
- In the Era of Prompt Learning with Vision-Language Models [1.060608983034705]
We introduce StyLIP, a novel domain-agnostic prompt learning strategy for Domain Generalization (DG).
StyLIP disentangles visual style and content in CLIP's vision encoder by using style projectors to learn domain-specific prompt tokens.
We also propose AD-CLIP for unsupervised domain adaptation (DA), leveraging CLIP's frozen vision backbone.
arXiv Detail & Related papers (2024-11-07T17:31:21Z)
- WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization [63.98650220772378]
We present WIDIn, Wording Images for Domain-Invariant representation, to disentangle discriminative visual representation.
We first estimate the language embedding with fine-grained alignment, which can be used to adaptively identify and then remove the domain-specific counterpart.
We show that WIDIn can be applied to both pretrained vision-language models like CLIP, and separately trained uni-modal models like MoCo and BERT.
arXiv Detail & Related papers (2024-05-28T17:46:27Z)
- VLLaVO: Mitigating Visual Gap through LLMs [7.352822795984628]
Cross-domain learning aims at extracting domain-invariant knowledge to reduce the domain shift between training and testing data.
We propose VLLaVO, combining Vision language models and Large Language models as Visual cross-dOmain learners.
arXiv Detail & Related papers (2024-01-06T16:33:39Z)
- Domain Prompt Learning with Quaternion Networks [49.45309818782329]
We propose to leverage domain-specific knowledge from domain-specific foundation models to transfer the robust recognition ability of Vision-Language Models to specialized domains.
We present a hierarchical approach that generates vision prompt features by analyzing intermodal relationships between hierarchical language prompt features and domain-specific vision features.
Our proposed method achieves new state-of-the-art results in prompt learning.
arXiv Detail & Related papers (2023-12-12T08:49:39Z)
- Prompting Diffusion Representations for Cross-Domain Semantic Segmentation [101.04326113360342]
Diffusion pretraining achieves extraordinary domain generalization results for semantic segmentation.
We introduce a scene prompt and a prompt randomization strategy to help further disentangle the domain-invariant information when training the segmentation head.
arXiv Detail & Related papers (2023-07-05T09:28:25Z)
- Single Domain Dynamic Generalization for Iris Presentation Attack Detection [41.126916126040655]
Iris presentation attack detection has achieved great success under intra-domain settings but easily degrades on unseen domains.
We propose a Single Domain Dynamic Generalization (SDDG) framework, which exploits domain-invariant and domain-specific features on a per-sample basis.
The proposed method is effective and outperforms the state-of-the-art on the LivDet-Iris 2017 dataset.
arXiv Detail & Related papers (2023-05-22T07:54:13Z)
- Learning Domain Invariant Prompt for Vision-Language Models [31.581652862478965]
We propose MetaPrompt, a novel prompt learning paradigm that directly generates domain-invariant prompts generalizable to unseen domains.
Our method consistently and significantly outperforms existing methods.
arXiv Detail & Related papers (2022-12-08T11:23:24Z)
- Domain Invariant Masked Autoencoders for Self-supervised Learning from Multi-domains [73.54897096088149]
We propose a Domain-invariant Masked AutoEncoder (DiMAE) for self-supervised learning from multi-domains.
The core idea is to augment the input image with style noise from different domains and then reconstruct the image from the embedding of the augmented image.
Experiments on PACS and DomainNet illustrate that DiMAE achieves considerable gains compared with recent state-of-the-art methods.
arXiv Detail & Related papers (2022-05-10T09:49:40Z)
- WEDGE: Web-Image Assisted Domain Generalization for Semantic Segmentation [72.88657378658549]
We propose a WEb-image assisted Domain GEneralization scheme, which is the first to exploit the diversity of web-crawled images for generalizable semantic segmentation.
We also present a method that injects the styles of web-crawled images into training images on the fly, enabling the network to experience diverse styles with reliable labels for effective training.
arXiv Detail & Related papers (2021-09-29T05:19:58Z)