Domain-Controlled Prompt Learning
- URL: http://arxiv.org/abs/2310.07730v2
- Date: Tue, 12 Dec 2023 08:56:17 GMT
- Title: Domain-Controlled Prompt Learning
- Authors: Qinglong Cao, Zhengqin Xu, Yuntian Chen, Chao Ma, Xiaokang Yang
- Abstract summary: Existing prompt learning methods often lack domain-awareness or domain-transfer mechanisms.
We propose Domain-Controlled Prompt Learning for specific domains.
Our method achieves state-of-the-art performance on specific-domain image recognition datasets.
- Score: 49.45309818782329
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large pre-trained vision-language models, such as CLIP, have shown remarkable
generalization capabilities across various tasks when appropriate text prompts
are provided. However, adapting these models to specific domains, such as
remote sensing images (RSIs) and medical images, remains unexplored and
challenging.
Existing prompt learning methods often lack domain awareness or domain-transfer
mechanisms, leading to suboptimal performance because specific-domain images
are interpreted through natural-image patterns. To tackle this dilemma, we
propose Domain-Controlled Prompt Learning (DCPL) for specific domains.
Specifically, the large-scale specific domain foundation model (LSDM) is first
introduced to provide essential domain-specific knowledge. Using lightweight
neural networks, we transfer this knowledge into domain biases, which control
both the visual and language branches and are directly incorporated to yield
domain-adaptive prompts. Simultaneously, to overcome overfitting, we propose a
novel noise-adding strategy with no extra trainable parameters, which helps the
model escape suboptimal solutions through global domain oscillation.
Experimental results show that our method achieves state-of-the-art performance
on specific-domain image recognition datasets. Our
code is available at https://github.com/caoql98/DCPL.
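As a rough illustration of the abstract's core mechanism, the following is a minimal, hypothetical PyTorch sketch of how LSDM-derived domain features might be mapped by lightweight networks into biases that control both prompt branches. All names, dimensions, and the two-layer bias networks are assumptions for illustration, not the authors' released implementation (see the repository above for the real code).

```python
import torch
import torch.nn as nn

class DomainBias(nn.Module):
    """Lightweight network mapping a domain feature to an additive prompt bias."""
    def __init__(self, domain_dim: int, prompt_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(domain_dim, prompt_dim),
            nn.ReLU(),
            nn.Linear(prompt_dim, prompt_dim),
        )

    def forward(self, domain_feat: torch.Tensor) -> torch.Tensor:
        return self.net(domain_feat)

class DomainControlledPrompts(nn.Module):
    """Learnable context prompts for both CLIP branches, shifted by domain biases."""
    def __init__(self, n_ctx: int, txt_dim: int, vis_dim: int, domain_dim: int):
        super().__init__()
        # CoOp-style learnable context tokens for the text and vision branches.
        self.text_ctx = nn.Parameter(torch.randn(n_ctx, txt_dim) * 0.02)
        self.vis_ctx = nn.Parameter(torch.randn(n_ctx, vis_dim) * 0.02)
        self.text_bias = DomainBias(domain_dim, txt_dim)
        self.vis_bias = DomainBias(domain_dim, vis_dim)

    def forward(self, domain_feat: torch.Tensor):
        # domain_feat: a pooled feature vector of shape (domain_dim,) from the
        # frozen domain foundation model (LSDM). Each bias broadcasts over the
        # context tokens, directly incorporating domain knowledge into prompts.
        text_prompts = self.text_ctx + self.text_bias(domain_feat)
        vis_prompts = self.vis_ctx + self.vis_bias(domain_feat)
        return text_prompts, vis_prompts
```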
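The noise-adding strategy is described only at a high level in the abstract. One plausible parameter-free reading, continuing the sketch above under that assumption, is to perturb the domain-controlled prompts with zero-mean Gaussian noise during training so optimization oscillates out of overfitted, suboptimal minima; the noise target and magnitude here are guesses, not the paper's specification.

```python
def add_prompt_noise(prompts: torch.Tensor, std: float = 0.1) -> torch.Tensor:
    # Zero-mean Gaussian perturbation: no extra trainable parameters, only a
    # training-time nudge that helps the model escape suboptimal solutions.
    return prompts + torch.randn_like(prompts) * std

# Training-time usage (inference leaves the prompts untouched):
# text_prompts = add_prompt_noise(text_prompts) if model.training else text_prompts
```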
Related papers
- In the Era of Prompt Learning with Vision-Language Models [1.060608983034705]
We introduce StyLIP, a novel domain-agnostic prompt learning strategy for Domain Generalization (DG).
StyLIP disentangles visual style and content in CLIP's vision encoder by using style projectors to learn domain-specific prompt tokens.
We also propose AD-CLIP for unsupervised domain adaptation (DA), leveraging CLIP's frozen vision backbone.
arXiv Detail & Related papers (2024-11-07T17:31:21Z)
- WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization [63.98650220772378]
We present WIDIn, Wording Images for Domain-Invariant representation, to disentangle discriminative visual representation.
We first estimate the language embedding with fine-grained alignment, which can be used to adaptively identify and then remove the domain-specific counterpart.
We show that WIDIn can be applied to both pretrained vision-language models like CLIP, and separately trained uni-modal models like MoCo and BERT.
arXiv Detail & Related papers (2024-05-28T17:46:27Z)
- VLLaVO: Mitigating Visual Gap through LLMs [7.352822795984628]
Cross-domain learning aims at extracting domain-invariant knowledge to reduce the domain shift between training and testing data.
We propose VLLaVO, combining Vision language models and Large Language models as Visual cross-dOmain learners.
arXiv Detail & Related papers (2024-01-06T16:33:39Z)
- Domain Prompt Learning with Quaternion Networks [49.45309818782329]
We propose to leverage domain-specific knowledge from domain-specific foundation models to transfer the robust recognition ability of Vision-Language Models to specialized domains.
We present a hierarchical approach that generates vision prompt features by analyzing intermodal relationships between hierarchical language prompt features and domain-specific vision features.
Our proposed method achieves new state-of-the-art results in prompt learning.
arXiv Detail & Related papers (2023-12-12T08:49:39Z)
- Prompting Diffusion Representations for Cross-Domain Semantic Segmentation [101.04326113360342]
Diffusion pretraining achieves extraordinary domain generalization results for semantic segmentation.
We introduce a scene prompt and a prompt randomization strategy to help further disentangle the domain-invariant information when training the segmentation head.
arXiv Detail & Related papers (2023-07-05T09:28:25Z)
- Single Domain Dynamic Generalization for Iris Presentation Attack Detection [41.126916126040655]
Iris presentation attack detection has achieved great success under intra-domain settings but easily degrades on unseen domains.
We propose a Single Domain Dynamic Generalization (SDDG) framework, which exploits domain-invariant and domain-specific features on a per-sample basis.
The proposed method is effective and outperforms the state-of-the-art on the LivDet-Iris 2017 dataset.
arXiv Detail & Related papers (2023-05-22T07:54:13Z)
- Learning Domain Invariant Prompt for Vision-Language Models [31.581652862478965]
We propose MetaPrompt, a novel prompt learning paradigm that directly generates domain-invariant prompts generalizable to unseen domains.
Our method consistently and significantly outperforms existing methods.
arXiv Detail & Related papers (2022-12-08T11:23:24Z)
- Domain Invariant Masked Autoencoders for Self-supervised Learning from Multi-domains [73.54897096088149]
We propose a Domain-invariant Masked AutoEncoder (DiMAE) for self-supervised learning from multi-domains.
The core idea is to augment the input image with style noise from different domains and then reconstruct the image from the embedding of the augmented image.
Experiments on PACS and DomainNet illustrate that DiMAE achieves considerable gains compared with recent state-of-the-art methods.
arXiv Detail & Related papers (2022-05-10T09:49:40Z)
- WEDGE: Web-Image Assisted Domain Generalization for Semantic Segmentation [72.88657378658549]
We propose a WEb-image assisted Domain GEneralization scheme, which is the first to exploit the diversity of web-crawled images for generalizable semantic segmentation.
We also present a method that injects the styles of web-crawled images into training images on the fly, enabling the network to experience diverse styles with reliable labels for effective training.
arXiv Detail & Related papers (2021-09-29T05:19:58Z)