ProS: Prompting-to-simulate Generalized knowledge for Universal
Cross-Domain Retrieval
- URL: http://arxiv.org/abs/2312.12478v3
- Date: Thu, 29 Feb 2024 12:41:44 GMT
- Title: ProS: Prompting-to-simulate Generalized knowledge for Universal
Cross-Domain Retrieval
- Authors: Kaipeng Fang, Jingkuan Song, Lianli Gao, Pengpeng Zeng, Zhi-Qi Cheng,
Xiyao Li, Heng Tao Shen
- Abstract summary: We propose Prompting-to-Simulate (ProS) to apply prompt tuning for Universal Cross-Domain Retrieval (UCDR).
ProS employs a two-step process to simulate Content-aware Dynamic Prompts (CaDP), which guide models to produce generalized features for UCDR.
Our method achieves new state-of-the-art performance without bringing excessive parameters.
- Score: 123.51277978744677
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of Universal Cross-Domain Retrieval (UCDR) is to achieve robust
performance in generalized test scenarios, wherein data may belong to strictly
unknown domains and categories during training. Recently, pre-trained models
with prompt tuning have shown strong generalization capabilities and attained
noteworthy achievements in various downstream tasks, such as few-shot learning
and video-text retrieval. However, applying them directly to UCDR may not be
sufficient to handle both domain shift (i.e., adapting to unfamiliar domains)
and semantic shift (i.e., transferring to unknown categories). To this end, we
propose Prompting-to-Simulate (ProS), the first method to
apply prompt tuning for UCDR. ProS employs a two-step process to simulate
Content-aware Dynamic Prompts (CaDP) which can guide models to produce
generalized features for UCDR. Concretely, in the Prompt Units Learning stage, we
introduce two Prompt Units to individually capture domain and semantic
knowledge in a mask-and-align way. Then, in the Context-aware Simulator Learning
stage, we train a Content-aware Prompt Simulator under a simulated test
scenario to produce the corresponding CaDP. Extensive experiments conducted on
three benchmark datasets show that our method achieves new state-of-the-art
performance without bringing excessive parameters. Our method is publicly
available at https://github.com/fangkaipeng/ProS.
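The abstract only outlines the two-stage design at a high level. The sketch below is a minimal, hypothetical illustration of the general idea, not the released implementation at the repository above: a content-aware prompt simulator maps each input to dynamic prompt tokens that are prepended to a frozen backbone, while static prompt units stand in for the stage-1 domain and semantic knowledge. All class names, dimensions, the pooling choice, and the alignment objective are assumptions made for illustration.

```python
# Minimal, hypothetical sketch of content-aware dynamic prompting for UCDR.
# This is NOT the authors' released code; names, shapes, and losses are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptUnits(nn.Module):
    """Stage 1 (assumed): static prompt banks for domain and semantic knowledge."""
    def __init__(self, num_domains, num_classes, prompt_len, dim):
        super().__init__()
        self.domain_prompts = nn.Parameter(0.02 * torch.randn(num_domains, prompt_len, dim))
        self.semantic_prompts = nn.Parameter(0.02 * torch.randn(num_classes, prompt_len, dim))

class ContentAwarePromptSimulator(nn.Module):
    """Stage 2 (assumed): maps a per-sample content feature to dynamic prompt tokens."""
    def __init__(self, dim, prompt_len):
        super().__init__()
        self.prompt_len = prompt_len
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(),
                                 nn.Linear(dim, prompt_len * dim))

    def forward(self, content_feat):                      # (B, dim)
        batch, dim = content_feat.shape
        return self.net(content_feat).view(batch, self.prompt_len, dim)

class FrozenEncoder(nn.Module):
    """Stand-in for a frozen pre-trained ViT-like backbone (prompt tuning keeps it fixed)."""
    def __init__(self, dim=256, depth=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        for p in self.parameters():
            p.requires_grad = False

    def forward(self, tokens):                            # (B, N, dim)
        return self.blocks(tokens)

dim, prompt_len, batch = 256, 4, 8
encoder = FrozenEncoder(dim)
units = PromptUnits(num_domains=3, num_classes=10, prompt_len=prompt_len, dim=dim)
simulator = ContentAwarePromptSimulator(dim, prompt_len)

patch_tokens = torch.randn(batch, 49, dim)                # placeholder patch embeddings
content_feat = patch_tokens.mean(dim=1)                   # crude per-image content summary
dynamic_prompts = simulator(content_feat)                 # (B, prompt_len, dim): the "CaDP"
out = encoder(torch.cat([dynamic_prompts, patch_tokens], dim=1))
retrieval_feat = out[:, :prompt_len].mean(dim=1)          # pooled feature for retrieval

# Illustrative stage-2 objective: pull simulated prompts toward a learned domain
# prompt unit; a retrieval loss on retrieval_feat would be added in practice.
align_loss = F.mse_loss(dynamic_prompts,
                        units.domain_prompts[0].unsqueeze(0).expand(batch, -1, -1))
print(retrieval_feat.shape, align_loss.item())
```

In practice the simulator would be trained under the simulated test scenarios described in the abstract, so that unseen domains and categories can be handled at inference time while the pre-trained backbone stays frozen and only a small number of prompt parameters are added.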
Related papers
- Soft Prompt Generation for Domain Generalization [13.957351735394683]
Large pre-trained vision language models (VLMs) have shown impressive zero-shot ability on downstream tasks with manually designed prompts.
To further adapt VLMs to downstream tasks, soft prompts have been proposed to replace manually designed ones.
arXiv Detail & Related papers (2024-04-30T06:33:07Z)
- CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing [66.6712018832575]
Domain generalization (DG) based Face Anti-Spoofing (FAS) aims to improve the model's performance on unseen domains.
We make use of large-scale VLMs like CLIP and leverage the textual feature to dynamically adjust the classifier's weights for exploring generalizable visual features.
arXiv Detail & Related papers (2024-03-21T11:58:50Z)
- Semantic Residual Prompts for Continual Learning [21.986800282078498]
We show that our method significantly outperforms both state-of-the-art CL approaches and the zero-shot CLIP test.
Our findings hold true even for datasets with a substantial domain gap w.r.t. the pre-training knowledge of the backbone model.
arXiv Detail & Related papers (2024-03-11T16:23:38Z)
- Prompt-based Context- and Domain-aware Pretraining for Vision and Language Navigation [19.793659852435486]
We propose a novel Prompt-bAsed coNtext- and inDoor-Aware (PANDA) pretraining framework to address these problems.
In the indoor-aware stage, we apply an efficient tuning paradigm to learn deep visual prompts from an indoor dataset.
In the context-aware stage, we design a set of hard context prompts to capture the sequence-level semantics in the instruction.
arXiv Detail & Related papers (2023-09-07T11:58:34Z)
- Self-regulating Prompts: Foundational Model Adaptation without Forgetting [112.66832145320434]
We introduce a self-regularization framework for prompting called PromptSRC.
PromptSRC guides the prompts to optimize for both task-specific and task-agnostic general representations.
arXiv Detail & Related papers (2023-07-13T17:59:35Z)
- Prompting Diffusion Representations for Cross-Domain Semantic Segmentation [101.04326113360342]
Diffusion-pretraining achieves extraordinary domain generalization results for semantic segmentation.
We introduce a scene prompt and a prompt randomization strategy to help further disentangle the domain-invariant information when training the segmentation head.
arXiv Detail & Related papers (2023-07-05T09:28:25Z)
- Learning Domain Invariant Prompt for Vision-Language Models [31.581652862478965]
We propose a novel prompt learning paradigm, called MetaPrompt, that directly generates domain-invariant prompts that generalize to unseen domains.
Our method consistently and significantly outperforms existing methods.
arXiv Detail & Related papers (2022-12-08T11:23:24Z)
- An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition [98.70304981174748]
We focus on the general applications of pretrained speech representations to advanced end-to-end automatic speech recognition (E2E-ASR) models.
We select several pretrained speech representations and present the experimental results on various open-source and publicly available corpora for E2E-ASR.
arXiv Detail & Related papers (2021-10-09T15:06:09Z)
- Text Recognition in Real Scenarios with a Few Labeled Samples [55.07859517380136]
Scene text recognition (STR) is still a hot research topic in the computer vision field.
This paper proposes a few-shot adversarial sequence domain adaptation (FASDA) approach to achieve sequence-level domain adaptation.
Our approach can maximize the character-level confusion between the source domain and the target domain.
arXiv Detail & Related papers (2020-06-22T13:03:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.