CP-Prompt: Composition-Based Cross-modal Prompting for Domain-Incremental Continual Learning
- URL: http://arxiv.org/abs/2407.21043v2
- Date: Fri, 2 Aug 2024 14:58:54 GMT
- Title: CP-Prompt: Composition-Based Cross-modal Prompting for Domain-Incremental Continual Learning
- Authors: Yu Feng, Zhen Tian, Yifan Zhu, Zongfu Han, Haoran Luo, Guangwei Zhang, Meina Song
- Abstract summary: The key challenge of cross-modal domain-incremental learning (DIL) is to enable the learning model to continuously learn from novel data with different feature distributions without forgetting old ones.
We propose a simple yet effective framework, CP-Prompt, by training limited parameters to instruct a pre-trained model to learn new domains.
- Score: 15.393734346359064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The key challenge of cross-modal domain-incremental learning (DIL) is to enable the learning model to continuously learn from novel data with different feature distributions under the same task without forgetting old ones. However, existing top-performing methods still suffer high forgetting rates because they lack intra-domain knowledge extraction and an inter-domain common prompting strategy. In this paper, we propose a simple yet effective framework, CP-Prompt, which trains a limited number of parameters to instruct a pre-trained model to learn new domains without forgetting existing feature distributions. CP-Prompt captures intra-domain knowledge by compositionally inserting personalized prompts into multi-head self-attention layers and then learns inter-domain knowledge with a common prompting strategy. CP-Prompt outperforms state-of-the-art baselines on three widely evaluated DIL tasks. The source code is available at https://github.com/dannis97500/CP_Prompt.
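Below is a minimal sketch of how such compositional prompting could attach to one self-attention layer. It is not the authors' implementation: the module name, prompt lengths, and the prefix-concatenation scheme are all assumptions made for illustration.

```python
# Minimal sketch of compositional prompting at one self-attention layer.
# `PromptedAttention`, the prompt lengths, and the prefix-concatenation
# scheme are illustrative assumptions, not CP-Prompt's actual code.
import torch
import torch.nn as nn

class PromptedAttention(nn.Module):
    def __init__(self, dim=768, heads=12, common_len=4, personal_len=4, num_domains=3):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Prompt shared across domains (inter-domain knowledge) ...
        self.common_prompt = nn.Parameter(torch.randn(common_len, dim) * 0.02)
        # ... composed with a personalized prompt per domain (intra-domain knowledge).
        self.personal_prompts = nn.ParameterList(
            [nn.Parameter(torch.randn(personal_len, dim) * 0.02) for _ in range(num_domains)]
        )

    def forward(self, x, domain_id):  # x: (batch, seq, dim)
        prompts = torch.cat([self.common_prompt, self.personal_prompts[domain_id]], dim=0)
        prompts = prompts.unsqueeze(0).expand(x.size(0), -1, -1)
        h = torch.cat([prompts, x], dim=1)   # prepend prompts to the token sequence
        out, _ = self.attn(h, h, h)
        return out[:, prompts.size(1):]      # keep only the token positions

layer = PromptedAttention()
for p in layer.attn.parameters():  # freeze the (stand-in) pre-trained weights;
    p.requires_grad = False        # only the prompt parameters are trained
```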
Related papers
- ID-centric Pre-training for Recommendation [51.72177873832969]
ID embeddings are challenging to transfer to new domains.
Behavioral information in ID embeddings is still verified to be dominant in PLM-based recommendation models.
We propose a novel ID-centric recommendation pre-training paradigm (IDP), which directly transfers informative ID embeddings learned in pre-training domains to item representations in new domains.
arXiv Detail & Related papers (2024-05-06T15:34:31Z)
- Towards Cross-Domain Continual Learning [8.22291258264193]
We introduce a novel approach called Cross-Domain Continual Learning (CDCL).
Our method combines inter- and intra-task cross-attention mechanisms within a compact convolutional network.
By leveraging an intra-task-specific pseudo-labeling method, we ensure accurate input pairs for both labeled and unlabeled samples.
arXiv Detail & Related papers (2024-02-19T19:54:03Z)
- Learning a Diffusion Model Policy from Rewards via Q-Score Matching [93.0191910132874]
We present a theoretical framework linking the structure of diffusion model policies to a learned Q-function.
We propose a new policy update method from this theory, which we denote Q-score matching.
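In hedged form, the idea can be written as regressing the diffusion policy's score onto the action-gradient of the learned Q-function; the notation and the scale factor β below are assumptions, not the paper's exact objective.

```latex
% Sketch of a Q-score matching objective (assumed notation, not the paper's exact form):
% the policy's score is regressed onto the action-gradient of the critic Q_\phi.
\mathcal{L}_{\mathrm{QSM}}(\theta) =
  \mathbb{E}_{s \sim \mathcal{D},\; a \sim \pi_\theta(\cdot \mid s)}
  \left[ \big\| \nabla_a \log \pi_\theta(a \mid s) - \beta \, \nabla_a Q_\phi(s, a) \big\|^2 \right]
```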
arXiv Detail & Related papers (2023-12-18T23:31:01Z)
- MoP-CLIP: A Mixture of Prompt-Tuned CLIP Models for Domain Incremental Learning [12.737883740101438]
We present a novel DIL approach based on a mixture of prompt-tuned CLIP models (MoP-CLIP).
At the training stage, we model the feature distribution of every class in each domain, learning individual text and visual prompts to adapt to a given domain.
At inference, the learned distributions allow us to identify whether a given test sample belongs to a known domain, in which case the correct prompt is selected for classification, or to an unseen domain.
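One plausible reading of this inference rule is a nearest-distribution test over stored per-domain class statistics; the Gaussian/Mahalanobis modeling and the rejection threshold below are illustrative assumptions, not necessarily the paper's exact procedure.

```python
# Hedged sketch: pick the prompt whose domain's class-feature statistics
# best explain a test feature; flag the sample as coming from an unseen
# domain when even the best distance exceeds a threshold.
# The Mahalanobis modeling and `threshold` value are assumptions.
import numpy as np

def select_domain(feat, domain_stats, threshold=50.0):
    """feat: (d,) test feature; domain_stats: list of (means, inv_cov) per domain,
    where means is (num_classes, d) and inv_cov is (d, d)."""
    best_domain, best_dist = None, np.inf
    for k, (means, inv_cov) in enumerate(domain_stats):
        diff = means - feat                           # (num_classes, d)
        # Mahalanobis distance from feat to each class mean in domain k
        dists = np.einsum("cd,de,ce->c", diff, inv_cov, diff)
        if dists.min() < best_dist:
            best_domain, best_dist = k, dists.min()
    if best_dist > threshold:
        return None          # treat as an unseen domain
    return best_domain       # index of the prompt to use
```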
arXiv Detail & Related papers (2023-07-11T18:17:50Z)
- Deeply Coupled Cross-Modal Prompt Learning [25.813769028565567]
We propose a Deeply coupled Cross-modal Prompt learning (DCP) method based on CLIP.
DCP flexibly accommodates the interplay between vision and language with a Cross-Modal Prompt Attention (CMPA) mechanism.
We then conduct comprehensive few-shot learning experiments on 11 image classification datasets and analyze the adaptation to domain shift as well.
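CMPA is described only at a high level here; the bidirectional cross-attention below is one plausible instantiation of prompts exchanging information across modalities, not DCP's actual implementation.

```python
# Hedged sketch of cross-modal prompt attention: vision prompts attend to
# language prompts and vice versa, so the two prompt streams co-adapt.
# The bidirectional design and layer names are assumptions, not DCP's CMPA.
import torch.nn as nn

class CrossModalPromptAttention(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.l2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2l = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, vision_prompts, text_prompts):  # (batch, len, dim) each
        # Each modality's prompts query the other modality's prompts.
        v_out, _ = self.l2v(vision_prompts, text_prompts, text_prompts)
        t_out, _ = self.v2l(text_prompts, vision_prompts, vision_prompts)
        return vision_prompts + v_out, text_prompts + t_out
```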
arXiv Detail & Related papers (2023-05-29T06:26:52Z)
- SwitchPrompt: Learning Domain-Specific Gated Soft Prompts for Classification in Low-Resource Domains [14.096170976149521]
SwitchPrompt is a novel and lightweight prompting methodology for adapting language models trained on general-domain datasets to diverse low-resource domains.
Our few-shot experiments on three text classification benchmarks demonstrate the efficacy of the general-domain pre-trained language models when used with SwitchPrompt.
They often even outperform their domain-specific counterparts trained with baseline state-of-the-art prompting methods, with accuracy gains of up to 10.7%.
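The phrase "gated soft prompts" suggests a learned gate mixing general-domain and domain-specific prompt tokens; the sigmoid gating form below is an assumption made for illustration, not SwitchPrompt's confirmed mechanism.

```python
# Hedged sketch of a gated soft prompt: a learned gate mixes a general-domain
# prompt with a domain-specific one before it is prepended to the input.
# The per-token sigmoid gate is an assumed form.
import torch
import torch.nn as nn

class GatedSoftPrompt(nn.Module):
    def __init__(self, prompt_len=8, dim=768):
        super().__init__()
        self.general = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)
        self.specific = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)
        self.gate = nn.Parameter(torch.zeros(prompt_len, 1))  # one gate per token

    def forward(self, embeddings):  # embeddings: (batch, seq, dim)
        g = torch.sigmoid(self.gate)
        prompt = g * self.specific + (1 - g) * self.general
        prompt = prompt.unsqueeze(0).expand(embeddings.size(0), -1, -1)
        return torch.cat([prompt, embeddings], dim=1)
```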
arXiv Detail & Related papers (2023-02-14T07:14:08Z)
- CLIP-Driven Fine-grained Text-Image Person Re-identification [50.94827165464813]
Text-image person re-identification (TIReID) aims to retrieve the image corresponding to a given text query from a pool of candidate images.
We propose a CLIP-driven Fine-grained information excavation framework (CFine) to fully utilize the powerful knowledge of CLIP for TIReID.
arXiv Detail & Related papers (2022-10-19T03:43:12Z)
- Supporting Vision-Language Model Inference with Confounder-pruning Knowledge Prompt [71.77504700496004]
Vision-language models are pre-trained by aligning image-text pairs in a common space to deal with open-set visual concepts.
To boost the transferability of the pre-trained models, recent works adopt fixed or learnable prompts.
However, it remains unclear how and which prompts improve inference performance.
arXiv Detail & Related papers (2022-05-23T07:51:15Z)
- HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression [53.90578309960526]
Large pre-trained language models (PLMs) have shown overwhelming performance compared with traditional neural network methods.
We propose a hierarchical relational knowledge distillation (HRKD) method to capture both hierarchical and domain relational information.
arXiv Detail & Related papers (2021-10-16T11:23:02Z)
- Learning to Combine: Knowledge Aggregation for Multi-Source Domain Adaptation [56.694330303488435]
We propose a Learning to Combine for Multi-Source Domain Adaptation (LtC-MSDA) framework.
In a nutshell, a knowledge graph is constructed on the prototypes of various domains to realize information propagation among semantically adjacent representations.
Our approach outperforms existing methods by a remarkable margin.
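The propagation step can be sketched as one round of similarity-weighted aggregation over stacked class prototypes; the cosine-similarity adjacency below is an assumption, not necessarily the paper's graph construction.

```python
# Hedged sketch of information propagation over a prototype graph:
# class prototypes from all domains form the nodes, edges are weighted by
# cosine similarity, and one propagation step mixes adjacent nodes.
# The similarity-based adjacency and single-step update are assumptions.
import torch
import torch.nn.functional as F

def propagate(prototypes):
    """prototypes: (num_domains * num_classes, d) stacked class prototypes."""
    sim = F.cosine_similarity(
        prototypes.unsqueeze(1), prototypes.unsqueeze(0), dim=-1
    )                                     # (N, N) pairwise similarities
    adj = torch.softmax(sim, dim=-1)      # row-normalized edge weights
    return adj @ prototypes               # aggregate neighboring prototypes
```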
arXiv Detail & Related papers (2020-07-17T07:52:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.