Advancing Cross-domain Discriminability in Continual Learning of Vision-Language Models
- URL: http://arxiv.org/abs/2406.18868v2
- Date: Mon, 28 Oct 2024 09:21:35 GMT
- Title: Advancing Cross-domain Discriminability in Continual Learning of Vision-Language Models
- Authors: Yicheng Xu, Yuxin Chen, Jiahao Nie, Yusong Wang, Huiping Zhuang, Manabu Okumura,
- Abstract summary: RAIL is a regression-based adapter to learn from a sequence of domains in a non-forgetting manner.
It preserves the VLM's zero-shot ability on unseen domains without any reference data.
Experiment results affirm RAIL's state-of-the-art performance in both X-TAIL and existing Multi-domain Task-Incremental Learning settings.
- Score: 24.22859657019636
- Abstract: Continual learning (CL) with Vision-Language Models (VLMs) has overcome the constraints of traditional CL, which only focuses on previously encountered classes. During the CL of VLMs, we need not only to prevent catastrophic forgetting of incrementally learned knowledge but also to preserve the zero-shot ability of VLMs. However, existing methods require additional reference datasets to maintain such zero-shot ability and rely on domain-identity hints to classify images across different domains. In this study, we propose Regression-based Analytic Incremental Learning (RAIL), which utilizes a recursive ridge regression-based adapter to learn from a sequence of domains in a non-forgetting manner and decouples the cross-domain correlations by projecting features to a higher-dimensional space. Cooperating with a training-free fusion module, RAIL absolutely preserves the VLM's zero-shot ability on unseen domains without any reference data. Additionally, we introduce the Cross-domain Task-Agnostic Incremental Learning (X-TAIL) setting. In this setting, a CL learner is required to incrementally learn from multiple domains and classify test images from both seen and unseen domains without any domain-identity hint. We theoretically prove RAIL's absolute memorization on incrementally learned domains. Experiment results affirm RAIL's state-of-the-art performance in both X-TAIL and existing Multi-domain Task-Incremental Learning settings. The code is released at https://github.com/linghan1997/Regression-based-Analytic-Incremental-Learning.
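The recursive ridge regression idea mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the authors' released implementation: the class name `RecursiveRidge`, the helper `project`, the random ReLU projection, the toy dimensions, and the fixed number of classes are all assumptions made for illustration. The sketch only shows the standard recursive least-squares update (via the Woodbury identity), under which learning one domain at a time yields the same weights as solving ridge regression on all domains jointly.

```python
import numpy as np


def project(X, P):
    """Lift features to a higher-dimensional space (assumed: fixed random map + ReLU)."""
    return np.maximum(X @ P, 0.0)


class RecursiveRidge:
    """Ridge regression updated batch by batch without revisiting old data."""

    def __init__(self, dim, n_classes, lam=1.0):
        # R tracks the inverse of (Phi^T Phi + lam * I) over all data seen so far.
        self.R = np.eye(dim) / lam
        self.W = np.zeros((dim, n_classes))

    def update(self, Phi, Y):
        # Woodbury identity: refresh the inverse using only the new batch.
        K = np.linalg.inv(np.eye(Phi.shape[0]) + Phi @ self.R @ Phi.T)
        self.R = self.R - self.R @ Phi.T @ K @ Phi @ self.R
        # Correct the weights by the residual of the new batch.
        self.W = self.W + self.R @ Phi.T @ (Y - Phi @ self.W)


def demo(seed=0, lam=1.0):
    rng = np.random.default_rng(seed)
    d_in, d_hi, n_classes = 8, 32, 6           # hypothetical toy sizes
    P = rng.standard_normal((d_in, d_hi))      # fixed random projection
    model = RecursiveRidge(d_hi, n_classes, lam)
    all_Phi, all_Y = [], []
    for domain in range(3):                    # three toy "domains", two classes each
        X = rng.standard_normal((20, d_in))
        labels = rng.integers(2 * domain, 2 * domain + 2, size=20)
        Y = np.eye(n_classes)[labels]          # one-hot targets
        Phi = project(X, P)
        model.update(Phi, Y)                   # incremental step, old data not reused
        all_Phi.append(Phi)
        all_Y.append(Y)
    # Joint ridge solution on all domains at once, for comparison only.
    Phi_all, Y_all = np.vstack(all_Phi), np.vstack(all_Y)
    W_joint = np.linalg.solve(
        Phi_all.T @ Phi_all + lam * np.eye(d_hi), Phi_all.T @ Y_all
    )
    return model.W, W_joint


if __name__ == "__main__":
    W_inc, W_joint = demo()
    print(np.allclose(W_inc, W_joint, atol=1e-6))
```

In this toy setting the incrementally updated weights match the jointly solved ones up to floating-point error, which is the sense in which such an analytic update does not forget earlier batches. The fusion with zero-shot predictions described in the abstract is not modeled here.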
Related papers
- Prompt-based Visual Alignment for Zero-shot Policy Transfer [35.784936617675896]
Overfitting in reinforcement learning has become one of the main obstacles to applications in reinforcement learning.
We propose prompt-based visual alignment (PVA) to mitigate the detrimental domain bias in the image for zero-shot policy transfer.
We verify PVA on a vision-based autonomous driving task with CARLA simulator.
arXiv Detail & Related papers (2024-06-05T13:26:30Z)
- Exploring User Retrieval Integration towards Large Language Models for Cross-Domain Sequential Recommendation [66.72195610471624]
Cross-Domain Sequential Recommendation aims to mine and transfer users' sequential preferences across different domains.
We propose a novel framework named URLLM, which aims to improve the CDSR performance by exploring the User Retrieval approach.
arXiv Detail & Related papers (2024-06-05T09:19:54Z)
- CoLeCLIP: Open-Domain Continual Learning via Joint Task Prompt and Vocabulary Learning [38.063942750061585]
We introduce a novel approach, CoLeCLIP, that learns an open-domain CL model based on CLIP.
CoLeCLIP outperforms state-of-the-art methods for open-domain CL under both task- and class-incremental learning settings.
arXiv Detail & Related papers (2024-03-15T12:28:21Z)
- Adapt in Contexts: Retrieval-Augmented Domain Adaptation via In-Context Learning [48.22913073217633]
Large language models (LLMs) have showcased their capability with few-shot inference known as in-context learning.
In this paper, we study the UDA problem under an in-context learning setting to adapt language models from the source domain to the target domain without any target labels.
We devise different prompting and training strategies, accounting for different LM architectures to learn the target distribution via language modeling.
arXiv Detail & Related papers (2023-11-20T06:06:20Z)
- CDFSL-V: Cross-Domain Few-Shot Learning for Videos [58.37446811360741]
Few-shot video action recognition is an effective approach to recognizing new categories with only a few labeled examples.
Existing methods in video action recognition rely on large labeled datasets from the same domain.
We propose a novel cross-domain few-shot video action recognition method that leverages self-supervised learning and curriculum learning.
arXiv Detail & Related papers (2023-09-07T19:44:27Z)
- AD-CLIP: Adapting Domains in Prompt Space Using CLIP [11.836764044083257]
We introduce AD-CLIP, a domain-agnostic prompt learning strategy for CLIP.
Our prompts are designed to be domain-invariant and class-generalizable, by conditioning prompt learning on image style and content features simultaneously.
Our experiments on three benchmark DA datasets demonstrate the effectiveness of AD-CLIP compared to existing literature.
arXiv Detail & Related papers (2023-08-10T15:58:28Z)
- Adversarial Feature Augmentation for Cross-domain Few-shot Classification [2.68796389443975]
We propose a novel adversarial feature augmentation (AFA) method to bridge the domain gap in few-shot learning.
The proposed method is a plug-and-play module that can be easily integrated into existing few-shot learning methods.
arXiv Detail & Related papers (2022-08-23T15:10:22Z)
- Forget Less, Count Better: A Domain-Incremental Self-Distillation Learning Benchmark for Lifelong Crowd Counting [51.44987756859706]
Off-the-shelf methods have drawbacks when handling multiple domains.
Lifelong Crowd Counting aims to alleviate catastrophic forgetting and improve generalization ability.
arXiv Detail & Related papers (2022-05-06T15:37:56Z)
- Cross-domain Contrastive Learning for Unsupervised Domain Adaptation [108.63914324182984]
Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a fully-labeled source domain to a different unlabeled target domain.
We build upon contrastive self-supervised learning to align features so as to reduce the domain discrepancy between training and testing sets.
arXiv Detail & Related papers (2021-06-10T06:32:30Z)
- Prototypical Cross-domain Self-supervised Learning for Few-shot Unsupervised Domain Adaptation [91.58443042554903]
We propose an end-to-end Prototypical Cross-domain Self-Supervised Learning (PCS) framework for Few-shot Unsupervised Domain Adaptation (FUDA).
PCS not only performs cross-domain low-level feature alignment, but it also encodes and aligns semantic structures in the shared embedding space across domains.
Compared with state-of-the-art methods, PCS improves the mean classification accuracy over different domain pairs on FUDA by 10.5%, 3.5%, 9.0%, and 13.2% on Office, Office-Home, VisDA-2017, and DomainNet, respectively.
arXiv Detail & Related papers (2021-03-31T02:07:42Z)