DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models
- URL: http://arxiv.org/abs/2408.08855v1
- Date: Fri, 16 Aug 2024 17:30:27 GMT
- Title: DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models
- Authors: Eman Ali, Sathira Silva, Muhammad Haris Khan
- Abstract summary: This study introduces DPA, an unsupervised domain adaptation method for vision-language models (VLMs).
It introduces the concept of dual prototypes, acting as distinct classifiers, along with the convex combination of their outputs.
It ranks pseudo-labels to facilitate robust self-training, particularly during early training.
Experiments on 13 downstream vision tasks demonstrate that DPA significantly outperforms zero-shot CLIP and the state-of-the-art unsupervised adaptation baselines.
- Score: 7.649900082537232
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision-language models (VLMs), e.g., CLIP, have shown remarkable potential in zero-shot image classification. However, adapting these models to new domains remains challenging, especially in unsupervised settings where labelled data is unavailable. Recent research has proposed pseudo-labelling approaches to adapt CLIP in an unsupervised manner using unlabelled target data. Nonetheless, these methods struggle due to noisy pseudo-labels resulting from the misalignment between CLIP's visual and textual representations. This study introduces DPA, an unsupervised domain adaptation method for VLMs. DPA introduces the concept of dual prototypes, acting as distinct classifiers, along with the convex combination of their outputs, thereby leading to accurate pseudo-label construction. Next, it ranks pseudo-labels to facilitate robust self-training, particularly during early training. Finally, it addresses visual-textual misalignment by aligning textual prototypes with image prototypes to further improve the adaptation performance. Experiments on 13 downstream vision tasks demonstrate that DPA significantly outperforms zero-shot CLIP and the state-of-the-art unsupervised adaptation baselines.
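The pipeline the abstract describes can be sketched compactly. The following is a minimal illustration, not the authors' released code: the mixing weight `alpha`, the temperature `tau`, and the cosine-based alignment objective are assumptions standing in for DPA's exact choices.

```python
import torch
import torch.nn.functional as F

def dpa_pseudo_labels(image_feats, text_protos, image_protos, alpha=0.5, tau=0.01):
    """image_feats: (N, D) CLIP image embeddings; text_protos / image_protos:
    (C, D) per-class prototypes acting as two distinct classifiers."""
    image_feats = F.normalize(image_feats, dim=-1)
    text_protos = F.normalize(text_protos, dim=-1)
    image_protos = F.normalize(image_protos, dim=-1)
    # Each prototype set defines its own softmax classifier over the C classes.
    p_text = F.softmax(image_feats @ text_protos.t() / tau, dim=-1)
    p_image = F.softmax(image_feats @ image_protos.t() / tau, dim=-1)
    # Convex combination of the two outputs gives the pseudo-label distribution.
    p = alpha * p_text + (1 - alpha) * p_image
    conf, labels = p.max(dim=-1)
    return labels, conf

def alignment_loss(text_protos, image_protos):
    # One plausible alignment objective (an assumption): pull each textual
    # prototype towards its image prototype to reduce visual-textual misalignment.
    return (1 - F.cosine_similarity(text_protos, image_protos, dim=-1)).mean()

# Toy usage: 8 unlabelled samples, 4 classes, 512-d embeddings.
labels, conf = dpa_pseudo_labels(torch.randn(8, 512),
                                 torch.randn(4, 512), torch.randn(4, 512))
```

Ranking pseudo-labels by `conf` and self-training on the most reliable ones first is what keeps early training robust to the remaining label noise.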
Related papers
- UMFC: Unsupervised Multi-Domain Feature Calibration for Vision-Language Models [75.77651291095565]
We leverage unlabeled data that naturally spans multiple domains to enhance the transferability of vision-language models.
Under this unsupervised multi-domain setting, we have identified inherent model bias within CLIP.
We propose Unsupervised Multi-domain Feature Calibration (UMFC) to mitigate this model bias.
arXiv Detail & Related papers (2024-11-11T12:25:02Z)
- Fast One-Stage Unsupervised Domain Adaptive Person Search [17.164485293539833]
Unsupervised person search aims to localize a particular target person from a gallery set of scene images without annotations.
We propose a Fast One-stage Unsupervised person Search (FOUS) which integrates complementary domain adaptation with label adaptation.
FOUS can achieve the state-of-the-art (SOTA) performance on two benchmark datasets, CUHK-SYSU and PRW.
arXiv Detail & Related papers (2024-05-05T07:15:47Z)
- Anomaly Detection by Adapting a pre-trained Vision Language Model [48.225404732089515]
We present a unified framework named CLIP-ADA for Anomaly Detection by Adapting a pre-trained CLIP model.
We introduce a learnable prompt and propose to associate it with abnormal patterns through self-supervised learning.
We achieve state-of-the-art results of 97.5/55.6 and 89.3/33.1 on MVTec-AD and VisA for anomaly detection and localization, respectively.
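In prompt-learning approaches of this kind, a "learnable prompt" usually means trainable context vectors optimised in place of hand-written template words. A minimal sketch follows; the context length `n_ctx`, the embedding width, and the stand-in class embeddings are assumptions, and CLIP-ADA's actual prompt design may differ.

```python
import torch
import torch.nn as nn

class LearnablePrompt(nn.Module):
    """Trainable context vectors prepended to class-name token embeddings."""
    def __init__(self, n_ctx=8, dim=512):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)  # optimised by SGD

    def forward(self, class_embeds):
        # class_embeds: (C, L, dim) token embeddings of the class names.
        ctx = self.ctx.unsqueeze(0).expand(class_embeds.size(0), -1, -1)
        return torch.cat([ctx, class_embeds], dim=1)  # (C, n_ctx + L, dim)

prompted = LearnablePrompt()(torch.randn(2, 4, 512))  # e.g. "normal" vs "abnormal"
```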
arXiv Detail & Related papers (2024-03-14T15:35:07Z)
- Unsupervised Prototype Adapter for Vision-Language Models [29.516767588241724]
We design an unsupervised fine-tuning approach for vision-language models called Unsupervised Prototype Adapter (UP-Adapter).
Specifically, for the unannotated target datasets, we leverage the text-image aligning capability of CLIP to automatically select the most confident samples for each class.
After fine-tuning, the prototype model prediction is combined with the original CLIP's prediction by a residual connection to perform downstream recognition tasks.
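The selection and fusion steps above can be sketched as follows; the per-class top-k rule and the residual weight `beta` are illustrative assumptions rather than the paper's exact recipe.

```python
import torch

def select_confident(probs, k=16):
    """probs: (N, C) zero-shot CLIP probabilities over the unlabelled target set.
    Returns indices of the k most confident samples per pseudo-class."""
    conf, labels = probs.max(dim=-1)
    picks = []
    for c in range(probs.size(1)):
        idx = (labels == c).nonzero(as_tuple=True)[0]
        if idx.numel():
            picks.append(idx[conf[idx].topk(min(k, idx.numel())).indices])
    return torch.cat(picks)

def fused_logits(adapter_logits, clip_logits, beta=0.5):
    # Residual connection: the adapter prediction refines, rather than
    # replaces, the original zero-shot CLIP prediction.
    return clip_logits + beta * adapter_logits

idx = select_confident(torch.rand(100, 10).softmax(dim=-1))
```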
arXiv Detail & Related papers (2023-08-22T15:28:49Z)
- UIA-ViT: Unsupervised Inconsistency-Aware Method based on Vision Transformer for Face Forgery Detection [52.91782218300844]
We propose a novel Unsupervised Inconsistency-Aware method based on Vision Transformer, called UIA-ViT.
Due to the self-attention mechanism, the attention map among patch embeddings naturally represents the consistency relation, making the vision Transformer suitable for the consistency representation learning.
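That observation can be made concrete with a toy computation; the single-head form and the identity stand-ins for the learned query/key projections are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def attention_consistency(patch_embeds, d_head=64):
    # patch_embeds: (N_patches, D). The attention map A[i, j] scores how
    # strongly patch i attends to patch j, i.e. their consistency relation.
    q = patch_embeds[:, :d_head]  # stand-in projections; a real ViT uses
    k = patch_embeds[:, :d_head]  # learned W_q and W_k matrices here
    return F.softmax(q @ k.t() / d_head ** 0.5, dim=-1)

A = attention_consistency(torch.randn(196, 768))  # 14x14 patches, ViT-B width
```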
arXiv Detail & Related papers (2022-10-23T15:24:47Z)
- Semi-Supervised Domain Adaptation with Prototypical Alignment and Consistency Learning [86.6929930921905]
This paper studies how much having a few labelled target samples can help address domain shifts.
To explore the full potential of landmarks, we incorporate a prototypical alignment (PA) module which calculates a target prototype for each class from the landmarks.
Specifically, we severely perturb the labeled images, making PA non-trivial to achieve and thus promoting model generalizability.
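The PA module's core step, computing a per-class target prototype from the handful of labelled target samples ("landmarks"), reduces to a class-wise mean. A minimal sketch, with feature extraction and the downstream alignment loss omitted:

```python
import torch

def class_prototypes(feats, labels, num_classes):
    """feats: (N, D) features of the labelled target landmarks; labels: (N,)."""
    protos = torch.zeros(num_classes, feats.size(1))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = feats[mask].mean(dim=0)  # prototype = mean landmark feature
    return protos

protos = class_prototypes(torch.randn(20, 512), torch.randint(0, 5, (20,)), 5)
```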
arXiv Detail & Related papers (2021-04-19T08:46:08Z)
- Two-phase Pseudo Label Densification for Self-training based Domain Adaptation [93.03265290594278]
We propose a novel Two-phase Pseudo Label Densification framework, referred to as TPLD.
In the first phase, we use sliding-window voting to propagate confident predictions, exploiting the intrinsic spatial correlations in the images.
In the second phase, we perform a confidence-based easy-hard classification.
To ease the training process and avoid noisy predictions, we introduce the bootstrapping mechanism to the original self-training loss.
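The first phase lends itself to a short sketch; the window size, confidence threshold, and ignore value below are assumptions, not TPLD's published hyper-parameters.

```python
import torch

def window_vote(labels, conf, thresh=0.9, win=3, ignore=-1):
    """labels: (H, W) argmax predictions; conf: (H, W) their confidences.
    Pixels below `thresh` receive the majority label of confident neighbours."""
    H, W = labels.shape
    out, r = labels.clone(), win // 2
    for i in range(H):
        for j in range(W):
            if conf[i, j] >= thresh:
                continue  # already confident; keep the prediction as-is
            patch_l = labels[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            patch_c = conf[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            voters = patch_l[patch_c >= thresh]  # confident neighbours vote
            out[i, j] = voters.mode().values if voters.numel() else ignore
    return out

dense = window_vote(torch.randint(0, 19, (32, 32)), torch.rand(32, 32))
```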
arXiv Detail & Related papers (2020-12-09T02:35:25Z)
- Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive Person Re-Identification [64.37745443119942]
This paper jointly enforces visual and temporal consistency by combining a local one-hot classification and a global multi-class classification.
Experimental results on three large-scale ReID datasets demonstrate the superiority of the proposed method in both purely unsupervised and unsupervised domain adaptive ReID tasks.
arXiv Detail & Related papers (2020-07-21T14:31:27Z)