Prompt-OT: An Optimal Transport Regularization Paradigm for Knowledge Preservation in Vision-Language Model Adaptation
- URL: http://arxiv.org/abs/2503.08906v1
- Date: Tue, 11 Mar 2025 21:38:34 GMT
- Title: Prompt-OT: An Optimal Transport Regularization Paradigm for Knowledge Preservation in Vision-Language Model Adaptation
- Authors: Xiwen Chen, Wenhui Zhu, Peijie Qiu, Hao Wang, Huayu Li, Haiyu Wu, Aristeidis Sotiras, Yalin Wang, Abolfazl Razi,
- Abstract summary: Vision-language models (VLMs) such as CLIP demonstrate strong performance but struggle when adapted to downstream tasks.<n>We propose an optimal transport (OT)-guided prompt learning framework that mitigates forgetting by preserving the structural consistency of feature distributions.<n>Our approach enforces joint constraints on both vision and text representations, ensuring a holistic feature alignment.
- Score: 5.296260279593993
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Vision-language models (VLMs) such as CLIP demonstrate strong performance but struggle when adapted to downstream tasks. Prompt learning has emerged as an efficient and effective strategy to adapt VLMs while preserving their pre-trained knowledge. However, existing methods still lead to overfitting and degrade zero-shot generalization. To address this challenge, we propose an optimal transport (OT)-guided prompt learning framework that mitigates forgetting by preserving the structural consistency of feature distributions between pre-trained and fine-tuned models. Unlike conventional point-wise constraints, OT naturally captures cross-instance relationships and expands the feasible parameter space for prompt tuning, allowing a better trade-off between adaptation and generalization. Our approach enforces joint constraints on both vision and text representations, ensuring a holistic feature alignment. Extensive experiments on benchmark datasets demonstrate that our simple yet effective method can outperform existing prompt learning strategies in base-to-novel generalization, cross-dataset evaluation, and domain generalization without additional augmentation or ensemble techniques. The code is available at https://github.com/ChongQingNoSubway/Prompt-OT
Related papers
- A Similarity Paradigm Through Textual Regularization Without Forgetting [17.251684463032433]
We propose a novel method called Similarity Paradigm with Textual Regularization (SPTR) for prompt learning without forgetting.
SPTR is a two-pronged design based on hand-crafted prompts that is an inseparable framework.
Four representative tasks across 11 datasets demonstrate that SPTR outperforms existing prompt learning methods.
arXiv Detail & Related papers (2025-02-20T09:06:44Z) - Efficient Transfer Learning for Video-language Foundation Models [13.166348605993292]
We propose a parameter-efficient Multi-modalpatio Ssupervised-Temporal Adapter (MSTA) to enhance alignment between textual and visual representations.
We evaluate the effectiveness of our approach across four tasks: zero-shot transfer, few-shot learning, base-to-novel generalization, and fully-Temporal learning.
arXiv Detail & Related papers (2024-11-18T01:25:58Z) - Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the stability in terms of zero-shot generalization of VLMs, dubbed OrthSR.
For the first time, we revisit the CLIP and CoOp with our method to effectively improve the model on few-shot image classficiation scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation [105.23631749213729]
We propose a novel method for unsupervised pre-training in low-data regimes.
Inspired by the recently successful prompting technique, we introduce a new method, Unsupervised Pre-training with Language-Vision Prompts.
We show that our method can converge faster and perform better than CNN-based models in low-data regimes.
arXiv Detail & Related papers (2024-05-22T06:48:43Z) - Self-regulating Prompts: Foundational Model Adaptation without
Forgetting [112.66832145320434]
We introduce a self-regularization framework for prompting called PromptSRC.
PromptSRC guides the prompts to optimize for both task-specific and task-agnostic general representations.
arXiv Detail & Related papers (2023-07-13T17:59:35Z) - Prompting classes: Exploring the Power of Prompt Class Learning in
Weakly Supervised Semantic Segmentation [15.467510304266883]
We study the impact of prompt tuning on weakly supervised semantic segmentation.
We introduce a novel approach based on a PrOmpt cLass lEarning (POLE) strategy.
We demonstrate that our simple, yet efficient approach achieves SOTA performance in a well-known WSSS benchmark.
arXiv Detail & Related papers (2023-06-30T19:25:18Z) - Gradient-Regulated Meta-Prompt Learning for Generalizable
Vision-Language Models [137.74524357614285]
We introduce a novel Gradient-RegulAted Meta-prompt learning framework.
It helps pre-training models adapt to downstream tasks in a parameter -- and data -- efficient way.
GRAM can be easily incorporated into various prompt tuning methods in a model-agnostic way.
arXiv Detail & Related papers (2023-03-12T05:03:37Z) - Task Residual for Tuning Vision-Language Models [69.22958802711017]
We propose a new efficient tuning approach for vision-language models (VLMs) named Task Residual Tuning (TaskRes)
TaskRes explicitly decouples the prior knowledge of the pre-trained models and new knowledge regarding a target task.
The proposed TaskRes is simple yet effective, which significantly outperforms previous methods on 11 benchmark datasets.
arXiv Detail & Related papers (2022-11-18T15:09:03Z) - Prior Guided Feature Enrichment Network for Few-Shot Segmentation [64.91560451900125]
State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results.
Few-shot segmentation is proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples.
Theses frameworks still face the challenge of generalization ability reduction on unseen classes due to inappropriate use of high-level semantic information.
arXiv Detail & Related papers (2020-08-04T10:41:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.