Low-Rank Few-Shot Adaptation of Vision-Language Models
- URL: http://arxiv.org/abs/2405.18541v2
- Date: Sat, 1 Jun 2024 20:53:23 GMT
- Title: Low-Rank Few-Shot Adaptation of Vision-Language Models
- Authors: Maxime Zanella, Ismail Ben Ayed,
- Abstract summary: We introduce Low-Rank Adaptation (LoRA) in few-shot learning for Vision-Language Models (VLMs)
Surprisingly, our simple CLIP-LoRA method exhibits substantial improvements, while reducing the training times.
Our results do not dismiss the potential of prompt-learning and adapter-based research.
- Score: 13.803180972839213
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent progress in the few-shot adaptation of Vision-Language Models (VLMs) has further pushed their generalization capabilities, at the expense of just a few labeled samples within the target downstream task. However, this promising, already quite abundant few-shot literature has focused principally on prompt learning and, to a lesser extent, on adapters, overlooking the recent advances in Parameter-Efficient Fine-Tuning (PEFT). Furthermore, existing few-shot learning methods for VLMs often rely on heavy training procedures and/or carefully chosen, task-specific hyper-parameters, which might impede their applicability. In response, we introduce Low-Rank Adaptation (LoRA) in few-shot learning for VLMs, and show its potential on 11 datasets, in comparison to current state-of-the-art prompt- and adapter-based approaches. Surprisingly, our simple CLIP-LoRA method exhibits substantial improvements, while reducing the training times and keeping the same hyper-parameters in all the target tasks, i.e., across all the datasets and numbers of shots. Certainly, our surprising results do not dismiss the potential of prompt-learning and adapter-based research. However, we believe that our strong baseline could be used to evaluate progress in these emergent subjects in few-shot VLMs.
Related papers
- Active Prompt Learning with Vision-Language Model Priors [9.173468790066956]
We introduce a class-guided clustering that leverages the pre-trained image and text encoders of vision-language models.
We propose a budget-saving selective querying based on adaptive class-wise thresholds.
arXiv Detail & Related papers (2024-11-23T02:34:33Z) - RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z) - RAVEN: Multitask Retrieval Augmented Vision-Language Learning [5.1583788731239455]
The scaling of large language models to encode all the world's knowledge is unsustainable and has exacerbated resource barriers.
Retrieval-Augmented Generation (RAG) presents a potential solution, yet its application to vision-language models (VLMs) is under explored.
This paper introduces RAVEN, a retrieval augmented VLM framework that enhances base VLMs through efficient, task specific fine-tuning.
arXiv Detail & Related papers (2024-06-27T13:08:35Z) - One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs)
We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
arXiv Detail & Related papers (2024-05-30T03:44:54Z) - Enhancing Vision-Language Few-Shot Adaptation with Negative Learning [11.545127156146368]
We propose a Simple yet effective Negative Learning approach, SimNL, to more efficiently exploit task-specific knowledge.
To this issue, we introduce a plug-and-play few-shot instance reweighting technique to mitigate noisy outliers.
Our extensive experimental results validate that the proposed SimNL outperforms existing state-of-the-art methods on both few-shot learning and domain generalization tasks.
arXiv Detail & Related papers (2024-03-19T17:59:39Z) - Low-Rank Adaptation for Multilingual Summarization: An Empirical Study [60.541168233698194]
We investigate the potential of.
Efficient Fine-Tuning, focusing on Low-Rank Adaptation (LoRA) in the domain of multilingual summarization.
We conduct an extensive study across different data availability scenarios, including high- and low-data settings, and cross-lingual transfer.
Our findings reveal that LoRA is competitive with full fine-tuning when trained with high quantities of data, and excels in low-data scenarios and cross-lingual transfer.
arXiv Detail & Related papers (2023-11-14T22:32:39Z) - Meta-Adapter: An Online Few-shot Learner for Vision-Language Model [64.21017759533474]
Contrastive vision-language pre-training, known as CLIP, demonstrates remarkable potential in perceiving open-world visual concepts.
Few-shot learning methods based on CLIP typically require offline fine-tuning of the parameters on few-shot samples.
We propose the Meta-Adapter, a lightweight residual-style adapter, to refine the CLIP features guided by the few-shot samples in an online manner.
arXiv Detail & Related papers (2023-11-07T07:27:16Z) - OverPrompt: Enhancing ChatGPT through Efficient In-Context Learning [49.38867353135258]
We propose OverPrompt, leveraging the in-context learning capability of LLMs to handle multiple task inputs.
Our experiments show that OverPrompt can achieve cost-efficient zero-shot classification without causing significant detriment to task performance.
arXiv Detail & Related papers (2023-05-24T10:08:04Z) - Tuning Language Models as Training Data Generators for
Augmentation-Enhanced Few-Shot Learning [30.65315081964461]
We study few-shot learning with pretrained language models (PLMs) from a different perspective.
We first tune an autoregressive PLM on the few-shot samples and then use it as a generator to synthesize a large amount of novel training samples.
Our approach FewGen achieves an overall better result across seven classification tasks of the GLUE benchmark than existing few-shot learning methods.
arXiv Detail & Related papers (2022-11-06T06:46:47Z) - Few-shot Quality-Diversity Optimization [50.337225556491774]
Quality-Diversity (QD) optimization has been shown to be effective tools in dealing with deceptive minima and sparse rewards in Reinforcement Learning.
We show that, given examples from a task distribution, information about the paths taken by optimization in parameter space can be leveraged to build a prior population, which when used to initialize QD methods in unseen environments, allows for few-shot adaptation.
Experiments carried in both sparse and dense reward settings using robotic manipulation and navigation benchmarks show that it considerably reduces the number of generations that are required for QD optimization in these environments.
arXiv Detail & Related papers (2021-09-14T17:12:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.