Related papers: Complementary Subspace Low-Rank Adaptation of Vision-Language Models for Few-Shot Classification

Complementary Subspace Low-Rank Adaptation of Vision-Language Models for Few-Shot Classification

URL: http://arxiv.org/abs/2501.15040v1
Date: Sat, 25 Jan 2025 02:55:34 GMT
Title: Complementary Subspace Low-Rank Adaptation of Vision-Language Models for Few-Shot Classification
Authors: Zhongqi Wang, Jia Dai, Kai Li, Xu Li, Yanmeng Guo, Maosheng Xiang,
Abstract summary: Vision language model (VLM) has been designed for large scale image-text alignment as a pretrained foundation model.<n>Low rank adaptation (LoRA) algorithm has rarely been considered for few shot fine-tuning VLM.<n>We propose the complementary subspace low rank adaptation (Comp-LoRA) method to regularize the catastrophic forgetting problem in few shot VLM finetuning.
Score: 6.801416831975985
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Vision language model (VLM) has been designed for large scale image-text alignment as a pretrained foundation model. For downstream few shot classification tasks, parameter efficient fine-tuning (PEFT) VLM has gained much popularity in the computer vision community. PEFT methods like prompt tuning and linear adapter have been studied for fine-tuning VLM while low rank adaptation (LoRA) algorithm has rarely been considered for few shot fine-tuning VLM. The main obstacle to use LoRA for few shot fine-tuning is the catastrophic forgetting problem. Because the visual language alignment knowledge is important for the generality in few shot learning, whereas low rank adaptation interferes with the most informative direction of the pretrained weight matrix. We propose the complementary subspace low rank adaptation (Comp-LoRA) method to regularize the catastrophic forgetting problem in few shot VLM finetuning. In detail, we optimize the low rank matrix in the complementary subspace, thus preserving the general vision language alignment ability of VLM when learning the novel few shot information. We conduct comparison experiments of the proposed Comp-LoRA method and other PEFT methods on fine-tuning VLM for few shot classification. And we also present the suppression on the catastrophic forgetting problem of our proposed method against directly applying LoRA to VLM. The results show that the proposed method surpasses the baseline method by about +1.0\% Top-1 accuracy and preserves the VLM zero-shot performance over the baseline method by about +1.3\% Top-1 accuracy.

Related papers

Learning Rate Matters: Vanilla LoRA May Suffice for LLM Fine-tuning [48.66442009036754]
Low-Rank Adaptation (LoRA) is the prevailing approach for efficient large language model fine-tuning.<n>In this work, we re-evaluate four representative LoRA variants alongside vanilla LoRA.<n>We find that different LoRA methods favor distinct learning rate ranges.
arXiv Detail & Related papers (2026-02-04T19:36:20Z)
Decomposing and Composing: Towards Efficient Vision-Language Continual Learning via Rank-1 Expert Pool in a Single LoRA [50.97792275353563]
We introduce a novel framework that restructures a single Low-Rank Adaptation (LoRA) module as a decomposable Rank-1 Expert Pool.<n>Our method learns to dynamically compose a sparse, task-specific update by selecting from this expert pool, guided by the semantics of the [Guided] token.
arXiv Detail & Related papers (2026-01-30T10:54:51Z)
Towards Minimal Fine-Tuning of VLMs [59.01498204407219]
Image-LoRA is a lightweight parameter efficient fine-tuning recipe for transformer-based vision-language models.<n>Image-LoRA applies low-rank adaptation only to the value path of attention layers within the visual-token span.<n>It matches or closely approaches standard LoRA accuracy while using fewer trainable parameters and lower adapter-only training FLOPs.
arXiv Detail & Related papers (2025-12-22T10:02:10Z)
Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches [0.0]
We explore strategies to fine-tune decoder-only Large Language Models (LLMs) for downstream text classification under resource constraints.<n>Two approaches are investigated: (1) attaching a classification head to a pre-trained causal LLM and fine-tuning on the task, and (2) instruction-tuning the LLM in a prompt->response format for classification.
arXiv Detail & Related papers (2025-12-14T13:02:06Z)
One Last Attention for Your Vision-Language Model [42.872184600248914]
We propose textbfRational textbfAdaptaion (RAda) to explicitly exploit the final fused representation during fine-tuning.<n> RAda employs a learned mask, obtained from a lightweight attention layer attached at the end of a VLM, to dynamically calibrate the contribution of each element in the rational matrix.<n>Experiments show that RAda serves as a versatile fine-tuning technique, improving the baseline with minimal code and performing comparably against current arts in most settings.
arXiv Detail & Related papers (2025-07-21T10:35:32Z)
SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models [74.40683913645731]
Zero-shot multi-label recognition (MLR) with Vision-Language Models (VLMs) faces significant challenges without training data, model tuning, or architectural modifications. Our work proposes a novel solution treating VLMs as black boxes, leveraging scores without training data or ground truth. Analysis of these prompt scores reveals VLM biases and AND''/OR' signal ambiguities, notably that maximum scores are surprisingly suboptimal compared to second-highest scores.
arXiv Detail & Related papers (2025-02-24T07:15:05Z)
Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction [80.67150791183126]
Pre-trained vision-language models (VLMs) have demonstrated impressive zero-shot recognition capability, but still underperform in dense prediction tasks.<n>We propose DenseVLM, a framework designed to learn unbiased region-language alignment from powerful pre-trained VLM representations.<n>We show that DenseVLM can directly replace the original VLM in open-vocabulary object detection and image segmentation methods.
arXiv Detail & Related papers (2024-12-09T06:34:23Z)
Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs [75.11449420928139]
Fine-tuning Large Language Models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks. Low-Rank Adaptation (LoRA) has emerged as a promising solution, but there exists a gap between the practical performance of low-rank adaptations and its theoretical optimum. We propose eXtreme Gradient Boosting LoRA, a novel framework that bridges this gap by leveraging the power of ensemble learning.
arXiv Detail & Related papers (2024-10-25T17:07:13Z)
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation [13.585425242072173]
Most commonly used fine-tuning method is to update the pre-trained weights via a low-rank adaptation (LoRA) We propose to improve LoRA by initializing the new weights in a data-driven manner by computing singular value decomposition (SVD) on minibatches of activation. We call our new method $textbfE$xplained $textbfV$ariance $textbfA$daptation (EVA)
arXiv Detail & Related papers (2024-10-09T17:59:06Z)
Flat-LoRA: Low-Rank Adaption over a Flat Loss Landscape [52.98187034726091]
Low-Rank Adaptation (LoRA) is an efficient way to fine-tune models by optimizing only a low-rank matrix. A solution that appears flat in the LoRA space may exist sharp directions in the full parameter space, potentially harming generalization performance. We propose Flat-LoRA, an efficient approach that seeks a low-rank adaptation located in a flat region of the full parameter space.
arXiv Detail & Related papers (2024-09-22T11:24:10Z)
BA-LoRA: Bias-Alleviating Low-Rank Adaptation to Mitigate Catastrophic Inheritance in Large Language Models [13.660511750245245]
This work introduces Bias-Alleviating Low-Rank Adaptation (BA-LoRA), a novel PEFT method designed to counteract bias inheritance. BA-LoRA incorporates three distinct regularization terms: (1) a consistency regularizer, (2) a diversity regularizer, and (3) a singular value decomposition regularizer. The results demonstrate that BA-LoRA outperforms LoRA and its state-of-the-art variants.
arXiv Detail & Related papers (2024-08-08T16:13:26Z)
LoRA-Pro: Are Low-Rank Adapters Properly Optimized? [121.0693322732454]
Low-rank adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning of foundation models. Despite its computational efficiency, LoRA still yields inferior performance compared to full fine-tuning. We introduce LoRA-Pro, a method that enhances LoRA's performance by strategically adjusting the gradients of low-rank matrices.
arXiv Detail & Related papers (2024-07-25T17:57:12Z)
Pandora's White-Box: Precise Training Data Detection and Extraction in Large Language Models [4.081098869497239]
We develop state-of-the-art privacy attacks against Large Language Models (LLMs) New membership inference attacks (MIAs) against pretrained LLMs perform hundreds of times better than baseline attacks. In fine-tuning, we find that a simple attack based on the ratio of the loss between the base and fine-tuned models is able to achieve near-perfect MIA performance.
arXiv Detail & Related papers (2024-02-26T20:41:50Z)
DoRA: Weight-Decomposed Low-Rank Adaptation [57.68678247436207]
We introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA. Aiming to resemble the learning capacity of FT from the findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA) DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning.
arXiv Detail & Related papers (2024-02-14T17:59:34Z)
PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation [65.268245109828]
We introduce PRILoRA, which linearly allocates a different rank for each layer, in an increasing manner, and performs pruning throughout the training process. We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
arXiv Detail & Related papers (2024-01-20T20:25:17Z)
Black Box Few-Shot Adaptation for Vision-Language models [41.49584259596654]
Vision-Language (V-L) models trained with contrastive learning to align the visual and language modalities have been shown to be strong few-shot learners. We describe a black-box method for V-L few-shot adaptation that operates on pre-computed image and text features. We propose Linear Feature Alignment (LFA), a simple linear approach for V-L re-alignment in the target domain.
arXiv Detail & Related papers (2023-04-04T12:42:29Z)
Exploring Vision-Language Models for Imbalanced Learning [29.235472353759388]
Vision-Language models (VLMs) that use contrastive language-image pre-training have shown promising zero-shot classification performance. Our study highlights the significance of imbalanced learning algorithms in face of VLMs pre-trained by huge data.
arXiv Detail & Related papers (2023-04-04T01:56:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.