GLAD: Generalizable Tuning for Vision-Language Models
- URL: http://arxiv.org/abs/2507.13089v1
- Date: Thu, 17 Jul 2025 12:58:15 GMT
- Title: GLAD: Generalizable Tuning for Vision-Language Models
- Authors: Yuqi Peng, Pengfei Wang, Jianzhuang Liu, Shifeng Chen
- Abstract summary: We propose a simpler and more general framework called GLAD (Generalizable LoRA tuning with RegulArized GraDient). We show that merely applying LoRA achieves performance in downstream tasks comparable to current state-of-the-art prompt-based methods.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained vision-language models, such as CLIP, show impressive zero-shot recognition ability and can be easily transferred to specific downstream tasks via prompt tuning, even with limited training data. However, existing prompt tuning methods face two main challenges: (1) In few-shot scenarios, data scarcity often leads to overfitting, making the model sensitive to changes in the input domain. (2) To mitigate overfitting, these methods typically rely on complex task-specific model architectures and sensitive hyperparameter tuning, severely restricting their general applicability. To address these issues, we propose a simpler and more general framework called GLAD (Generalizable LoRA tuning with RegulArized GraDient). We show that merely applying LoRA achieves performance in downstream tasks comparable to current state-of-the-art prompt-based methods. While LoRA is effective and easy to use, it remains susceptible to overfitting in few-shot learning scenarios. To mitigate this risk, we introduce a gradient-based regularization technique. This technique effectively steers the optimization trajectory, encouraging the model to find a more stable parameter region that is robust to variations in data distribution. Through extensive experiments conducted on 15 benchmark datasets, we demonstrate that GLAD outperforms previous tuning approaches in terms of base-to-novel class generalization, image domain generalization, and cross-dataset generalization. The code will be publicly available.
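For concreteness, below is a minimal PyTorch sketch of the two ingredients the abstract describes: LoRA applied to a frozen linear layer, plus a gradient-penalty-style regularizer on the adapter parameters. The class and function names and the exact penalty form are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of LoRA tuning with a gradient-based regularizer.
# Names and the penalty form are assumptions for illustration only.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # keep pre-trained weights frozen
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

def regularized_step(model, loss_fn, x, y, optimizer, lam: float = 0.1):
    """One training step: task loss plus a squared-gradient-norm penalty on the
    trainable (LoRA) parameters -- one possible reading of 'regularized gradient'."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    penalty = sum(g.pow(2).sum() for g in grads)  # discourages sharp regions
    optimizer.zero_grad()
    (loss + lam * penalty).backward()
    optimizer.step()
    return loss.item()
```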
Related papers
- Rethinking Range-View LiDAR Segmentation in Adverse Weather [65.22588361803942]
We identify and analyze the unique challenges that affect the generalization of range-view LiDAR segmentation in severe weather. We propose a modular and lightweight framework that enhances robustness without altering the core architecture of existing models. Our approach significantly improves generalization to adverse weather with minimal inference overhead.
arXiv Detail & Related papers (2025-06-10T16:48:27Z) - Towards Generalizable Trajectory Prediction Using Dual-Level Representation Learning And Adaptive Prompting [107.4034346788744]
Existing vehicle trajectory prediction models struggle with generalizability, prediction uncertainties, and handling complex interactions. We propose Perceiver with Register queries (PerReg+), a novel trajectory prediction framework that introduces: (1) Dual-Level Representation Learning via Self-Distillation (SD) and Masked Reconstruction (MR), capturing global context and fine-grained details; (2) Enhanced Multimodality using register-based queries and pretraining, eliminating the need for clustering and suppression; and (3) Adaptive Prompt Tuning during fine-tuning, freezing the main architecture and optimizing a small number of prompts for efficient adaptation.
arXiv Detail & Related papers (2025-01-08T20:11:09Z) - Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs [76.40876036912537]
Large Language Models (LLMs) demonstrate strong few-shot adaptability without requiring fine-tuning. Current Visual Foundation Models (VFMs), in contrast, require explicit fine-tuning with sufficient tuning data. We propose a framework, LoRA Recycle, that distills a meta-LoRA from diverse pre-tuned LoRAs with a meta-learning objective.
arXiv Detail & Related papers (2024-12-03T07:25:30Z) - Scale-Invariant Learning-to-Rank [0.0]
At Expedia, learning-to-rank models play a key role in sorting and presenting information more relevant to users. A major challenge in deploying these models is ensuring consistent feature scaling between training and production data. We introduce a scale-invariant LTR framework which combines a deep and a wide neural network to mathematically guarantee scale invariance in the model at both training and prediction time. We evaluate our framework in simulated real-world scenarios with injected feature-scale issues by perturbing the test set at prediction time, and show that even with inconsistent train-test scaling, our framework achieves better performance.
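As a toy illustration of the property being targeted (not the paper's deep-and-wide architecture), the sketch below scores items from within-query log-ratios, so multiplying every feature by a constant leaves the scores, and hence the ranking, unchanged:

```python
# Toy illustration of scale invariance in learning-to-rank: scores built from
# within-query log-ratios are unchanged under constant feature rescaling.
# This is a generic construction, not the architecture from the paper.
import torch
import torch.nn as nn

class LogRatioScorer(nn.Module):
    def __init__(self, n_features: int, hidden: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x):                    # x: (n_items, n_features), positive
        z = torch.log(x)
        z = z - z.mean(dim=0, keepdim=True)  # a constant rescale of x cancels here
        return self.mlp(z).squeeze(-1)

scorer = LogRatioScorer(n_features=5)
items = torch.rand(8, 5) + 0.1
assert torch.allclose(scorer(items), scorer(100.0 * items), atol=1e-5)
```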
arXiv Detail & Related papers (2024-10-02T19:05:12Z) - Enhancing Fine-Grained Visual Recognition in the Low-Data Regime Through Feature Magnitude Regularization [23.78498670529746]
We introduce a regularization technique to ensure that the magnitudes of the extracted features are evenly distributed.
Despite its apparent simplicity, our approach has demonstrated significant performance improvements across various fine-grained visual recognition datasets.
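A hedged sketch of what such a regularizer could look like: penalize the variance of per-dimension mean feature magnitudes, which is zero when magnitudes are evenly distributed. The exact formulation in the paper may differ.

```python
# Sketch of a feature-magnitude regularizer in the spirit of the summary;
# the paper's exact formulation may differ.
import torch

def magnitude_evenness_penalty(features: torch.Tensor) -> torch.Tensor:
    """features: (batch, dim). Returns the variance of per-dimension mean
    absolute magnitudes; zero when all dimensions carry equal magnitude."""
    per_dim = features.abs().mean(dim=0)  # mean |activation| per dimension
    return per_dim.var()

# usage: total_loss = task_loss + lam * magnitude_evenness_penalty(feats)
```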
arXiv Detail & Related papers (2024-09-03T07:32:46Z) - Efficiently Assemble Normalization Layers and Regularization for Federated Domain Generalization [1.1534313664323637]
Domain shift is a formidable issue in Machine Learning that causes a model to suffer from performance degradation when tested on unseen domains.
Federated Domain Generalization (FedDG) attempts to train a global model with collaborating clients in a privacy-preserving manner, such that the model generalizes well to unseen clients possibly exhibiting domain shift.
Here, we introduce a novel architectural method for FedDG, namely gPerXAN, which relies on a normalization scheme working with a guiding regularizer.
arXiv Detail & Related papers (2024-03-22T20:22:08Z) - Studying How to Efficiently and Effectively Guide Models with Explanations [52.498055901649025]
'Model guidance' is the idea of regularizing the models' explanations to ensure that they are "right for the right reasons".
We conduct an in-depth evaluation across various loss functions, attribution methods, models, and 'guidance depths' on the PASCAL VOC 2007 and MS COCO 2014 datasets.
Specifically, we guide the models via bounding box annotations, which are much cheaper to obtain than the commonly used segmentation masks.
arXiv Detail & Related papers (2023-03-21T15:34:50Z) - Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models [137.74524357614285]
We introduce a novel Gradient-RegulAted Meta-prompt learning framework.
It helps pre-trained models adapt to downstream tasks in a parameter- and data-efficient way.
GRAM can be easily incorporated into various prompt tuning methods in a model-agnostic way.
arXiv Detail & Related papers (2023-03-12T05:03:37Z) - Adversarial Style Augmentation for Domain Generalization [41.72506801753435]
We introduce a novel Adversarial Style Augmentation (ASA) method, which explores broader style spaces by generating more effective statistics perturbations.
To facilitate the application of ASA, we design a simple yet effective module, namely AdvStyle, which instantiates the ASA method in a plug-and-play manner.
Our method significantly outperforms its competitors on the PACS dataset under the single source generalization setting.
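The summary suggests a plug-and-play module that perturbs per-channel feature statistics. The sketch below uses random perturbations for brevity, whereas ASA instead searches for adversarial (more effective) perturbation directions:

```python
# Plug-and-play sketch of style augmentation via feature-statistics perturbation.
# Random perturbations are used here for brevity; ASA generates adversarial ones.
import torch
import torch.nn as nn

class StylePerturb(nn.Module):
    def __init__(self, eps: float = 0.1):
        super().__init__()
        self.eps = eps

    def forward(self, x):  # x: (B, C, H, W) feature map
        if not self.training:
            return x
        mu = x.mean(dim=(2, 3), keepdim=True)            # per-channel style stats
        sigma = x.std(dim=(2, 3), keepdim=True) + 1e-6
        normed = (x - mu) / sigma                        # strip the current style
        new_mu = mu + self.eps * torch.randn_like(mu)    # perturb the style
        new_sigma = sigma * (1 + self.eps * torch.randn_like(sigma))
        return normed * new_sigma + new_mu               # re-stylize the content
```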
arXiv Detail & Related papers (2023-01-30T03:52:16Z) - One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers [96.51828911883456]
Unsupervised sim-to-real domain adaptation (UDA) for semantic segmentation aims to improve the real-world test performance of a model trained on simulated data.
Traditional UDA often assumes that there are abundant unlabeled real-world data samples available during training for the adaptation.
We explore the one-shot unsupervised sim-to-real domain adaptation (OSUDA) and generalization problem, where only one real-world data sample is available.
arXiv Detail & Related papers (2022-12-14T15:54:15Z)