Related papers: Generalizing Vision-Language Models with Dedicated Prompt Guidance

URL: http://arxiv.org/abs/2512.02421v1
Date: Tue, 02 Dec 2025 05:06:17 GMT
Title: Generalizing Vision-Language Models with Dedicated Prompt Guidance
Authors: Xinyao Li, Yinjie Min, Hongbo Chen, Zhekai Du, Fengling Li, Jingjing Li,
Abstract summary: We provide a theoretical understanding of the generalization ability for VLM fine-tuning. We propose a two-step domain-expert-Guided DG (GuiDG) framework. GuiDG first employs prompt tuning to obtain source domain experts, then introduces a Cross-Modal Attention module to guide the fine-tuning of the vision encoder.
Score: 21.54643227523398
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Fine-tuning large pretrained vision-language models (VLMs) has emerged as a prevalent paradigm for downstream adaptation, yet it faces a critical trade-off between domain specificity and domain generalization (DG) ability. Current methods typically fine-tune a universal model on the entire dataset, which potentially compromises the ability to generalize to unseen domains. To fill this gap, we provide a theoretical understanding of the generalization ability for VLM fine-tuning, which reveals that training multiple parameter-efficient expert models on partitioned source domains leads to better generalization than fine-tuning a universal model. Inspired by this finding, we propose a two-step domain-expert-Guided DG (GuiDG) framework. GuiDG first employs prompt tuning to obtain source domain experts, then introduces a Cross-Modal Attention module to guide the fine-tuning of the vision encoder via adaptive expert integration. To better evaluate few-shot DG, we construct ImageNet-DG from ImageNet and its variants. Extensive experiments on standard DG benchmarks and ImageNet-DG demonstrate that GuiDG improves upon state-of-the-art fine-tuning methods while maintaining efficiency.

Related papers

Multi-Granularity Feature Calibration via VFM for Domain Generalized Semantic Segmentation [15.35795137118814]
Domain Generalized Semantic Segmentation (DGSS) aims to improve the generalization ability of models across unseen domains without access to target data during training. Recent advances in DGSS have increasingly exploited vision foundation models (VFMs) via parameter-efficient fine-tuning strategies. We propose Multi-Granularity Feature Calibration (MGFC), a novel framework that performs coarse-to-fine alignment of VFM features to enhance robustness under domain shifts.
arXiv Detail & Related papers (2025-08-05T02:24:31Z)
LFME: A Simple Framework for Learning from Multiple Experts in Domain Generalization [61.16890890570814]
Domain generalization (DG) methods aim to maintain good performance in an unseen target domain by using training data from multiple source domains. This work introduces a simple yet effective framework, dubbed learning from multiple experts (LFME), that aims to make the target model an expert in all source domains to improve DG.
arXiv Detail & Related papers (2024-10-22T13:44:10Z)
HCVP: Leveraging Hierarchical Contrastive Visual Prompt for Domain Generalization [69.33162366130887]
Domain Generalization (DG) endeavors to create machine learning models that excel in unseen scenarios by learning invariant features. We introduce a novel method designed to supplement the model with domain-level and task-specific characteristics. This approach aims to guide the model in more effectively separating invariant features from specific characteristics, thereby boosting the generalization.
arXiv Detail & Related papers (2024-01-18T04:23:21Z)
Flatness-Aware Minimization for Domain Generalization [17.430563368226853]
Domain generalization (DG) seeks to learn robust models that generalize well under unknown distribution shifts. Currently, most DG methods follow the widely used benchmark, DomainBed, and utilize Adam as the default optimizer for all datasets. We propose Flatness-Aware Minimization for Domain Generalization (FAD), which can efficiently optimize both zeroth-order and first-order flatness simultaneously for DG.
arXiv Detail & Related papers (2023-07-20T05:48:20Z)
Improving Generalization with Domain Convex Game [32.07275105040802]
Domain generalization (DG) tends to alleviate the poor generalization capability of deep neural networks by learning a model with multiple source domains. A classical solution to DG is domain augmentation, the common belief of which is that diversifying source domains will be conducive to out-of-distribution generalization. Our explorations reveal that the correlation between model generalization and the diversity of domains may not be strictly positive, which limits the effectiveness of domain augmentation.
arXiv Detail & Related papers (2023-03-23T14:27:49Z)
Federated Domain Generalization for Image Recognition via Cross-Client Style Transfer [60.70102634957392]
Domain generalization (DG) has been a hot topic in image recognition, with a goal to train a general model that can perform well on unseen domains. In this paper, we propose a novel domain generalization method for image recognition through cross-client style transfer (CCST) without exchanging data samples. Our method outperforms recent SOTA DG methods on two DG benchmarks (PACS, OfficeHome) and a large-scale medical image dataset (Camelyon17) in the FL setting.
arXiv Detail & Related papers (2022-10-03T13:15:55Z)
Compound Domain Generalization via Meta-Knowledge Encoding [55.22920476224671]
We introduce Style-induced Domain-specific Normalization (SDNorm) to re-normalize the multi-modal underlying distributions. We harness the prototype representations, the centroids of classes, to perform relational modeling in the embedding space. Experiments on four standard Domain Generalization benchmarks reveal that COMEN exceeds the state-of-the-art performance without the need of domain supervision.
arXiv Detail & Related papers (2022-03-24T11:54:59Z)
More is Better: A Novel Multi-view Framework for Domain Generalization [28.12350681444117]
A key issue in domain generalization (DG) is how to prevent overfitting to the observed source domains. By treating tasks and images as different views, we propose a novel multi-view DG framework. In the test stage, to alleviate unstable prediction, we utilize multiple augmented images to yield multi-view prediction.
arXiv Detail & Related papers (2021-12-23T02:51:35Z)
Unsupervised Domain Generalization for Person Re-identification: A Domain-specific Adaptive Framework [50.88463458896428]
Domain generalization (DG) has attracted much attention in person re-identification (ReID) recently. Existing methods usually need the source domains to be labeled, which could be a significant burden for practical ReID tasks. We propose a simple and efficient domain-specific adaptive framework, and realize it with an adaptive normalization module.
arXiv Detail & Related papers (2021-11-30T02:35:51Z)
Towards Principled Disentanglement for Domain Generalization [90.9891372499545]
A fundamental challenge for machine learning models is generalizing to out-of-distribution (OOD) data. We first formalize the OOD generalization problem as constrained optimization, called Disentanglement-constrained Domain Generalization (DDG). Based on the transformation, we propose a primal-dual algorithm for joint representation disentanglement and domain generalization.
arXiv Detail & Related papers (2021-11-27T07:36:32Z)
HCDG: A Hierarchical Consistency Framework for Domain Generalization on Medical Image Segmentation [33.623948922908184]
We present a novel Hierarchical Consistency framework for Domain Generalization (HCDG). For the Extrinsic Consistency, we leverage the knowledge across multiple source domains to enforce data-level consistency. For the Intrinsic Consistency, we perform task-level consistency for the same instance under the dual-task scenario.
arXiv Detail & Related papers (2021-09-13T07:07:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.