StylePrompter: Enhancing Domain Generalization with Test-Time Style Priors
- URL: http://arxiv.org/abs/2408.09138v1
- Date: Sat, 17 Aug 2024 08:35:43 GMT
- Title: StylePrompter: Enhancing Domain Generalization with Test-Time Style Priors
- Authors: Jiao Zhang, Jian Xu, Xu-Yao Zhang, Cheng-Lin Liu
- Abstract summary: In real-world applications, the sample distribution at the inference stage often differs from the one at the training stage.
This paper introduces the style prompt in the language modality to adapt the trained model dynamically.
In particular, we train a style prompter to extract style information of the current image into an embedding in the token embedding space.
Our open space partition of the style token embedding space and the hand-crafted style regularization enable the trained style prompter to handle data from unknown domains effectively.
- Score: 39.695604434738186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In real-world applications, the sample distribution at the inference stage often differs from the one at the training stage, causing performance degradation of trained deep models. Research on domain generalization (DG) aims to develop robust algorithms that improve generalization performance in unseen domains by training on a few source domains. However, a domain-agnostic vision model trained on a limited number of domains with traditional domain generalization methods cannot guarantee effectiveness on unseen domains. The introduction of language can break the closed cognition space of the vision model, providing additional semantic information that cannot be inferred from vision-only datasets. In this paper, we propose to overcome this limitation of previous DG methods by introducing a style prompt in the language modality to adapt the trained model dynamically. In particular, we train a style prompter to extract the style information of the current image into an embedding in the token embedding space and place it in front of the candidate category words as prior knowledge to prompt the model. Our open space partition of the style token embedding space and the hand-crafted style regularization enable the trained style prompter to handle data from unknown domains effectively. Extensive experiments verify the effectiveness of our method and demonstrate state-of-the-art performance on multiple public datasets. Code will be made available after the acceptance of this paper.
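A minimal sketch of the style-prompting idea described in the abstract, assuming a CLIP-like vision-language model. The `StylePrompter` MLP, the tensor shapes, and the exact way the style token is prepended to the category word embeddings are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class StylePrompter(nn.Module):
    """Illustrative module: maps an image feature to one style token embedding."""
    def __init__(self, feat_dim: int, token_dim: int):
        super().__init__()
        # Small MLP projecting visual features into the token embedding space.
        self.proj = nn.Sequential(
            nn.Linear(feat_dim, token_dim),
            nn.ReLU(inplace=True),
            nn.Linear(token_dim, token_dim),
        )

    def forward(self, image_feats: torch.Tensor) -> torch.Tensor:
        # image_feats: (batch, feat_dim) -> style token: (batch, 1, token_dim)
        return self.proj(image_feats).unsqueeze(1)

def build_style_prompts(style_token: torch.Tensor,
                        class_token_embeds: torch.Tensor) -> torch.Tensor:
    """Place the predicted style token in front of each candidate category's
    word embeddings before they enter the text encoder."""
    # style_token: (batch, 1, d); class_token_embeds: (num_classes, seq_len, d)
    b = style_token.size(0)
    c, l, d = class_token_embeds.shape
    style = style_token.unsqueeze(1).expand(b, c, 1, d)
    classes = class_token_embeds.unsqueeze(0).expand(b, c, l, d)
    return torch.cat([style, classes], dim=2)  # (batch, num_classes, 1 + seq_len, d)
```

At inference, the resulting prompts would be fed through the (frozen) text encoder and matched against the image embedding, so the style token acts as an image-conditioned prior rather than a fixed hand-crafted prompt.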
Related papers
- In the Era of Prompt Learning with Vision-Language Models [1.060608983034705]
We introduce StyLIP, a novel domain-agnostic prompt learning strategy for Domain Generalization (DG).
StyLIP disentangles visual style and content in CLIP's vision encoder by using style projectors to learn domain-specific prompt tokens.
We also propose AD-CLIP for unsupervised domain adaptation (DA), leveraging CLIP's frozen vision backbone.
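A rough sketch of one way such a style projector could work, under the common assumption that channel-wise feature statistics capture visual style; the choice of statistics and the single linear projection are illustrative, not StyLIP's exact design.

```python
import torch
import torch.nn as nn

class StyleProjector(nn.Module):
    """Illustrative: map instance-level feature statistics (channel mean and std,
    a common proxy for visual style) from a frozen vision layer to a prompt token."""
    def __init__(self, channels: int, token_dim: int):
        super().__init__()
        self.proj = nn.Linear(2 * channels, token_dim)

    def forward(self, feat_map: torch.Tensor) -> torch.Tensor:
        # feat_map: (batch, channels, H, W) taken from an intermediate encoder layer
        mu = feat_map.mean(dim=(2, 3))
        sigma = feat_map.std(dim=(2, 3))
        return self.proj(torch.cat([mu, sigma], dim=1))  # (batch, token_dim)
```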
arXiv Detail & Related papers (2024-11-07T17:31:21Z) - WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization [63.98650220772378]
We present WIDIn, Wording Images for Domain-Invariant representation, to disentangle discriminative visual representation.
We first estimate the language embedding with fine-grained alignment, which can be used to adaptively identify and then remove the domain-specific counterpart.
We show that WIDIn can be applied to both pretrained vision-language models like CLIP, and separately trained uni-modal models like MoCo and BERT.
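A hedged sketch of the general idea of removing a language-identified, domain-specific component from a visual embedding; treating the domain as a single text embedding and removing it by orthogonal projection is an assumption for illustration, not WIDIn's exact procedure.

```python
import torch
import torch.nn.functional as F

def remove_domain_component(visual_feat: torch.Tensor,
                            domain_text_feat: torch.Tensor) -> torch.Tensor:
    """Subtract the projection of a visual embedding onto a language-derived
    domain direction, keeping the (approximately) domain-invariant remainder."""
    # visual_feat: (batch, d); domain_text_feat: (d,), e.g. the embedding of a
    # textual description of the source domain's appearance.
    direction = F.normalize(domain_text_feat, dim=0)
    coeff = visual_feat @ direction              # (batch,) projection coefficients
    return visual_feat - coeff.unsqueeze(1) * direction
```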
arXiv Detail & Related papers (2024-05-28T17:46:27Z) - Domain-Controlled Prompt Learning [49.45309818782329]
Existing prompt learning methods often lack domain-awareness or domain-transfer mechanisms.
We propose Domain-Controlled Prompt Learning for specific domains.
Our method achieves state-of-the-art performance in specific domain image recognition datasets.
arXiv Detail & Related papers (2023-09-30T02:59:49Z) - Prompting Diffusion Representations for Cross-Domain Semantic Segmentation [101.04326113360342]
Diffusion pretraining achieves extraordinary domain generalization results for semantic segmentation.
We introduce a scene prompt and a prompt randomization strategy to help further disentangle the domain-invariant information when training the segmentation head.
arXiv Detail & Related papers (2023-07-05T09:28:25Z) - Learning Domain Invariant Prompt for Vision-Language Models [31.581652862478965]
We propose MetaPrompt, a novel prompt learning paradigm that directly generates a domain-invariant prompt that can be generalized to unseen domains.
Our method consistently and significantly outperforms existing methods.
arXiv Detail & Related papers (2022-12-08T11:23:24Z) - A Curriculum-style Self-training Approach for Source-Free Semantic Segmentation [91.13472029666312]
We propose a curriculum-style self-training approach for source-free domain adaptive semantic segmentation.
Our method yields state-of-the-art performance on source-free semantic segmentation tasks for both synthetic-to-real and adverse conditions.
arXiv Detail & Related papers (2021-06-22T10:21:39Z) - A Review of Single-Source Deep Unsupervised Visual Domain Adaptation [81.07994783143533]
Large-scale labeled training datasets have enabled deep neural networks to excel across a wide range of benchmark vision tasks.
In many applications, it is prohibitively expensive and time-consuming to obtain large quantities of labeled data.
To cope with limited labeled training data, many have attempted to directly apply models trained on a large-scale labeled source domain to another sparsely labeled or unlabeled target domain.
arXiv Detail & Related papers (2020-09-01T00:06:50Z) - Generalizable Model-agnostic Semantic Segmentation via Target-specific Normalization [24.14272032117714]
We propose a novel domain generalization framework for the generalizable semantic segmentation task.
We exploit model-agnostic learning to simulate the domain shift problem.
Considering the data-distribution discrepancy between seen source and unseen target domains, we develop a target-specific normalization scheme.
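A minimal sketch of one standard way to realize target-specific normalization, assuming a BatchNorm-based segmentation network: re-estimate the normalization statistics from unlabeled target-domain images before evaluation. The loop below is illustrative and not necessarily the cited paper's exact scheme.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def adapt_bn_statistics(model: nn.Module, target_batches, device: str = "cuda") -> nn.Module:
    """Re-estimate BatchNorm running statistics on unlabeled target-domain batches."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.reset_running_stats()
            m.momentum = None  # None -> cumulative moving average over all batches
    model.train()  # BN layers only update running stats in train mode
    for batch in target_batches:   # iterable of image tensors, no labels needed
        model(batch.to(device))
    model.eval()
    return model
```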
arXiv Detail & Related papers (2020-03-27T09:25:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.