Style-Hallucinated Dual Consistency Learning: A Unified Framework for
Visual Domain Generalization
- URL: http://arxiv.org/abs/2212.09068v2
- Date: Fri, 24 Nov 2023 15:14:26 GMT
- Title: Style-Hallucinated Dual Consistency Learning: A Unified Framework for
Visual Domain Generalization
- Authors: Yuyang Zhao, Zhun Zhong, Na Zhao, Nicu Sebe, Gim Hee Lee
- Abstract summary: We propose a unified framework, Style-HAllucinated Dual consistEncy learning (SHADE), to handle domain shift in various visual tasks.
Our versatile SHADE significantly enhances generalization across visual recognition tasks, including image classification, semantic segmentation and object detection.
- Score: 113.03189252044773
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Domain shift is widespread in the visual world, and modern deep neural
networks commonly suffer severe performance degradation under domain shift
because of their poor generalization ability, which limits their real-world
applications. Domain shift mainly stems from the limited environmental
variations in the source data and the large distribution gap between source and unseen target
data. To this end, we propose a unified framework, Style-HAllucinated Dual
consistEncy learning (SHADE), to handle such domain shift in various visual
tasks. Specifically, SHADE is constructed based on two consistency constraints,
Style Consistency (SC) and Retrospection Consistency (RC). SC enriches the
source situations and encourages the model to learn consistent representation
across style-diversified samples. RC leverages general visual knowledge to
prevent the model from overfitting to source data and thus largely keeps the
representation consistent between the source and general visual models.
Furthermore, we present a novel style hallucination module (SHM) to generate
style-diversified samples that are essential to consistency learning. SHM
selects basis styles from the source distribution, enabling the model to
dynamically generate diverse and realistic samples during training. Extensive
experiments demonstrate that our versatile SHADE significantly enhances
generalization across visual recognition tasks, including image
classification, semantic segmentation and object detection, with different
model families, i.e., ConvNets and Transformers.
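The abstract names three components (style hallucination, Style Consistency, Retrospection Consistency) without giving formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a "style" is the per-channel mean and standard deviation of a feature map, that new styles are random convex combinations of the basis styles applied AdaIN-style, that SC is a Jensen-Shannon divergence between predictions, and that RC is a mean squared error against features from a frozen general-purpose model. All function names are hypothetical.

```python
import math
import random


def channel_stats(feat, eps=1e-6):
    """Per-channel mean and std of a feature map laid out as [channel][position]."""
    means = [sum(ch) / len(ch) for ch in feat]
    stds = [math.sqrt(sum((v - m) ** 2 for v in ch) / len(ch) + eps)
            for ch, m in zip(feat, means)]
    return means, stds


def sample_weights(k, rng):
    """Random convex weights: a flat Dirichlet draw via normalized Exp(1) samples."""
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def hallucinate_style(feat, basis_means, basis_stds, rng):
    """Assumed SHM step: re-style feat with a random mix of basis styles."""
    w = sample_weights(len(basis_means), rng)
    means, stds = channel_stats(feat)
    out = []
    for c, ch in enumerate(feat):
        new_mean = sum(wi * bm[c] for wi, bm in zip(w, basis_means))
        new_std = sum(wi * bs[c] for wi, bs in zip(w, basis_stds))
        # Normalize away the original style, then apply the hallucinated one.
        out.append([new_std * (v - means[c]) / stds[c] + new_mean for v in ch])
    return out


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def style_consistency(logits_a, logits_b):
    """Assumed SC loss: Jensen-Shannon divergence between two predictions."""
    p, q = softmax(logits_a), softmax(logits_b)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def retrospection_consistency(feat, frozen_feat):
    """Assumed RC loss: mean squared error against frozen general-model features."""
    diffs = [(a - b) ** 2
             for ch_a, ch_b in zip(feat, frozen_feat)
             for a, b in zip(ch_a, ch_b)]
    return sum(diffs) / len(diffs)
```

In a training loop of this shape, the hallucinated features would be passed through the rest of the network, and the two consistency terms would be added to the task loss with weights that this sketch leaves unspecified.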
Related papers
- Multisource Collaborative Domain Generalization for Cross-Scene Remote Sensing Image Classification [57.945437355714155]
Cross-scene image classification aims to transfer prior knowledge of ground materials to annotate regions with different distributions.
Existing approaches focus on single-source domain generalization to unseen target domains.
We propose a novel multi-source collaborative domain generalization framework (MS-CDG) based on homogeneity and heterogeneity characteristics of multi-source remote sensing data.
arXiv Detail & Related papers (2024-12-05T06:15:08Z)
- Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment [20.902935570581207]
We introduce a Multimodal Alignment and Reconstruction Network (MARNet) to enhance the model's resistance to visual noise.
MARNet includes a cross-modal diffusion reconstruction module for smoothly and stably blending information across different domains.
Experiments conducted on two benchmark datasets, Vireo-Food172 and Ingredient-101, demonstrate that MARNet effectively improves the quality of image information extracted by the model.
arXiv Detail & Related papers (2024-07-26T16:30:18Z)
- FDS: Feedback-guided Domain Synthesis with Multi-Source Conditional Diffusion Models for Domain Generalization [19.0284321951354]
Domain Generalization techniques aim to enhance model robustness by simulating novel data distributions during training.
We propose FDS, Feedback-guided Domain Synthesis, a novel strategy that employs diffusion models to synthesize novel, pseudo-domains.
Our evaluations demonstrate that this methodology sets new benchmarks in domain generalization performance across a range of challenging datasets.
arXiv Detail & Related papers (2024-07-04T02:45:29Z)
- Can Generative Models Improve Self-Supervised Representation Learning? [0.7999703756441756]
We introduce a framework that enriches the self-supervised learning (SSL) paradigm by utilizing generative models to produce semantically consistent image augmentations.
Our results show that our framework significantly enhances the quality of learned visual representations by up to 10% Top-1 accuracy in downstream tasks.
arXiv Detail & Related papers (2024-03-09T17:17:07Z)
- UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning [86.91893533388628]
This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC).
UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
arXiv Detail & Related papers (2023-06-01T15:39:38Z)
- Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks.
We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling.
Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
arXiv Detail & Related papers (2023-05-29T14:29:12Z)
- Federated Domain Generalization for Image Recognition via Cross-Client Style Transfer [60.70102634957392]
Domain generalization (DG) has been a hot topic in image recognition, with the goal of training a general model that performs well on unseen domains.
In this paper, we propose a novel domain generalization method for image recognition through cross-client style transfer (CCST) without exchanging data samples.
Our method outperforms recent SOTA DG methods on two DG benchmarks (PACS, OfficeHome) and a large-scale medical image dataset (Camelyon17) in the FL setting.
arXiv Detail & Related papers (2022-10-03T13:15:55Z)
- Style-Hallucinated Dual Consistency Learning for Domain Generalized Semantic Segmentation [117.3856882511919]
We propose the Style-HAllucinated Dual consistEncy learning (SHADE) framework to handle domain shift.
Our SHADE yields significant improvement and outperforms state-of-the-art methods by 5.07% and 8.35% on the average mIoU of three real-world datasets.
arXiv Detail & Related papers (2022-04-06T02:49:06Z)