Style-Hallucinated Dual Consistency Learning: A Unified Framework for
Visual Domain Generalization
- URL: http://arxiv.org/abs/2212.09068v2
- Date: Fri, 24 Nov 2023 15:14:26 GMT
- Title: Style-Hallucinated Dual Consistency Learning: A Unified Framework for
Visual Domain Generalization
- Authors: Yuyang Zhao, Zhun Zhong, Na Zhao, Nicu Sebe, Gim Hee Lee
- Abstract summary: We propose a unified framework, Style-HAllucinated Dual consistEncy learning (SHADE), to handle domain shift in various visual tasks.
Our versatile SHADE can significantly enhance the generalization in various visual recognition tasks, including image classification, semantic segmentation and object detection.
- Score: 113.03189252044773
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Domain shift is pervasive in the visual world, and modern deep neural
networks commonly suffer severe performance degradation under domain shift due
to poor generalization ability, which limits their real-world applications. The
domain shift mainly stems from the limited environmental variations in the
source data and the large distribution gap between source and unseen target
data. To this end, we propose a unified framework, Style-HAllucinated Dual
consistEncy learning (SHADE), to handle such domain shift in various visual
tasks. Specifically, SHADE is constructed based on two consistency constraints,
Style Consistency (SC) and Retrospection Consistency (RC). SC enriches the
source situations and encourages the model to learn consistent representation
across style-diversified samples. RC leverages general visual knowledge to
prevent the model from overfitting to source data and thus largely keeps the
representation consistent between the source and general visual models.
Furthermore, we present a novel style hallucination module (SHM) to generate
style-diversified samples that are essential to consistency learning. SHM
selects basis styles from the source distribution, enabling the model to
dynamically generate diverse and realistic samples during training. Extensive
experiments demonstrate that our versatile SHADE can significantly enhance the
generalization in various visual recognition tasks, including image
classification, semantic segmentation and object detection, with different
models, i.e., ConvNets and Transformers.
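The abstract describes the style hallucination module (SHM) as replacing a sample's style with combinations of basis styles selected from the source distribution, so that style-diversified samples can be generated on the fly for consistency learning. The snippet below is a minimal, illustrative PyTorch sketch of that idea, assuming an AdaIN-style formulation in which per-channel feature statistics are swapped for a random convex combination of basis statistics; the class name StyleHallucination, the Dirichlet mixing weights, and the tensor shapes are assumptions for illustration, not the authors' released code.

import torch
import torch.nn as nn


class StyleHallucination(nn.Module):
    """Hypothetical sketch: replace per-channel feature statistics with a random
    convex combination of K basis styles taken from the source distribution."""

    def __init__(self, basis_means: torch.Tensor, basis_stds: torch.Tensor, eps: float = 1e-6):
        super().__init__()
        # basis_means / basis_stds: (K, C) channel-wise statistics of the selected basis styles
        self.register_buffer("basis_means", basis_means)
        self.register_buffer("basis_stds", basis_stds)
        self.eps = eps

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) intermediate feature map; returns a style-diversified copy
        b, c, _, _ = feat.shape
        mu = feat.mean(dim=(2, 3), keepdim=True)
        sigma = feat.var(dim=(2, 3), keepdim=True).add(self.eps).sqrt()
        normalized = (feat - mu) / sigma  # strip the original style

        # Sample convex combination weights over the K basis styles
        k = self.basis_means.size(0)
        weights = torch.distributions.Dirichlet(torch.ones(k, device=feat.device)).sample((b,))  # (B, K)
        new_mu = (weights @ self.basis_means).view(b, c, 1, 1)
        new_sigma = (weights @ self.basis_stds).view(b, c, 1, 1)

        return normalized * new_sigma + new_mu  # re-style with hallucinated statistics

Under this reading, Style Consistency (SC) would be enforced by passing the original and hallucinated features through the same head and penalizing disagreement between the two predictions, while Retrospection Consistency (RC) would compare the task model's representations against those of a frozen, generally pre-trained visual model; both descriptions paraphrase the abstract rather than reproduce the paper's exact loss formulas.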
Related papers
- Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment [20.902935570581207]
  We introduce a Multimodal Alignment and Reconstruction Network (MARNet) to enhance the model's resistance to visual noise.
  MARNet includes a cross-modal diffusion reconstruction module for smoothly and stably blending information across different domains.
  Experiments conducted on two benchmark datasets, Vireo-Food172 and Ingredient-101, demonstrate that MARNet effectively improves the quality of image information extracted by the model.
  arXiv Detail & Related papers (2024-07-26T16:30:18Z)
- Can Generative Models Improve Self-Supervised Representation Learning? [0.7999703756441756]
  We introduce a novel framework that enriches the self-supervised learning paradigm by utilizing generative models to produce semantically consistent image augmentations.
  Our results show that our framework significantly enhances the quality of learned visual representations by up to 10% Top-1 accuracy in downstream tasks.
  arXiv Detail & Related papers (2024-03-09T17:17:07Z)
- Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations [61.132408427908175]
  Zero-shot GAN adaptation aims to reuse well-trained generators to synthesize images of an unseen target domain.
  With only a single representative text feature instead of real images, the synthesized images gradually lose diversity.
  We propose a novel method to find semantic variations of the target text in the CLIP space.
  arXiv Detail & Related papers (2023-08-21T08:12:28Z)
- UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning [86.91893533388628]
  This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC).
  UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
  arXiv Detail & Related papers (2023-06-01T15:39:38Z)
- Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning [54.67880602409801]
  In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks.
  We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling.
  Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
  arXiv Detail & Related papers (2023-05-29T14:29:12Z)
- Federated Domain Generalization for Image Recognition via Cross-Client Style Transfer [60.70102634957392]
  Domain generalization (DG) has been a hot topic in image recognition, with the goal of training a general model that performs well on unseen domains.
  In this paper, we propose a novel domain generalization method for image recognition through cross-client style transfer (CCST) without exchanging data samples.
  Our method outperforms recent SOTA DG methods on two DG benchmarks (PACS, OfficeHome) and a large-scale medical image dataset (Camelyon17) in the FL setting.
  arXiv Detail & Related papers (2022-10-03T13:15:55Z)
- Style-Hallucinated Dual Consistency Learning for Domain Generalized Semantic Segmentation [117.3856882511919]
  We propose the Style-HAllucinated Dual consistEncy learning (SHADE) framework to handle domain shift.
  Our SHADE yields significant improvement and outperforms state-of-the-art methods by 5.07% and 8.35% on the average mIoU of three real-world datasets.
  arXiv Detail & Related papers (2022-04-06T02:49:06Z)
- Two-Level Adversarial Visual-Semantic Coupling for Generalized Zero-shot Learning [21.89909688056478]
  We propose a new two-level joint idea to augment the generative network with an inference network during training.
  This provides strong cross-modal interaction for effective transfer of knowledge between visual and semantic domains.
  We evaluate our approach on four benchmark datasets against several state-of-the-art methods, and report its performance.
  arXiv Detail & Related papers (2020-07-15T15:34:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.