CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning
- URL: http://arxiv.org/abs/2403.17369v3
- Date: Mon, 15 Jul 2024 06:34:03 GMT
- Title: CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning
- Authors: Ziyang Gong, Fuhao Li, Yupeng Deng, Deblina Bhattacharjee, Xianzheng Ma, Xiangwei Zhu, Zhenming Ji
- Abstract summary: Unsupervised Domain Adaptation (UDA) aims to adapt models from labeled source domains to unlabeled target domains.
CoDA instructs models to distinguish, focus on, and learn from discrepancies at the scene and image levels.
CoDA achieves SOTA performance on widely-used benchmarks under all adverse scenes.
- Score: 6.467495914193209
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised Domain Adaptation (UDA) aims to adapt models from labeled source domains to unlabeled target domains. When adapting to adverse scenes, existing UDA methods perform poorly for lack of instructions, leading their models to overlook discrepancies across adverse scenes. To tackle this, we propose CoDA, which instructs models to distinguish, focus on, and learn from these discrepancies at the scene and image levels. Specifically, CoDA consists of a Chain-of-Domain (CoD) strategy and a Severity-Aware Visual Prompt Tuning (SAVPT) mechanism. CoD provides scene-level instructions: it divides all adverse scenes into easy and hard ones, guiding models to adapt from the source domain to easy domains with easy-scene images and then to hard domains with hard-scene images, laying a solid foundation for the whole adaptation. Building on this foundation, we employ SAVPT to dive into more detailed image-level instructions to boost performance. SAVPT features a novel metric, Severity, that divides all adverse-scene images into low-severity and high-severity images. Severity then directs visual prompts and adapters, instructing models to concentrate on unified severity features instead of scene-specific features, without adding complexity to the model architecture. CoDA achieves SOTA performance on widely-used benchmarks under all adverse scenes. Notably, CoDA outperforms existing methods by 4.6% and 10.3% mIoU on the Foggy Driving and Foggy Zurich benchmarks, respectively. Our code is available at https://github.com/Cuzyoung/CoDA
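To make the two instructions concrete, here is a minimal sketch of severity-aware prompt routing, assuming a brightness/contrast-based severity score, additive image-space prompts, and a threshold `tau`; none of these details are taken from the paper's implementation.

```python
import torch

def severity(images: torch.Tensor) -> torch.Tensor:
    """Hypothetical severity score for a batch [B, C, H, W] in [0, 1]:
    darker, lower-contrast adverse images score higher. The paper's
    actual Severity metric may differ."""
    luminance = images.mean(dim=(1, 2, 3))         # [B] mean intensity
    contrast = images.std(dim=(2, 3)).mean(dim=1)  # [B] mean channel std
    return (1.0 - luminance) + (1.0 - contrast)

def severity_aware_prompts(images, low_prompt, high_prompt, tau=1.0):
    """Route each image to a low- or high-severity learnable visual prompt
    (additive tensors of shape [C, H, W]); the backbone stays frozen.
    The threshold tau is an assumption."""
    high = (severity(images) > tau).view(-1, 1, 1, 1)
    return images + torch.where(high, high_prompt, low_prompt)

# Chain-of-Domain (scene level) would then train one stage at a time:
# source images -> easy-scene images -> hard-scene images.
```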
Related papers
- PIG: Prompt Images Guidance for Night-Time Scene Parsing [48.35991796324741]
Unsupervised domain adaptation (UDA) has become the predominant approach to night-time scene parsing.
We propose a Night-Focused Network (NFNet) to learn night-specific features from both target domain images and prompt images.
We conduct experiments on four night-time datasets: NightCity, NightCity+, Dark Zurich, and ACDC.
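As a rough illustration of learning from both sources at once, here is a hypothetical training step that combines a supervised loss on labeled prompt images with an entropy-minimization loss on unlabeled night images; the loss mix and the weight `lam` are assumptions, not the PIG objective.

```python
import torch
import torch.nn.functional as F

def nfnet_step(model, night_images, prompt_images, prompt_labels, lam=0.5):
    """Hypothetical step: supervised segmentation loss on labeled prompt
    images plus entropy minimization on unlabeled night images, so the
    network learns night-specific features from both sources."""
    sup = F.cross_entropy(model(prompt_images), prompt_labels)

    prob_n = model(night_images).softmax(dim=1)
    ent = -(prob_n * prob_n.clamp_min(1e-8).log()).sum(dim=1).mean()

    return sup + lam * ent
```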
arXiv Detail & Related papers (2024-06-15T07:06:19Z)
- Saliency Guided Image Warping for Unsupervised Domain Adaptation [19.144094571994756]
We improve UDA training by using in-place image warping to focus on salient object regions.
We design instance-level saliency guidance to adaptively oversample object regions.
Our approach improves adaptation across geographies, lighting, and weather conditions.
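A minimal sketch of the oversampling idea: crop around the highest-saliency region and resize, so small salient objects occupy more pixels. The crop heuristic and names are assumptions, standing in for the paper's in-place warping.

```python
import torch
import torch.nn.functional as F

def oversample_salient(image, saliency, out_size=512):
    """Crop the [C, H, W] image around the peak of a [H, W] saliency map
    and resize back, so object regions are effectively oversampled."""
    H, W = saliency.shape[-2:]
    cy, cx = divmod(saliency.flatten().argmax().item(), W)
    half = min(H, W) // 4                          # assumed crop radius
    y0, x0 = max(0, cy - half), max(0, cx - half)
    crop = image[..., y0:y0 + 2 * half, x0:x0 + 2 * half]
    return F.interpolate(crop.unsqueeze(0), size=(out_size, out_size),
                         mode="bilinear", align_corners=False)[0]
```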
arXiv Detail & Related papers (2024-03-19T13:19:41Z)
- GIM: Learning Generalizable Image Matcher From Internet Videos [18.974842517202365]
We propose GIM, a self-training framework for learning a single generalizable model based on any image matching architecture.
We also propose ZEB, the first zero-shot evaluation benchmark for image matching.
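A hypothetical outline of the self-training loop: reliable matches on short-baseline video frame pairs are propagated to wide-baseline pairs and used as pseudo-labels for retraining. `propagate` and `fit` are assumed helpers, not GIM's API.

```python
def gim_style_self_train(matcher, frame_triplets, propagate, fit):
    """Hypothetical loop: match adjacent video frames (easy), chain the
    matches to a distant frame (hard), then retrain the matcher on the
    propagated pseudo-correspondences."""
    pseudo = []
    for f_a, f_b, f_far in frame_triplets:
        matches_ab = matcher(f_a, f_b)             # short-baseline matches
        pseudo.append(propagate(matches_ab, f_b, f_far))
    fit(matcher, pseudo)                           # self-training update
    return matcher
```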
arXiv Detail & Related papers (2024-02-16T21:48:17Z)
- We're Not Using Videos Effectively: An Updated Domain Adaptive Video Segmentation Baseline [19.098970392639476]
Video-DAS works have historically studied a distinct set of benchmarks from Image-DAS, with minimal cross-benchmarking.
We find that even after carefully controlling for data and model architecture, state-of-the-art Image-DAS methods outperform Video-DAS methods on established Video-DAS benchmarks.
arXiv Detail & Related papers (2024-02-01T18:59:56Z)
- Learning with Difference Attention for Visually Grounded Self-supervised Representations [18.743052370916192]
We propose visual difference attention (VDA) to compute visual attention maps in an unsupervised fashion.
We show that the resulting attention maps of existing SSL models do not accurately highlight all salient regions in an image, suggesting an inability to learn strong representations for downstream tasks like segmentation.
Motivated by these limitations, we propose a new learning objective, the Differentiable Difference Attention (DiDA) loss, which substantially improves an SSL model's visual grounding to an image's salient regions.
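One plausible reading of difference attention, sketched below: encode an image and a copy with its salient content masked out, then take the per-location feature-difference norm as the attention map. The masking choice and the normalization are assumptions.

```python
import torch

def difference_attention(encoder, image, masked_image):
    """Hypothetical VDA-style map: high values mark regions whose removal
    most changes the encoder's features, i.e. regions the model relies on."""
    f = encoder(image)                             # [B, C, h, w] features
    f_masked = encoder(masked_image)
    attn = (f - f_masked).norm(dim=1)              # [B, h, w]
    # per-image [0, 1] normalization for visualization or a DiDA-style loss
    return attn / attn.amax(dim=(1, 2), keepdim=True).clamp_min(1e-8)
```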
arXiv Detail & Related papers (2023-06-26T11:27:55Z)
- Bilevel Fast Scene Adaptation for Low-Light Image Enhancement [50.639332885989255]
Enhancing images captured in low-light scenes is a challenging but widely studied task in computer vision.
The main obstacle lies in modeling the distribution discrepancy across different scenes.
We introduce the bilevel paradigm to model the above latent correspondence.
A bilevel learning framework is constructed to endow the encoder with scene-irrelevant generality across diverse scenes.
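A first-order, MAML-style sketch of what such a bilevel update could look like: a copied decoder adapts per scene in the inner loop, and the encoder is updated on the post-adaptation loss in the outer loop. The reconstruction loss and all names are assumptions.

```python
import copy
import torch
import torch.nn.functional as F

def bilevel_step(encoder, decoder, scenes, outer_opt, inner_lr=1e-3):
    """Hypothetical bilevel update: per-scene fast decoder weights (inner),
    encoder updated on post-adaptation loss (outer), pushing the encoder
    toward scene-irrelevant features."""
    outer_loss = 0.0
    for low, ref in scenes:                        # (low-light input, reference)
        fast = copy.deepcopy(decoder)              # scene-specific fast weights
        inner = F.mse_loss(fast(encoder(low)), ref)
        grads = torch.autograd.grad(inner, list(fast.parameters()))
        with torch.no_grad():                      # one inner gradient step
            for p, g in zip(fast.parameters(), grads):
                p -= inner_lr * g
        outer_loss = outer_loss + F.mse_loss(fast(encoder(low)), ref)
    outer_opt.zero_grad()
    outer_loss.backward()                          # outer step on the encoder
    outer_opt.step()
```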
arXiv Detail & Related papers (2023-06-02T08:16:21Z)
- Domain Adaptive and Generalizable Network Architectures and Training Strategies for Semantic Image Segmentation [108.33885637197614]
Unsupervised domain adaptation (UDA) and domain generalization (DG) enable machine learning models trained on a source domain to perform well on unlabeled or unseen target domains.
We propose HRDA, a multi-resolution framework for UDA and DG that combines the strengths of small high-resolution crops, which preserve fine segmentation details, with large low-resolution crops, which capture long-range context dependencies, fused by a learned scale attention.
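The fusion step can be sketched as follows, assuming the scale attention is a per-pixel sigmoid map that weighs the detail (high-resolution) prediction against the upsampled context (low-resolution) prediction; shapes and placement are assumptions.

```python
import torch
import torch.nn.functional as F

def hrda_style_fuse(hr_logits, lr_logits, scale_attn_logit):
    """Hypothetical fusion in the spirit of HRDA: a learned [B, 1, H, W]
    attention map blends the detail and context predictions per pixel."""
    lr_up = F.interpolate(lr_logits, size=hr_logits.shape[-2:],
                          mode="bilinear", align_corners=False)
    a = torch.sigmoid(scale_attn_logit)            # attention in (0, 1)
    return a * hr_logits + (1.0 - a) * lr_up
```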
arXiv Detail & Related papers (2023-04-26T15:18:45Z)
- Refign: Align and Refine for Adaptation of Semantic Segmentation to Adverse Conditions [78.71745819446176]
Refign is a generic extension to self-training-based UDA methods which leverages cross-domain correspondences.
Refign consists of two steps: (1) aligning the normal-condition image to the corresponding adverse-condition image using an uncertainty-aware dense matching network, and (2) refining the adverse prediction with the normal prediction using an adaptive label correction mechanism.
The approach introduces no extra training parameters and only minimal computational overhead (during training only), and can be used as a drop-in extension to improve any given self-training-based UDA method.
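A minimal sketch of the refine step, assuming the dense matcher returns a normalized sampling grid and a per-pixel match confidence; all names and the hard confidence threshold are assumptions.

```python
import torch
import torch.nn.functional as F

def refign_style_refine(adverse_prob, normal_prob, grid, conf, thresh=0.8):
    """Hypothetical refinement: warp normal-condition class probabilities
    [B, C, H, W] into the adverse view via a [-1, 1] sampling grid
    [B, H, W, 2], then trust the warped prediction where the match
    confidence [B, H, W] is high."""
    warped = F.grid_sample(normal_prob, grid, mode="bilinear",
                           align_corners=False)
    use_warp = (conf > thresh).unsqueeze(1)        # [B, 1, H, W]
    refined = torch.where(use_warp, warped, adverse_prob)
    return refined.argmax(dim=1)                   # corrected pseudo-labels
```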
arXiv Detail & Related papers (2022-07-14T11:30:38Z)
- DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation [99.88539409432916]
We benchmark network architectures and training strategies for unsupervised domain adaptation (UDA).
We propose a novel UDA method, DAFormer, based on the benchmark results.
DAFormer significantly improves the state-of-the-art performance by 10.8 mIoU for GTA->Cityscapes and 5.4 mIoU for Synthia->Cityscapes.
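The summary leaves the training strategies implicit; for orientation, here is the mean-teacher self-training backbone common to this line of UDA work (a sketch, not DAFormer's exact recipe): an EMA teacher produces pseudo-labels for the student on target-domain images.

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, alpha=0.999):
    """Exponential moving average of student weights into the teacher."""
    for tp, sp in zip(teacher.parameters(), student.parameters()):
        tp.mul_(alpha).add_(sp, alpha=1.0 - alpha)

@torch.no_grad()
def pseudo_labels(teacher, target_images):
    """Teacher predictions on unlabeled target images become the student's
    training targets on the target domain."""
    return teacher(target_images).argmax(dim=1)
```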
arXiv Detail & Related papers (2021-11-29T19:00:46Z)
- Co-Attention for Conditioned Image Matching [91.43244337264454]
We propose a new approach to determine correspondences between image pairs in the wild under large changes in illumination, viewpoint, context, and material.
While other approaches find correspondences between pairs of images by treating the images independently, we instead condition on both images to implicitly take account of the differences between them.
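A rough sketch of the conditioning idea: descriptors for image A are computed by attending over image B and vice versa, instead of encoding each image independently. This illustrates the mechanism only, not the paper's architecture.

```python
import torch

def co_attention(feat_a, feat_b):
    """Hypothetical co-attention over two [B, C, H, W] feature maps: each
    image's descriptors are conditioned on the other image's features."""
    B, C, H, W = feat_a.shape
    a = feat_a.flatten(2).transpose(1, 2)          # [B, HW, C]
    b = feat_b.flatten(2).transpose(1, 2)          # [B, HW, C]
    sim = a @ b.transpose(1, 2) / C ** 0.5         # [B, HW, HW] similarities
    a_cond = torch.softmax(sim, dim=-1) @ b        # A conditioned on B
    b_cond = torch.softmax(sim.transpose(1, 2), dim=-1) @ a  # B on A
    return a_cond, b_cond
```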
arXiv Detail & Related papers (2020-07-16T17:32:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all of its information) and is not responsible for any consequences of its use.