Smoothing Matters: Momentum Transformer for Domain Adaptive Semantic
Segmentation
- URL: http://arxiv.org/abs/2203.07988v1
- Date: Tue, 15 Mar 2022 15:20:30 GMT
- Title: Smoothing Matters: Momentum Transformer for Domain Adaptive Semantic
Segmentation
- Authors: Runfa Chen, Yu Rong, Shangmin Guo, Jiaqi Han, Fuchun Sun, Tingyang Xu,
Wenbing Huang
- Abstract summary: We find that straightforwardly applying local ViTs in domain adaptive semantic segmentation does not bring the expected improvement.
The severe high-frequency components generated during pseudo-label construction and feature alignment make the training of local ViTs very unsmooth and hurt their transferability.
In this paper, we introduce a low-pass filtering mechanism, the momentum network, to smooth the learning dynamics of target-domain features and pseudo labels.
- Score: 48.7190017311309
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Following the great success of Vision Transformer variants (ViTs) in
computer vision, they have also demonstrated great potential in domain adaptive
semantic segmentation. Unfortunately, straightforwardly applying local ViTs in
domain adaptive semantic segmentation does not bring the expected improvement.
We find that the pitfall of local ViTs is the severe high-frequency components
generated during both pseudo-label construction and feature alignment for the
target domain. These high-frequency components make the training of local ViTs
very unsmooth and hurt their transferability. In this paper, we introduce a
low-pass filtering mechanism, the momentum network, to smooth the learning
dynamics of target-domain features and pseudo labels. Furthermore, we propose a
dynamic discrepancy measurement that aligns the source and target distributions
via dynamic weights which evaluate the importance of individual samples. With
these issues tackled, extensive experiments on sim2real benchmarks show that
the proposed method outperforms state-of-the-art methods. Our code is available
at https://github.com/alpc91/TransDA.
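
The "momentum network" is, in effect, a teacher whose weights are an exponential moving average (EMA) of the student's, so target-domain pseudo labels come from a low-pass-filtered version of the model. Below is a minimal PyTorch sketch of that idea; the momentum coefficient, the confidence masking, and the helper names are illustrative assumptions rather than the exact TransDA implementation (see the linked repository for that).

```python
import copy
import torch

@torch.no_grad()
def ema_update(student: torch.nn.Module, momentum_net: torch.nn.Module,
               alpha: float = 0.999) -> None:
    """Low-pass filter in weight space: theta_m <- alpha*theta_m + (1-alpha)*theta_s."""
    for p_m, p_s in zip(momentum_net.parameters(), student.parameters()):
        p_m.mul_(alpha).add_(p_s, alpha=1.0 - alpha)

@torch.no_grad()
def smoothed_pseudo_labels(momentum_net: torch.nn.Module,
                           target_images: torch.Tensor,
                           conf_thresh: float = 0.9) -> torch.Tensor:
    """Pseudo labels taken from the momentum network, masked by confidence."""
    probs = torch.softmax(momentum_net(target_images), dim=1)  # (B, C, H, W)
    conf, labels = probs.max(dim=1)                            # (B, H, W)
    labels[conf < conf_thresh] = 255   # 255 = ignore index for the seg. loss
    return labels

# The momentum network starts as a frozen copy of the student and is
# updated by ema_update after every optimizer step on the student:
#   momentum_net = copy.deepcopy(student).eval()
#   for p in momentum_net.parameters():
#       p.requires_grad_(False)
```

Because the momentum weights change slowly, high-frequency fluctuations in the student's target-domain predictions are averaged out before they become pseudo labels.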
Related papers
- Improving Source-Free Target Adaptation with Vision Transformers
Leveraging Domain Representation Images [8.626222763097335]
Unsupervised Domain Adaptation (UDA) methods facilitate knowledge transfer from a labeled source domain to an unlabeled target domain.
This paper presents an innovative method to bolster ViT performance in source-free target adaptation, beginning with an evaluation of how key, query, and value elements affect ViT outcomes.
Domain Representation Images (DRIs) act as domain-specific markers, effortlessly merging with the training regimen.
arXiv Detail & Related papers (2023-11-21T13:26:13Z)
- PriViT: Vision Transformers for Fast Private Inference [55.36478271911595]
The Vision Transformer (ViT) architecture has emerged as the backbone of choice for state-of-the-art deep models in computer vision applications.
However, ViTs are ill-suited for private inference using secure multi-party protocols due to their large number of non-polynomial operations.
We propose PriViT, an algorithm to selectively "Taylorize" nonlinearities in ViTs while maintaining their prediction accuracy; a toy example follows this entry.
arXiv Detail & Related papers (2023-10-06T21:45:05Z)
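
As a toy example of what "Taylorizing" a nonlinearity means, the sketch below swaps GELU for its second-order Taylor polynomial about zero, which is MPC-friendly because it is purely polynomial. Replacing every GELU, as the helper here does, is a simplification of PriViT's selective procedure.

```python
import math
import torch
import torch.nn as nn

class TaylorGELU(nn.Module):
    """Second-order Taylor expansion of GELU about x = 0.

    GELU(x) = x * Phi(x) ~ 0.5*x + x**2 / sqrt(2*pi), a polynomial and
    therefore cheap under secure multi-party computation protocols.
    """
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return 0.5 * x + x.pow(2) / math.sqrt(2.0 * math.pi)

def taylorize_gelus(model: nn.Module) -> None:
    """Swap every GELU in-place for its polynomial surrogate.

    PriViT does this *selectively*, keeping some exact nonlinearities to
    preserve accuracy; converting all of them, as here, is a simplification.
    """
    for name, child in model.named_children():
        if isinstance(child, nn.GELU):
            setattr(model, name, TaylorGELU())
        else:
            taylorize_gelus(child)
```

A low-degree fit is only accurate near the expansion point, which is exactly why PriViT selects where to Taylorize rather than converting every nonlinearity.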
- Semantic Segmentation using Vision Transformers: A survey [0.0]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) provide the main architecture families for semantic segmentation.
Although ViTs have proven successful in image classification, they cannot be directly applied to dense prediction tasks such as image segmentation and object detection.
This survey aims to review and compare the performance of ViT architectures designed for semantic segmentation on benchmark datasets.
arXiv Detail & Related papers (2023-05-05T04:11:00Z)
- Adaptive Spot-Guided Transformer for Consistent Local Feature Matching [64.30749838423922]
We propose the Adaptive Spot-Guided Transformer (ASTR) for local feature matching.
ASTR models the local consistency and scale variations in a unified coarse-to-fine architecture.
arXiv Detail & Related papers (2023-03-29T12:28:01Z)
- Exploring Consistency in Cross-Domain Transformer for Domain Adaptive Semantic Segmentation [51.10389829070684]
The domain gap can cause discrepancies in self-attention.
Due to this gap, the transformer attends to spurious regions or pixels, which deteriorates accuracy on the target domain.
We propose adapting the attention maps with cross-domain attention layers, sketched below.
arXiv Detail & Related papers (2022-11-27T02:40:33Z)
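
To make the cross-domain attention idea above concrete, here is a minimal sketch in which target-domain tokens form the queries while source-domain tokens supply the keys and values. The layer shape and where such a layer sits in the network are assumptions, since the summary does not specify them.

```python
import torch
import torch.nn as nn

class CrossDomainAttention(nn.Module):
    """Queries from one domain attend to keys/values from the other."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, target_tokens: torch.Tensor,
                source_tokens: torch.Tensor) -> torch.Tensor:
        # target_tokens, source_tokens: (B, N, dim) patch embeddings
        out, _ = self.attn(query=target_tokens,
                           key=source_tokens,
                           value=source_tokens)
        return out
```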
- Threshold-adaptive Unsupervised Focal Loss for Domain Adaptation of Semantic Segmentation [25.626882426111198]
Unsupervised domain adaptation (UDA) for semantic segmentation has recently gained increasing research attention.
In this paper, we propose a novel two-stage entropy-based UDA method for semantic segmentation, built around a threshold-adaptive unsupervised focal loss (sketched after this entry).
Our method achieves state-of-the-art mIoUs of 58.4% and 59.6% on SYNTHIA-to-Cityscapes and GTA5-to-Cityscapes using DeepLabV2, and competitive performance using the lightweight BiSeNet.
arXiv Detail & Related papers (2022-08-23T03:48:48Z)
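
The summary above gives few specifics, so the following is only a rough sketch of what an unsupervised focal loss for self-training can look like: confident target pixels are pushed toward their own predictions, with a focal factor that gives more weight to the less-confident ones among them. The fixed threshold, gamma, and masking are illustrative assumptions; the paper's threshold is adaptive.

```python
import torch
import torch.nn.functional as F

def unsupervised_focal_loss(logits: torch.Tensor, thresh: float,
                            gamma: float = 2.0) -> torch.Tensor:
    """Focal-style self-training loss on unlabeled target pixels.

    Pixels whose max softmax probability exceeds `thresh` are trained toward
    their own argmax prediction; the factor (1 - p)**gamma down-weights pixels
    that are already very confident. Updating `thresh` during training (the
    threshold-adaptive part) is left to the caller.
    """
    probs = torch.softmax(logits, dim=1)              # (B, C, H, W)
    conf, pseudo = probs.max(dim=1)                   # (B, H, W)
    mask = conf > thresh                              # keep confident pixels
    focal = (1.0 - conf) ** gamma                     # focal modulation
    nll = F.cross_entropy(logits, pseudo, reduction="none")
    loss = (focal * nll)[mask]
    return loss.mean() if mask.any() else logits.sum() * 0.0
```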
- Vision Transformers: From Semantic Segmentation to Dense Prediction [139.15562023284187]
We explore the global context learning potential of vision transformers (ViTs) for dense visual prediction.
Our motivation is that, by learning global context with a full receptive field layer by layer, ViTs may capture stronger long-range dependency information.
We formulate a family of Hierarchical Local-Global (HLG) Transformers, characterized by local attention within windows and global attention across windows in a pyramidal architecture, sketched below.
arXiv Detail & Related papers (2022-07-19T15:49:35Z)
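
A minimal sketch of the local-within-windows / global-across-windows pattern named above: tokens first attend inside non-overlapping windows, then one mean-pooled token per window attends globally and its output is broadcast back. The window size, pooling choice, and single-block layout are illustrative assumptions, not the HLG architecture itself.

```python
import torch
import torch.nn as nn

class LocalGlobalBlock(nn.Module):
    """Local attention within windows, then global attention across windows."""
    def __init__(self, dim: int, window: int = 7, heads: int = 4):
        super().__init__()
        self.window = window
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C) feature map; H and W divisible by the window size
        B, H, W, C = x.shape
        w, nH, nW = self.window, H // self.window, W // self.window
        # local attention: restrict tokens to their own w x w window
        xw = (x.view(B, nH, w, nW, w, C)
                .permute(0, 1, 3, 2, 4, 5)
                .reshape(B * nH * nW, w * w, C))
        xw, _ = self.local_attn(xw, xw, xw)
        # global attention: one mean-pooled summary token per window
        g = xw.mean(dim=1).view(B, nH * nW, C)
        g, _ = self.global_attn(g, g, g)
        xw = xw + g.reshape(B * nH * nW, 1, C)  # broadcast global context back
        # restore the (B, H, W, C) layout
        return (xw.view(B, nH, nW, w, w, C)
                  .permute(0, 1, 3, 2, 4, 5)
                  .reshape(B, H, W, C))
```

For example, LocalGlobalBlock(dim=64)(torch.randn(2, 14, 14, 64)) runs one local-global round on a 14x14 feature map; stacking such blocks with downsampling between stages gives the pyramidal layout the summary describes.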
- PIT: Position-Invariant Transform for Cross-FoV Domain Adaptation [53.428312630479816]
We observe that the Field of View (FoV) gap induces noticeable instance appearance differences between the source and target domains.
Motivated by these observations, we propose the Position-Invariant Transform (PIT) to better align images in different domains.
arXiv Detail & Related papers (2021-08-16T15:16:47Z)
- Semi-Supervised Domain Adaptation via Adaptive and Progressive Feature Alignment [32.77436219094282]
The proposed SSDAS employs a few labeled target samples as anchors for adaptive and progressive feature alignment between labeled source samples and unlabeled target samples.
In addition, we continuously replace dissimilar source features with high-confidence target features during the iterative training process.
Extensive experiments show that the proposed SSDAS greatly outperforms a number of baselines.
arXiv Detail & Related papers (2021-06-05T09:12:50Z)