TMT: Cross-domain Semantic Segmentation with Region-adaptive Transferability Estimation
- URL: http://arxiv.org/abs/2504.05774v3
- Date: Wed, 15 Oct 2025 03:10:49 GMT
- Title: TMT: Cross-domain Semantic Segmentation with Region-adaptive Transferability Estimation
- Authors: Enming Zhang, Zhengyu Li, Yanru Wu, Jingge Wang, Yang Tan, Guan Wang, Yang Li, Xiaoping Zhang,
- Abstract summary: We propose a region-adaptive framework designed to enhance cross-domain representation learning through transferability guidance. First, we dynamically partition the image into coherent regions, grouped by structural and semantic similarity, and estimate their domain transferability at a localized level. Then, we incorporate region-level transferability maps directly into the self-attention mechanism of ViTs, allowing the model to adaptively focus attention on areas with lower transferability and higher semantic uncertainty.
- Score: 27.208145888390117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in Vision Transformers (ViTs) have significantly improved semantic segmentation performance. However, their adaptation to new target domains remains challenged by distribution shifts, which often disrupt global attention mechanisms. While existing global and patch-level adaptation methods offer some improvements, they overlook the spatially varying transferability inherent in different image regions. To address this, we propose the Transferable Mask Transformer (TMT), a region-adaptive framework designed to enhance cross-domain representation learning through transferability guidance. First, we dynamically partition the image into coherent regions, grouped by structural and semantic similarity, and estimate their domain transferability at a localized level. Then, we incorporate region-level transferability maps directly into the self-attention mechanism of ViTs, allowing the model to adaptively focus attention on areas with lower transferability and higher semantic uncertainty. Extensive experiments across 20 diverse cross-domain settings demonstrate that TMT not only mitigates the performance degradation typically associated with domain shift but also consistently outperforms existing approaches.
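The core mechanism described above, region-level transferability maps injected into self-attention, can be sketched roughly as follows. This is a minimal single-head NumPy illustration under my own assumptions (an additive logit bias proportional to `1 - transferability`), not the authors' exact formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def transferability_biased_attention(q, k, v, transferability, bias_scale=1.0):
    """Single-head self-attention with an additive transferability bias.

    q, k, v: (N, d) token projections; transferability: (N,) values in [0, 1].
    Keys in low-transferability regions receive a positive logit bias, so
    every query allocates more attention mass to those hard-to-transfer
    tokens. The bias form is an illustrative assumption.
    """
    d = q.shape[1]
    logits = q @ k.T / np.sqrt(d)                    # (N, N) attention logits
    logits = logits + bias_scale * (1.0 - transferability)  # broadcast over queries
    attn = softmax(logits, axis=-1)                  # each row sums to 1
    return attn @ v
```

With identical queries and keys, the bias alone decides the attention distribution, so tokens flagged as less transferable receive visibly more weight.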
Related papers
- Open-Vocabulary Domain Generalization in Urban-Scene Segmentation [83.15573353963235]
Domain Generalization in Semantic Segmentation (DG-SS) aims to enable segmentation models to perform robustly in unseen environments. Recent progress in Vision-Language Models (VLMs) has advanced Open-Vocabulary Semantic Segmentation (OV-SS) by enabling models to recognize a broader range of concepts. Yet, these models remain sensitive to domain shifts and struggle to maintain robustness when deployed in unseen environments. We propose S2-Corr, a state-space-driven text-image correlation refinement mechanism that produces more consistent text-image correlations under distribution changes.
arXiv Detail & Related papers (2026-02-21T14:32:27Z) - Cross-Domain Transfer with Self-Supervised Spectral-Spatial Modeling for Hyperspectral Image Classification [5.784164305429653]
This paper proposes a self-supervised cross-domain transfer framework. It learns transferable spectral-spatial joint representations without source labels. Experimental results demonstrate stable classification performance and strong cross-domain adaptability.
arXiv Detail & Related papers (2026-01-26T02:52:35Z) - TransAdapter: Vision Transformer for Feature-Centric Unsupervised Domain Adaptation [0.3277163122167433]
Unsupervised Domain Adaptation (UDA) aims to utilize labeled data from a source domain to solve tasks in an unlabeled target domain. Traditional CNN-based methods struggle to fully capture complex domain relationships. We propose a novel UDA approach leveraging the Swin Transformer with three key modules.
arXiv Detail & Related papers (2024-12-05T11:11:39Z) - Exploring Consistency in Cross-Domain Transformer for Domain Adaptive Semantic Segmentation [51.10389829070684]
Domain gap can cause discrepancies in self-attention.
Due to this gap, the transformer attends to spurious regions or pixels, which deteriorates accuracy on the target domain.
We propose adaptation on attention maps with cross-domain attention layers.
arXiv Detail & Related papers (2022-11-27T02:40:33Z) - UniDAformer: Unified Domain Adaptive Panoptic Segmentation Transformer via Hierarchical Mask Calibration [49.16591283724376]
We design UniDAformer, a unified domain adaptive panoptic segmentation transformer that is simple but can achieve domain adaptive instance segmentation and semantic segmentation simultaneously within a single network.
UniDAformer introduces Hierarchical Mask Calibration (HMC) that rectifies inaccurate predictions at the level of regions, superpixels and annotated pixels via online self-training on the fly.
It has three unique features: 1) it enables unified domain adaptive panoptic adaptation; 2) it mitigates false predictions and improves domain adaptive panoptic segmentation effectively; 3) it is end-to-end trainable with a much simpler training and inference pipeline.
arXiv Detail & Related papers (2022-06-30T07:32:23Z) - Variational Transfer Learning using Cross-Domain Latent Modulation [1.9662978733004601]
We introduce a novel cross-domain latent modulation mechanism to a variational autoencoder framework so as to achieve effective transfer learning.
Deep representations of the source and target domains are first extracted by a unified inference model and aligned by employing gradient reversal.
The learned deep representations are then cross-modulated to the latent encoding of the alternative domain, where consistency constraints are also applied.
arXiv Detail & Related papers (2022-05-31T03:47:08Z) - Smoothing Matters: Momentum Transformer for Domain Adaptive Semantic Segmentation [48.7190017311309]
We find that straightforwardly applying local ViTs in domain adaptive semantic segmentation does not bring in expected improvement.
These high-frequency components make the training of local ViTs very unsmooth and hurt their transferability.
In this paper, we introduce a low-pass filtering mechanism, momentum network, to smooth the learning dynamics of target domain features and pseudo labels.
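A momentum network of the kind described here is typically realized as an exponential moving average (EMA) of the student's weights, which acts as a low-pass filter on the learning dynamics. A minimal sketch under that assumption (the function name and parameter layout are mine, not the paper's):

```python
def ema_update(teacher_params, student_params, momentum=0.999):
    """Illustrative momentum-network update: the teacher is an
    exponential moving average of the student, so high-frequency
    oscillations in the student's target-domain features and
    pseudo-labels are smoothed out.

    teacher_params, student_params: parallel lists of parameter values.
    Returns the updated teacher parameters.
    """
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]
```

In practice the teacher then produces the pseudo-labels, while only the student receives gradients.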
arXiv Detail & Related papers (2022-03-15T15:20:30Z) - Amplitude Spectrum Transformation for Open Compound Domain Adaptive Semantic Segmentation [62.68759523116924]
Open compound domain adaptation (OCDA) has emerged as a practical adaptation setting.
We propose a novel feature space Amplitude Spectrum Transformation (AST).
arXiv Detail & Related papers (2022-02-09T05:40:34Z) - Domain Adaptive Semantic Segmentation with Regional Contrastive Consistency Regularization [19.279884432843822]
We propose a novel and fully end-to-end trainable approach, called regional contrastive consistency regularization (RCCR) for domain adaptive semantic segmentation.
Our core idea is to pull the similar regional features extracted from the same location of different images to be closer, and meanwhile push the features from the different locations of the two images to be separated.
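This pull-together/push-apart idea over regional features is the shape of an InfoNCE-style contrastive loss. A hedged NumPy sketch, not the exact RCCR objective: features at the same index (same spatial location in two views) are treated as positives, all cross-index pairs as negatives:

```python
import numpy as np

def regional_contrastive_loss(feat_a, feat_b, temperature=0.1):
    """Illustrative region-level contrastive loss (InfoNCE-style).

    feat_a, feat_b: (R, D) features of the same R region locations
    extracted from two views of an image. Same-index pairs are pulled
    together; all other pairs are pushed apart.
    """
    # L2-normalize so the dot product is cosine similarity.
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                 # (R, R) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Minimizing this maximizes probability mass on the diagonal (positives).
    return -np.mean(np.diag(log_prob))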
arXiv Detail & Related papers (2021-10-11T11:45:00Z) - TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation [54.61786380919243]
Unsupervised domain adaptation (UDA) aims to transfer the knowledge learnt from a labeled source domain to an unlabeled target domain.
Previous work is mainly built upon convolutional neural networks (CNNs) to learn domain-invariant representations.
With the recent exponential increase in applying Vision Transformer (ViT) to vision tasks, the capability of ViT in adapting cross-domain knowledge remains unexplored in the literature.
arXiv Detail & Related papers (2021-08-12T22:37:43Z) - AFAN: Augmented Feature Alignment Network for Cross-Domain Object
Detection [90.18752912204778]
Unsupervised domain adaptation for object detection is a challenging problem with many real-world applications.
We propose a novel augmented feature alignment network (AFAN) which integrates intermediate domain image generation and domain-adversarial training.
Our approach significantly outperforms the state-of-the-art methods on standard benchmarks for both similar and dissimilar domain adaptations.
arXiv Detail & Related papers (2021-06-10T05:01:20Z) - Transformer-Based Source-Free Domain Adaptation [134.67078085569017]
We study the task of source-free domain adaptation (SFDA), where the source data are not available during target adaptation.
We propose a generic and effective framework based on Transformer, named TransDA, for learning a generalized model for SFDA.
arXiv Detail & Related papers (2021-05-28T23:06:26Z) - Domain Adaptation for Semantic Segmentation via Patch-Wise Contrastive
Learning [62.7588467386166]
We leverage contrastive learning to bridge the domain gap by aligning the features of structurally similar label patches across domains.
Our approach consistently outperforms state-of-the-art unsupervised and semi-supervised methods on two challenging domain adaptive segmentation tasks.
arXiv Detail & Related papers (2021-04-22T13:39:12Z) - Cross-Domain Grouping and Alignment for Domain Adaptive Semantic
Segmentation [74.3349233035632]
Existing techniques to adapt semantic segmentation networks across the source and target domains within deep convolutional neural networks (CNNs) do not consider an inter-class variation within the target domain itself or estimated category.
We introduce a learnable clustering module, and a novel domain adaptation framework called cross-domain grouping and alignment.
Our method consistently boosts the adaptation performance in semantic segmentation, outperforming the state-of-the-arts on various domain adaptation settings.
arXiv Detail & Related papers (2020-12-15T11:36:21Z) - Deep Adversarial Transition Learning using Cross-Grafted Generative
Stacks [3.756448228784421]
We present a novel "deep adversarial transition learning" (DATL) framework that bridges the domain gap.
We construct variational auto-encoders (VAEs) for the two domains, and form bidirectional transitions by cross-grafting the VAEs' decoder stacks.
generative adversarial networks (GAN) are employed for domain adaptation, mapping the target domain data to the known label space of the source domain.
arXiv Detail & Related papers (2020-09-25T04:25:27Z) - Contextual-Relation Consistent Domain Adaptation for Semantic
Segmentation [44.19436340246248]
This paper presents an innovative local contextual-relation consistent domain adaptation technique.
It aims to achieve local-level consistencies during the global-level alignment.
Experiments demonstrate its superior segmentation performance as compared with state-of-the-art methods.
arXiv Detail & Related papers (2020-07-05T19:00:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.