Source-Free Domain Adaptation for RGB-D Semantic Segmentation with
Vision Transformers
- URL: http://arxiv.org/abs/2305.14269v2
- Date: Wed, 6 Dec 2023 18:21:17 GMT
- Authors: Giulia Rizzoli, Donald Shenaj, Pietro Zanuttigh
- Abstract summary: We propose MISFIT: MultImodal Source-Free Information fusion Transformer, a depth-aware framework for source-free semantic segmentation.
Our framework, which is also the first approach using RGB-D vision transformers for source-free semantic segmentation, shows noticeable performance improvements.
- Score: 11.13182313760599
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the increasing availability of depth sensors, multimodal frameworks that
combine color information with depth data are gaining interest. However, ground
truth data for semantic segmentation is burdensome to provide, thus making
domain adaptation a significant research area. Yet most domain adaptation
methods cannot effectively handle multimodal data. Specifically, we
address the challenging source-free domain adaptation setting where the
adaptation is performed without reusing source data. We propose MISFIT:
MultImodal Source-Free Information fusion Transformer, a depth-aware framework
which injects depth data into a segmentation module based on vision
transformers at multiple stages, namely at the input, feature and output
levels. Color and depth style transfer helps early-stage domain alignment while
re-wiring self-attention between modalities creates mixed features, allowing
the extraction of better semantic content. Furthermore, a depth-based entropy
minimization strategy is also proposed to adaptively weight regions at
different distances. Our framework, which is also the first approach using
RGB-D vision transformers for source-free semantic segmentation, shows
noticeable performance improvements with respect to standard strategies.
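The depth-based entropy minimization described above can be illustrated with a minimal sketch. The abstract does not specify the weighting function, so the linear depth-based weight below (and the function name `depth_weighted_entropy`) is a hypothetical illustration of the general idea: per-pixel prediction entropy is averaged with weights that grow with distance.

```python
import math

def depth_weighted_entropy(probs, depths, alpha=1.0):
    """Depth-weighted entropy of per-pixel class distributions.

    probs:  list of per-pixel softmax distributions (each sums to 1)
    depths: list of per-pixel depth values (same length as probs)
    alpha:  strength of the depth weighting (assumed hyperparameter)

    The weighting scheme (1 + alpha * normalized depth) is an assumption,
    not the paper's exact formulation.
    """
    total, norm = 0.0, 0.0
    max_d = max(depths)
    for p, d in zip(probs, depths):
        # Shannon entropy of the class distribution at this pixel.
        h = -sum(pi * math.log(pi) for pi in p if pi > 0)
        # Hypothetical weight: farther pixels contribute more.
        w = 1.0 + alpha * (d / max_d)
        total += w * h
        norm += w
    return total / norm
```

Minimizing such a loss on unlabeled target data pushes the segmentation network toward confident predictions, with the depth term rebalancing the contribution of near and far regions.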
Related papers
- FANet: Feature Amplification Network for Semantic Segmentation in Cluttered Background [9.970265640589966]
Existing deep learning approaches overlook semantic cues that are crucial for semantic segmentation in complex scenarios.
We propose a feature amplification network (FANet) as a backbone network that incorporates semantic information using a novel feature enhancement module at multiple stages.
Our experimental results demonstrate state-of-the-art performance compared to existing methods.
arXiv Detail & Related papers (2024-07-12T15:57:52Z) - MRFP: Learning Generalizable Semantic Segmentation from Sim-2-Real with Multi-Resolution Feature Perturbation [2.0293118701268154]
We propose a novel MultiResolution Feature Perturbation (MRFP) technique to randomize domain-specific fine-grained features and perturb style of coarse features.
MRFP helps state-of-the-art deep neural networks to learn robust domain invariant features for simulation-to-real semantic segmentation.
arXiv Detail & Related papers (2023-11-30T08:02:49Z) - DepthFormer: Multimodal Positional Encodings and Cross-Input Attention
for Transformer-Based Segmentation Networks [13.858051019755283]
We focus on transformer-based deep learning architectures that have achieved state-of-the-art performance on the segmentation task.
We propose to employ depth information by embedding it in the positional encoding.
Our approach consistently improves performance on the Cityscapes benchmark.
arXiv Detail & Related papers (2022-11-08T12:01:31Z) - RAIN: RegulArization on Input and Network for Black-Box Domain
Adaptation [80.03883315743715]
Source-free domain adaptation transfers the source-trained model to the target domain without exposing the source data.
This paradigm is still at risk of data leakage due to adversarial attacks on the source model.
We propose a novel approach named RAIN (RegulArization on Input and Network) for Black-Box domain adaptation from both input-level and network-level regularization.
arXiv Detail & Related papers (2022-08-22T18:18:47Z) - Unsupervised domain adaptation semantic segmentation of high-resolution
remote sensing imagery with invariant domain-level context memory [10.210120085157161]
This study proposes a novel unsupervised domain adaptation semantic segmentation network (MemoryAdaptNet) for the semantic segmentation of HRS imagery.
MemoryAdaptNet constructs an output-space adversarial learning scheme to bridge the domain distribution discrepancy between the source and target domains.
Experiments under three cross-domain tasks indicate that our proposed MemoryAdaptNet is remarkably superior to the state-of-the-art methods.
arXiv Detail & Related papers (2022-08-16T12:35:57Z) - Learning Feature Decomposition for Domain Adaptive Monocular Depth
Estimation [51.15061013818216]
Supervised approaches have led to great success with the advance of deep learning, but they rely on large quantities of ground-truth depth annotations.
Unsupervised domain adaptation (UDA) transfers knowledge from labeled source data to unlabeled target data, so as to relax the constraint of supervised learning.
We propose a novel UDA method for MDE, referred to as Learning Feature Decomposition for Adaptation (LFDA), which learns to decompose the feature space into content and style components.
arXiv Detail & Related papers (2022-07-30T08:05:35Z) - Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient
Object Detection [67.33924278729903]
In this work, we propose Dual Swin-Transformer based Mutual Interactive Network.
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z) - Stagewise Unsupervised Domain Adaptation with Adversarial Self-Training
for Road Segmentation of Remote Sensing Images [93.50240389540252]
Road segmentation from remote sensing images is a challenging task with wide ranges of application potentials.
We propose a novel stagewise domain adaptation model called RoadDA to address the domain shift (DS) issue in this field.
Experiment results on two benchmarks demonstrate that RoadDA can efficiently reduce the domain gap and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-08-28T09:29:14Z) - Bi-directional Cross-Modality Feature Propagation with
Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information over multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z) - Supervised Domain Adaptation using Graph Embedding [86.3361797111839]
Domain adaptation methods assume that distributions between the two domains are shifted and attempt to realign them.
We propose a generic framework based on graph embedding.
We show that the proposed approach leads to a powerful Domain Adaptation framework.
arXiv Detail & Related papers (2020-03-09T12:25:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.