GroupTransNet: Group Transformer Network for RGB-D Salient Object Detection
- URL: http://arxiv.org/abs/2203.10785v1
- Date: Mon, 21 Mar 2022 08:00:16 GMT
- Title: GroupTransNet: Group Transformer Network for RGB-D Salient Object Detection
- Authors: Xian Fang, Jinshao Zhu, Xiuli Shao, Hongpeng Wang
- Abstract summary: We propose a novel Group Transformer Network (GroupTransNet) for RGB-D salient object detection.
GroupTransNet excels at learning long-range dependencies across cross-layer features.
Experiments demonstrate that GroupTransNet outperforms competing models.
- Score: 5.876499671899904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Salient object detection on RGB-D images is an active topic in computer
vision. Although existing methods have achieved appreciable performance, some
challenges remain. Because convolution is a local operation, a convolutional neural
network must be sufficiently deep to obtain a global receptive field, which often
comes at the cost of local detail. To address this challenge, we propose a novel
Group Transformer Network (GroupTransNet) for RGB-D salient object detection. The
method excels at learning long-range dependencies across cross-layer features,
promoting more complete feature representations. First, the features of the middle
three levels and the last three levels are softly grouped so that the slightly
higher group absorbs the advantages of the high-level features. The input features
are repeatedly purified and enhanced by an attention mechanism that refines the
cross-modal features of the color and depth modalities. The intermediate features
are first fused across different layers and then processed by several transformers
in multiple groups, which both unifies and interrelates the feature sizes across
scales and shares the feature weights within each group. Owing to the level
difference, the output features of different groups are clustered in a staggered
manner, offset by two levels, and finally combined with the low-level features.
Extensive experiments demonstrate that GroupTransNet outperforms competing models
and achieves new state-of-the-art performance.
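
Code example (editor's illustration): the abstract describes two mechanisms that lend themselves to a short sketch, namely attention-based purification of the cross-modal color/depth features and transformers whose weights are shared by all feature levels within a group. The paper's code is not reproduced here; the PyTorch sketch below is a minimal, hypothetical rendering of those two ideas, and every module name, dimension, and fusion detail is an assumption rather than the authors' implementation.

```python
# Hypothetical sketch of two ideas from the GroupTransNet abstract:
# (1) attention-based purification of cross-modal (RGB + depth) features,
# (2) one transformer encoder shared by all feature levels in a group.
# Module names, dimensions, and fusion details are illustrative assumptions.
import torch
import torch.nn as nn


class CrossModalPurification(nn.Module):
    """Purify and enhance RGB/depth features with channel attention (assumed form).

    The abstract says purification is applied repeatedly; a single pass is shown.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                      # global context per channel
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # Attention weights computed from both modalities re-weight their sum.
        w = self.attn(torch.cat([rgb, depth], dim=1))
        return w * (rgb + depth)


class GroupTransformer(nn.Module):
    """One transformer encoder applied to every level of a group (shared weights)."""

    def __init__(self, channels: int, num_layers: int = 2, num_heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, levels: list[torch.Tensor]) -> list[torch.Tensor]:
        outs = []
        for feat in levels:  # each feat: (B, C, H, W), unified to one scale beforehand
            b, c, h, w = feat.shape
            tokens = feat.flatten(2).transpose(1, 2)   # (B, H*W, C)
            tokens = self.encoder(tokens)              # same encoder => shared weights
            outs.append(tokens.transpose(1, 2).reshape(b, c, h, w))
        return outs


if __name__ == "__main__":
    purify = CrossModalPurification(channels=64)
    group = GroupTransformer(channels=64)
    rgb = torch.randn(2, 64, 16, 16)
    depth = torch.randn(2, 64, 16, 16)
    fused = purify(rgb, depth)
    # A "group" of three levels, assumed already resized to one spatial size.
    enhanced = group([fused, fused.clone(), fused.clone()])
    print(enhanced[0].shape)  # torch.Size([2, 64, 16, 16])
```

Sharing one encoder across the levels of a group is what would keep the per-group parameter count constant regardless of how many levels the group contains, matching the abstract's claim of weight sharing within a group.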
Related papers
- Salient Object Detection in Optical Remote Sensing Images Driven by Transformer [69.22039680783124]
We propose a novel Global Extraction Local Exploration Network (GeleNet) for salient object detection in optical remote sensing images (ORSI-SOD).
Specifically, GeleNet first adopts a transformer backbone to generate four-level feature embeddings with global long-range dependencies.
Extensive experiments on three public datasets demonstrate that the proposed GeleNet outperforms relevant state-of-the-art methods.
arXiv Detail & Related papers (2023-09-15T07:14:43Z)
- Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on three fine-grained object recognition benchmarks.
arXiv Detail & Related papers (2022-12-28T03:45:56Z)
- Semantic Labeling of High Resolution Images Using EfficientUNets and Transformers [5.177947445379688]
We propose a new segmentation model that combines convolutional neural networks with deep transformers.
Our results demonstrate that the proposed methodology improves segmentation accuracy compared to state-of-the-art techniques.
arXiv Detail & Related papers (2022-06-20T12:03:54Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose the Dual Swin-Transformer based Mutual Interactive Network (DTMINet).
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- SwinNet: Swin Transformer drives edge-aware RGB-D and RGB-T salient object detection [12.126413875108993]
We propose a cross-modality fusion model SwinNet for RGB-D and RGB-T salient object detection.
The proposed model outperforms the state-of-the-art models on RGB-D and RGB-T datasets.
arXiv Detail & Related papers (2022-04-12T07:37:39Z)
- Multi-scale and Cross-scale Contrastive Learning for Semantic Segmentation [5.281694565226513]
We apply contrastive learning to enhance the discriminative power of the multi-scale features extracted by semantic segmentation networks.
By first mapping the encoder's multi-scale representations to a common feature space, we instantiate a novel form of supervised local-global constraint.
arXiv Detail & Related papers (2022-03-25T01:24:24Z)
- TriTransNet: RGB-D Salient Object Detection with a Triplet Transformer Embedding Network [18.910883028990998]
We propose a triplet transformer embedding module to enhance multi-level features.
It is the first to use three transformer encoders with shared weights to enhance multi-level features.
The proposed triplet transformer embedding network (TriTransNet) achieves the state-of-the-art performance in RGB-D salient object detection.
arXiv Detail & Related papers (2021-08-09T12:42:56Z)
- Conformer: Local Features Coupling Global Representations for Visual Recognition [72.9550481476101]
We propose a hybrid network structure, termed Conformer, to take advantage of convolutional operations and self-attention mechanisms for enhanced representation learning.
Experiments show that Conformer, with comparable parameter complexity, outperforms the visual transformer (DeiT-B) by 2.3% on ImageNet.
arXiv Detail & Related papers (2021-05-09T10:00:03Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
- Bifurcated backbone strategy for RGB-D salient object detection [168.19708737906618]
We leverage the inherent multi-modal and multi-level nature of RGB-D salient object detection to devise a novel cascaded refinement network.
Our architecture, named Bifurcated Backbone Strategy Network (BBS-Net), is simple, efficient, and backbone-independent.
arXiv Detail & Related papers (2020-07-06T13:01:30Z)