TriTransNet: RGB-D Salient Object Detection with a Triplet Transformer Embedding Network
- URL: http://arxiv.org/abs/2108.03990v1
- Date: Mon, 9 Aug 2021 12:42:56 GMT
- Title: TriTransNet: RGB-D Salient Object Detection with a Triplet Transformer Embedding Network
- Authors: Zhengyi Liu, Yuan Wang, Zhengzheng Tu, Yun Xiao, Bin Tang
- Abstract summary: We propose a triplet transformer embedding module to enhance multi-level features.
It is the first work to use three transformer encoders with shared weights to enhance multi-level features.
The proposed triplet transformer embedding network (TriTransNet) achieves state-of-the-art performance in RGB-D salient object detection.
- Score: 18.910883028990998
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Salient object detection is a pixel-level dense prediction task that
highlights the prominent objects in a scene. Recently, the U-Net framework has been
widely used; its successive convolution and pooling operations generate multi-level
features that complement each other. Since high-level features contribute more to
performance, we propose a triplet transformer embedding module that enhances them by
learning long-range dependencies across layers. It is the first module to use three
transformer encoders with shared weights to enhance multi-level features. By further
designing a scale adjustment module to process the input, devising a three-stream
decoder to process the output, and attaching depth features to color features for
multi-modal fusion, the proposed triplet transformer embedding network (TriTransNet)
achieves state-of-the-art performance in RGB-D salient object detection and pushes
the results to a new level. Experimental results demonstrate the effectiveness of the
proposed modules and the competitiveness of TriTransNet.
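To make the core idea concrete, below is a minimal PyTorch sketch of weight-shared transformer enhancement of multi-level features, with depth features attached (added) to color features before enhancement. It is not the authors' implementation: the channel width, token grid size, encoder depth, and the simple additive fusion are illustrative assumptions standing in for the paper's scale adjustment module and multi-modal fusion.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedTransformerEnhancement(nn.Module):
    """Enhance three levels of backbone features with one weight-shared
    transformer encoder after resizing them to a common token grid
    (a rough stand-in for the triplet transformer embedding module)."""

    def __init__(self, channels=256, token_hw=14, num_layers=2, num_heads=8):
        super().__init__()
        self.token_hw = token_hw
        # Scale adjustment (assumed): 1x1 convs align each level's channels.
        self.align = nn.ModuleList([nn.Conv2d(channels, channels, 1) for _ in range(3)])
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads, batch_first=True)
        # One encoder instance reused for all three levels = shared weights.
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

    def forward(self, feats):
        # feats: list of three high-level feature maps, each [B, C, Hi, Wi].
        enhanced = []
        for conv, f in zip(self.align, feats):
            b, c, h, w = f.shape
            x = F.interpolate(conv(f), size=(self.token_hw, self.token_hw),
                              mode='bilinear', align_corners=False)
            tokens = x.flatten(2).transpose(1, 2)   # [B, N, C] token sequence
            tokens = self.encoder(tokens)           # shared transformer encoder
            x = tokens.transpose(1, 2).reshape(b, c, self.token_hw, self.token_hw)
            enhanced.append(F.interpolate(x, size=(h, w), mode='bilinear',
                                          align_corners=False))
        return enhanced

# Toy usage: depth features are added to color features before enhancement,
# mimicking the described attachment of depth to color for multi-modal fusion.
if __name__ == "__main__":
    rgb_feats = [torch.randn(1, 256, s, s) for s in (28, 14, 7)]
    depth_feats = [torch.randn(1, 256, s, s) for s in (28, 14, 7)]
    fused = [r + d for r, d in zip(rgb_feats, depth_feats)]
    out = SharedTransformerEnhancement()(fused)
    print([o.shape for o in out])
```

Reusing a single encoder for all three levels is what weight sharing amounts to in practice; the per-level outputs would then feed the paper's three-stream decoder, which is omitted here.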
Related papers
- CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on novel cross-modality interaction and refinement.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose Dual Swin-Transformer based Mutual Interactive Network.
We adopt the Swin-Transformer as the feature extractor for both the RGB and depth modalities to model long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- SwinNet: Swin Transformer drives edge-aware RGB-D and RGB-T salient object detection [12.126413875108993]
We propose a cross-modality fusion model, SwinNet, for RGB-D and RGB-T salient object detection.
The proposed model outperforms the state-of-the-art models on RGB-D and RGB-T datasets.
arXiv Detail & Related papers (2022-04-12T07:37:39Z)
- GroupTransNet: Group Transformer Network for RGB-D Salient Object Detection [5.876499671899904]
We propose a novel Group Transformer Network (GroupTransNet) for RGB-D salient object detection.
GroupTransNet is good at learning the long-range dependencies of cross-layer features.
Experiments demonstrate that GroupTransNet outperforms the compared models.
arXiv Detail & Related papers (2022-03-21T08:00:16Z)
- DFTR: Depth-supervised Hierarchical Feature Fusion Transformer for Salient Object Detection [44.94166578314837]
We propose a pure Transformer-based SOD framework, namely the Depth-supervised hierarchical feature Fusion TRansformer (DFTR).
We extensively evaluate the proposed DFTR on ten benchmarking datasets. Experimental results show that our DFTR consistently outperforms the existing state-of-the-art methods for both RGB and RGB-D SOD tasks.
arXiv Detail & Related papers (2022-03-12T12:59:12Z)
- TransCMD: Cross-Modal Decoder Equipped with Transformer for RGB-D Salient Object Detection [86.94578023985677]
In this work, we rethink this task from the perspective of global information alignment and transformation.
Specifically, the proposed method (TransCMD) cascades several cross-modal integration units to construct a top-down transformer-based information propagation path.
Experimental results on seven RGB-D SOD benchmark datasets demonstrate that a simple two-stream encoder-decoder framework can surpass the state-of-the-art purely CNN-based methods.
arXiv Detail & Related papers (2021-12-04T15:45:34Z)
- Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
Key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that the transformer is a uniform operation that is highly effective for both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, the Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z)