Boosting Salient Object Detection with Transformer-based Asymmetric
Bilateral U-Net
- URL: http://arxiv.org/abs/2108.07851v6
- Date: Mon, 21 Aug 2023 05:47:52 GMT
- Title: Boosting Salient Object Detection with Transformer-based Asymmetric
Bilateral U-Net
- Authors: Yu Qiu, Yun Liu, Le Zhang, Jing Xu
- Abstract summary: Existing salient object detection (SOD) methods mainly rely on U-shaped convolutional neural networks (CNNs) with skip connections.
We propose a transformer-based Asymmetric Bilateral U-Net (ABiU-Net) to learn both global and local representations for SOD.
ABiU-Net performs favorably against previous state-of-the-art SOD methods.
- Score: 19.21709807149165
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Existing salient object detection (SOD) methods mainly rely on U-shaped
convolutional neural networks (CNNs) with skip connections to combine the global
contexts and local spatial details that are crucial for locating salient
objects and refining object details, respectively. Despite great successes, the
ability of CNNs to learn global contexts is limited. Recently, the vision
transformer has achieved revolutionary progress in computer vision owing to its
powerful modeling of global dependencies. However, directly applying the
transformer to SOD is suboptimal because the transformer lacks the ability to
learn local spatial representations. To this end, this paper explores the
combination of transformers and CNNs to learn both global and local
representations for SOD. We propose a transformer-based Asymmetric Bilateral
U-Net (ABiU-Net). The asymmetric bilateral encoder has a transformer path and a
lightweight CNN path, where the two paths communicate at each encoder stage to
learn complementary global contexts and local spatial details, respectively.
The asymmetric bilateral decoder also consists of two paths to process features
from the transformer and CNN encoder paths, with communication at each decoder
stage for decoding coarse salient object locations and fine-grained object
details, respectively. Such communication between the two encoder/decoder paths
enables ABiU-Net to learn complementary global and local representations,
taking advantage of the natural merits of transformers and CNNs, respectively.
Hence, ABiU-Net provides a new perspective for transformer-based SOD. Extensive
experiments demonstrate that ABiU-Net performs favorably against previous
state-of-the-art SOD methods. The code is available at
https://github.com/yuqiuyuqiu/ABiU-Net.
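The abstract describes two cooperating encoder paths that exchange features at every stage. The PyTorch sketch below illustrates that general idea for a single encoder stage; the module names, channel sizes, and the 1x1-convolution fusion used for cross-path communication are illustrative assumptions, not the authors' implementation (see the repository linked above for the official code).

```python
# Minimal sketch of one asymmetric bilateral encoder stage, assuming a standard
# transformer encoder layer for the global path, a depth-wise separable conv
# block for the lightweight CNN path, and 1x1-conv cross-path fusion.
import torch
import torch.nn as nn


class BilateralStage(nn.Module):
    """One encoder stage: a transformer block (global context) and a light
    CNN block (local detail) that exchange features before the next stage."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Transformer path: models global dependencies over all tokens.
        self.transformer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, dim_feedforward=2 * dim,
            batch_first=True)
        # Lightweight CNN path: depth-wise separable conv for local details.
        self.cnn = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1, groups=dim),
            nn.Conv2d(dim, dim, 1),
            nn.BatchNorm2d(dim),
            nn.ReLU(inplace=True))
        # Cross-path communication, fused here with 1x1 convs (an assumption).
        self.cnn_to_trans = nn.Conv2d(dim, dim, 1)
        self.trans_to_cnn = nn.Conv2d(dim, dim, 1)

    def forward(self, x_trans: torch.Tensor, x_cnn: torch.Tensor):
        b, c, h, w = x_cnn.shape
        # Transformer path works on a token sequence, CNN path on a feature map.
        tokens = self.transformer(x_trans.flatten(2).transpose(1, 2))
        t_map = tokens.transpose(1, 2).reshape(b, c, h, w)
        c_map = self.cnn(x_cnn)
        # Bidirectional exchange of complementary global and local features.
        return t_map + self.cnn_to_trans(c_map), c_map + self.trans_to_cnn(t_map)


if __name__ == "__main__":
    stage = BilateralStage(dim=64)
    feat = torch.randn(2, 64, 32, 32)
    t_out, c_out = stage(feat, feat)
    print(t_out.shape, c_out.shape)  # torch.Size([2, 64, 32, 32]) twice
```

In the full model this pattern would repeat per stage (with downsampling in between) and be mirrored in the decoder, where the two paths again communicate to recover coarse salient object locations and fine-grained details.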
Related papers
- Interaction-Guided Two-Branch Image Dehazing Network [1.26404863283601]
Image dehazing aims to restore clean images from hazy ones.
CNNs and Transformers have demonstrated exceptional performance in local and global feature extraction.
We propose a novel dual-branch image dehazing framework that guides CNN and Transformer components interactively.
arXiv Detail & Related papers (2024-10-14T03:21:56Z) - ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection [65.59969454655996]
We propose an efficient change detection framework, ELGC-Net, which leverages rich contextual information to precisely estimate change regions.
Our proposed ELGC-Net sets a new state-of-the-art performance in remote sensing change detection benchmarks.
We also introduce ELGC-Net-LW, a lighter variant with significantly reduced computational complexity, suitable for resource-constrained settings.
arXiv Detail & Related papers (2024-03-26T17:46:25Z) - CompletionFormer: Depth Completion with Convolutions and Vision
Transformers [0.0]
This paper proposes a Joint Convolutional Attention and Transformer block (JCAT), which deeply couples the convolutional attention layer and Vision Transformer into one block, as the basic unit to construct our depth completion model in a pyramidal structure.
Our CompletionFormer outperforms state-of-the-art CNNs-based methods on the outdoor KITTI Depth Completion benchmark and indoor NYUv2 dataset, achieving significantly higher efficiency (nearly 1/3 FLOPs) compared to pure Transformer-based methods.
arXiv Detail & Related papers (2023-04-25T17:59:47Z) - ConvFormer: Combining CNN and Transformer for Medical Image Segmentation [17.88894109620463]
We propose a hierarchical CNN and Transformer hybrid architecture, called ConvFormer, for medical image segmentation.
Our ConvFormer, trained from scratch, outperforms various CNN- or Transformer-based architectures, achieving state-of-the-art performance.
arXiv Detail & Related papers (2022-11-15T23:11:22Z) - Transformer-Guided Convolutional Neural Network for Cross-View
Geolocalization [20.435023745201878]
We propose a novel Transformer-guided convolutional neural network (TransGCNN) architecture.
Our TransGCNN consists of a CNN backbone extracting a feature map from an input image and a Transformer head modeling global context.
Experiments on popular benchmark datasets demonstrate that our model achieves top-1 accuracy of 94.12% and 84.92% on CVUSA and CVACT_val, respectively.
arXiv Detail & Related papers (2022-04-21T08:46:41Z) - SwinNet: Swin Transformer drives edge-aware RGB-D and RGB-T salient
object detection [12.126413875108993]
We propose SwinNet, a cross-modality fusion model for RGB-D and RGB-T salient object detection.
The proposed model outperforms the state-of-the-art models on RGB-D and RGB-T datasets.
arXiv Detail & Related papers (2022-04-12T07:37:39Z) - EDTER: Edge Detection with Transformer [71.83960813880843]
We propose a novel transformer-based edge detector, Edge Detection TransformER (EDTER), to extract clear and crisp object boundaries and meaningful edges.
EDTER exploits the full image context information and detailed local cues simultaneously.
Experiments on BSDS500, NYUDv2, and Multicue demonstrate the superiority of EDTER over state-of-the-art methods.
arXiv Detail & Related papers (2022-03-16T11:55:55Z) - Unifying Global-Local Representations in Salient Object Detection with Transformer [55.23033277636774]
We introduce a new attention-based encoder, the vision transformer, into salient object detection.
With the global view in very shallow layers, the transformer encoder preserves more local representations.
Our method significantly outperforms other FCN-based and transformer-based methods on five benchmarks.
arXiv Detail & Related papers (2021-08-05T17:51:32Z) - Container: Context Aggregation Network [83.12004501984043]
A recent finding shows that a simple MLP-based solution without any traditional convolutional or Transformer components can produce effective visual representations.
We present CONTAINER (CONText AggregatIon NEtwoRk), a general-purpose building block for multi-head context aggregation.
In contrast to Transformer-based methods that do not scale well to downstream tasks that rely on larger input image resolutions, our efficient network, CONTAINER-LIGHT, can be employed in object detection and instance segmentation networks.
arXiv Detail & Related papers (2021-06-02T18:09:11Z) - Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is an Unet-like pure Transformer for medical image segmentation.
The tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z) - LocalViT: Bringing Locality to Vision Transformers [132.42018183859483]
Locality is essential for images since it pertains to structures like lines, edges, shapes, and even objects.
We add locality to vision transformers by introducing depth-wise convolution into the feed-forward network.
This seemingly simple solution is inspired by the comparison between feed-forward networks and inverted residual blocks.
arXiv Detail & Related papers (2021-04-12T17:59:22Z)
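The LocalViT entry above names a concrete mechanism: a depth-wise convolution inserted into the transformer's feed-forward network, echoing an inverted residual block. The sketch below illustrates that idea; the class name, expansion ratio, and activation choice are assumptions for illustration, not the paper's exact design.

```python
# Minimal sketch of a locality-enhanced feed-forward network, assuming an
# inverted-residual-style expand -> depth-wise conv -> project structure.
import torch
import torch.nn as nn


class LocalityFeedForward(nn.Module):
    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        hidden = dim * expansion
        self.expand = nn.Conv2d(dim, hidden, 1)         # 1x1 expand (FFN's first linear)
        self.dwconv = nn.Conv2d(hidden, hidden, 3,      # 3x3 depth-wise conv adds locality
                                padding=1, groups=hidden)
        self.project = nn.Conv2d(hidden, dim, 1)        # 1x1 project (FFN's second linear)
        self.act = nn.GELU()

    def forward(self, tokens: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # tokens: (batch, h*w, dim) -> reshape to an image grid so the
        # depth-wise convolution can mix neighbouring tokens.
        b, n, c = tokens.shape
        x = tokens.transpose(1, 2).reshape(b, c, h, w)
        x = self.project(self.act(self.dwconv(self.act(self.expand(x)))))
        return x.flatten(2).transpose(1, 2)             # back to a token sequence


if __name__ == "__main__":
    ffn = LocalityFeedForward(dim=64)
    out = ffn(torch.randn(2, 14 * 14, 64), h=14, w=14)
    print(out.shape)  # torch.Size([2, 196, 64])
```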
This list is automatically generated from the titles and abstracts of the papers on this site.