Towards Complex Backgrounds: A Unified Difference-Aware Decoder for
Binary Segmentation
- URL: http://arxiv.org/abs/2210.15156v1
- Date: Thu, 27 Oct 2022 03:45:29 GMT
- Authors: Jiepan Li, Wei He, and Hongyan Zhang
- Abstract summary: A new unified dual-branch decoder paradigm named the difference-aware decoder is proposed in this paper.
The difference-aware decoder imitates the human eye in three stages using the multi-level features output by the encoder.
The results demonstrate that the difference-aware decoder can achieve a higher accuracy than the other state-of-the-art binary segmentation methods.
- Score: 4.6932442139663015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Binary segmentation is used to distinguish objects of interest from
background, and is an active area of convolutional encoder-decoder network
research. The current decoders are designed for specific objects based on the
common backbones as the encoders, but cannot deal with complex backgrounds.
Inspired by the way human eyes detect objects of interest, a new unified
dual-branch decoder paradigm named the difference-aware decoder is proposed in
this paper to explore the difference between the foreground and the background
and separate the objects of interest in optical images. The difference-aware
decoder imitates the human eye in three stages using the multi-level features
output by the encoder. In Stage A, the first branch decoder of the
difference-aware decoder is used to obtain a guide map. The highest-level
features are enhanced with a novel field expansion module and a dual residual
attention module, and are combined with the lowest-level features to obtain the
guide map. In Stage B, the other branch decoder adopts a middle feature fusion
module to make trade-offs between textural details and semantic information and
generate background-aware features. In Stage C, the proposed difference-aware
extractor, consisting of a difference guidance model and a difference
enhancement module, fuses the guide map from Stage A and the background-aware
features from Stage B, to enlarge the differences between the foreground and
the background and output a final detection result. The results demonstrate
that the difference-aware decoder can achieve a higher accuracy than the other
state-of-the-art binary segmentation methods for these tasks.
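The three-stage data flow described in the abstract can be sketched as below. This is only a wiring illustration under loud assumptions, not the authors' implementation: every function name is hypothetical, feature maps are single-channel nested lists already resized to a common resolution, and the arithmetic inside each stage is a stand-in for modules (field expansion, dual residual attention, middle feature fusion, difference guidance and enhancement) whose internals the abstract does not specify.

```python
import math


def _zip_maps(fn, *maps):
    """Apply fn elementwise across equally sized 2D maps (lists of rows)."""
    return [[fn(*vals) for vals in zip(*rows)] for rows in zip(*maps)]


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def stage_a(features):
    """Branch 1: combine highest- and lowest-level features into a guide map."""
    lowest, highest = features[0], features[-1]
    # Placeholder for the field expansion and dual residual attention modules
    # (assumption: a simple scaling stands in for the enhancement).
    enhanced = _zip_maps(lambda h: 2.0 * h, highest)
    return _zip_maps(lambda e, l: _sigmoid(e + l), enhanced, lowest)


def stage_b(features):
    """Branch 2: fuse the middle-level features into background-aware features."""
    middle = features[1:-1]
    # Placeholder for the middle feature fusion module (assumption: an average).
    return _zip_maps(lambda *v: sum(v) / len(v), *middle)


def stage_c(guide, background):
    """Fuse both branches and enlarge the foreground/background difference."""
    # Placeholder for the difference-aware extractor (assumption: a sharpened
    # difference between the guide map and a squashed background response).
    return _zip_maps(
        lambda g, b: _sigmoid(4.0 * (g - _sigmoid(b))), guide, background
    )


def difference_aware_decoder(features):
    """Run Stages A, B, and C on encoder features ordered lowest to highest level."""
    if len(features) < 3:
        raise ValueError("need lowest, at least one middle, and highest level")
    guide = stage_a(features)        # Stage A: guide map
    background = stage_b(features)   # Stage B: background-aware features
    return stage_c(guide, background)  # Stage C: final detection map


# Toy input: four encoder levels, each a 2x2 map.
zeros = [[0.0, 0.0], [0.0, 0.0]]
signal = [[1.0, -1.0], [0.0, 0.0]]
prediction = difference_aware_decoder([signal, zeros, zeros, signal])
```

The point of the sketch is the routing: Stage A reads only the two extreme encoder levels, Stage B reads only the levels in between, and Stage C is the single place where the two branches meet.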
Related papers
- DiffCut: Catalyzing Zero-Shot Semantic Segmentation with Diffusion Features and Recursive Normalized Cut [62.63481844384229]
Foundation models have emerged as powerful tools across various domains including language, vision, and multimodal tasks.
In this paper, we use a diffusion UNet encoder as a foundation vision encoder and introduce DiffCut, an unsupervised zero-shot segmentation method.
Our work highlights the remarkably accurate semantic knowledge embedded within diffusion UNet encoders that could then serve as foundation vision encoders for downstream tasks.
arXiv Detail & Related papers (2024-06-05T01:32:31Z)
- Triple-View Knowledge Distillation for Semi-Supervised Semantic Segmentation [54.23510028456082]
We propose a Triple-view Knowledge Distillation framework, termed TriKD, for semi-supervised semantic segmentation.
The framework includes the triple-view encoder and the dual-frequency decoder.
arXiv Detail & Related papers (2023-09-22T01:02:21Z)
- More complex encoder is not all you need [0.882348769487259]
We introduce neU-Net (i.e., not complex encoder U-Net), which incorporates a novel Sub-pixel Convolution for upsampling to construct a powerful decoder.
Our model design achieves excellent results, surpassing other state-of-the-art methods on both the Synapse and ACDC datasets.
arXiv Detail & Related papers (2023-09-20T08:34:38Z)
- T-UNet: Triplet UNet for Change Detection in High-Resolution Remote Sensing Images [5.849243433046327]
Currently, most change detection methods are based on Siamese network structure or early fusion structure.
We propose a novel network, Triplet UNet (T-UNet), based on a three-branch encoder, which is capable of simultaneously extracting the object features and the change features.
In the decoder stage, we introduce the channel attention mechanism (CAM) and spatial attention mechanism (SAM) to fully mine and integrate detailed texture information.
arXiv Detail & Related papers (2023-08-04T14:44:11Z)
- Crosslink-Net: Double-branch Encoder Segmentation Network via Fusing Vertical and Horizontal Convolutions [58.71117402626524]
We present a novel double-branch encoder architecture for medical image segmentation.
Our architecture is inspired by two observations: 1) Since the discrimination of features learned via square convolutional kernels needs to be further improved, we propose to utilize non-square vertical and horizontal convolutional kernels.
The experiments validate the effectiveness of our model on four datasets.
arXiv Detail & Related papers (2021-07-24T02:58:32Z)
- Two-stream Encoder-Decoder Network for Localizing Image Forgeries [4.982505311411925]
We propose a novel two-stream encoder-decoder network, which utilizes both the high-level and the low-level image features.
We have carried out experimental analysis on multiple standard forensics datasets to evaluate the performance of the proposed method.
arXiv Detail & Related papers (2020-09-27T15:49:17Z)
- Beyond Single Stage Encoder-Decoder Networks: Deep Decoders for Semantic Image Segmentation [56.44853893149365]
Single encoder-decoder methodologies for semantic segmentation are reaching their peak in terms of segmentation quality and efficiency per number of layers.
We propose a new architecture based on a decoder which uses a set of shallow networks for capturing more information content.
In order to further improve the architecture we introduce a weight function which aims to re-balance classes to increase the attention of the networks to under-represented objects.
arXiv Detail & Related papers (2020-07-19T18:44:34Z)
- Suppress and Balance: A Simple Gated Network for Salient Object Detection [89.88222217065858]
We propose a simple gated network (GateNet) to solve both issues at once.
With the help of multilevel gate units, the valuable context information from the encoder can be optimally transmitted to the decoder.
In addition, we adopt the atrous spatial pyramid pooling based on the proposed "Fold" operation (Fold-ASPP) to accurately localize salient objects of various scales.
arXiv Detail & Related papers (2020-07-16T02:00:53Z)
- Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding [59.48857453699463]
In sequence-to-sequence learning, the decoder relies on the attention mechanism to efficiently extract information from the encoder.
Recent work has proposed to use representations from different encoder layers for diversified levels of information.
We propose layer-wise multi-view decoding, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers are supplemented for a stereoscopic view of the source sequences.
arXiv Detail & Related papers (2020-05-16T20:00:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.