UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise
Perspective with Transformer
- URL: http://arxiv.org/abs/2109.04335v1
- Date: Thu, 9 Sep 2021 15:18:20 GMT
- Title: UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise
Perspective with Transformer
- Authors: Haonan Wang, Peng Cao, Jiaqi Wang, Osmar R. Zaiane
- Abstract summary: We propose a new segmentation framework, named UCTransNet, from the channel perspective with attention mechanism.
The proposed connection consisting of the CCT and CCA is able to replace the original skip connection to solve the semantic gaps for an accurate medical image segmentation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most recent semantic segmentation methods adopt a U-Net framework with an
encoder-decoder architecture. It remains challenging for a U-Net with a simple
skip-connection scheme to model the global multi-scale context: 1) not every
skip-connection setting is effective, because the feature sets of the encoder
and decoder stages can be incompatible, and some skip connections even degrade
segmentation performance; 2) on some datasets, the original U-Net performs worse
than a variant without any skip connections. Based on these findings, we
propose a new segmentation framework, named UCTransNet (U-Net with a proposed
CTrans module), built from the channel perspective with an attention mechanism.
Specifically, the CTrans module is an alternative to the U-Net skip connections:
it consists of a sub-module that conducts multi-scale Channel Cross fusion
with Transformer (named CCT) and a Channel-wise Cross-Attention sub-module
(named CCA) that guides the fused multi-scale channel-wise information to
connect effectively to the decoder features, eliminating ambiguity.
Hence, the proposed connection, consisting of the CCT and CCA, can replace
the original skip connection to bridge the semantic gaps for accurate
automatic medical image segmentation. The experimental results suggest that our
UCTransNet produces more precise segmentations and achieves consistent
improvements over the state of the art for semantic segmentation across
different datasets and conventional architectures, whether transformer-based
or U-shaped. Code: https://github.com/McGregorWwww/UCTransNet.
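As a rough illustration of the channel-wise cross-attention idea described above, attention weights can be computed between channels rather than between spatial positions. The following is a minimal NumPy sketch with hypothetical names; it is not the authors' implementation, which is available in the linked repository:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_cross_attention(decoder_feat, fused_feat):
    """Channel-wise cross-attention sketch: similarity is measured
    between CHANNELS, not spatial positions.

    decoder_feat: (C_d, H*W) queries from a decoder stage.
    fused_feat:   (C_f, H*W) keys/values from multi-scale fusion.
    Returns refined features of shape (C_d, H*W).
    """
    d = decoder_feat.shape[1]
    # (C_d, C_f) channel-to-channel attention map.
    attn = softmax(decoder_feat @ fused_feat.T / np.sqrt(d), axis=-1)
    # Each decoder channel becomes a weighted mix of fused channels.
    return attn @ fused_feat

# Toy example: 4 decoder channels, 6 fused channels, an 8x8 spatial map.
rng = np.random.default_rng(0)
dec = rng.standard_normal((4, 64))
fus = rng.standard_normal((6, 64))
out = channel_cross_attention(dec, fus)
print(out.shape)  # (4, 64)
```

The point of the sketch is only that the attention matrix is C_d x C_f, so the mechanism selects *which channels* of the fused multi-scale features to pass to the decoder.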
Related papers
- LKASeg:Remote-Sensing Image Semantic Segmentation with Large Kernel Attention and Full-Scale Skip Connections [27.473573286685063]
We propose a remote-sensing image semantic segmentation network named LKASeg.
LKASeg combines Large Kernel Attention (LSKA) and Full-Scale Skip Connections (FSC).
On the ISPRS Vaihingen dataset, the mF1 and mIoU scores reached 90.33% and 82.77%, respectively.
arXiv Detail & Related papers (2024-10-14T12:25:48Z)
- ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection [65.59969454655996]
We propose an efficient change detection framework, ELGC-Net, which leverages rich contextual information to precisely estimate change regions.
Our proposed ELGC-Net sets a new state-of-the-art performance in remote sensing change detection benchmarks.
We also introduce ELGC-Net-LW, a lighter variant with significantly reduced computational complexity, suitable for resource-constrained settings.
arXiv Detail & Related papers (2024-03-26T17:46:25Z)
- SCTransNet: Spatial-channel Cross Transformer Network for Infrared Small Target Detection [46.049401912285134]
Infrared small target detection (IRSTD) has recently benefited greatly from U-shaped neural models.
Existing techniques struggle when the target closely resembles the background.
We present a Spatial-channel Cross Transformer Network (SCTransNet) that leverages spatial-channel cross transformer blocks.
arXiv Detail & Related papers (2024-01-28T06:41:15Z)
- Narrowing the semantic gaps in U-Net with learnable skip connections: The case of medical image segmentation [12.812992773512871]
We propose a new segmentation framework, named UDTransNet, to solve three semantic gaps in U-Net.
Specifically, we propose a Dual Attention Transformer (DAT) module for capturing the channel- and spatial-wise relationships, and a Decoder-guided Recalibration Attention (DRA) module for effectively connecting the DAT tokens and the decoder features.
Our UDTransNet produces higher evaluation scores and finer segmentation results with relatively fewer parameters than state-of-the-art segmentation methods on different public datasets.
arXiv Detail & Related papers (2023-12-23T07:39:42Z)
- FusionU-Net: U-Net with Enhanced Skip Connection for Pathology Image Segmentation [9.70345458475663]
FusionU-Net is based on U-Net structure and incorporates a fusion module to exchange information between different skip connections.
We conducted extensive experiments on multiple pathology image datasets to evaluate our model and found that FusionU-Net achieves better performance compared to other competing methods.
arXiv Detail & Related papers (2023-10-17T02:56:10Z)
- SegNetr: Rethinking the local-global interactions and skip connections in U-shaped networks [1.121518046252855]
U-shaped networks have dominated the field of medical image segmentation due to their simple and easily tuned structure.
We introduce a novel SegNetr block that can perform local-global interactions dynamically at any stage and with only linear complexity.
We validate the effectiveness of SegNetr on four mainstream medical image segmentation datasets, using 59% and 76% fewer parameters and GFLOPs, respectively, than vanilla U-Net.
arXiv Detail & Related papers (2023-07-06T12:39:06Z)
- Towards Diverse Binary Segmentation via A Simple yet General Gated Network [71.19503376629083]
We propose a simple yet general gated network (GateNet) to tackle binary segmentation tasks.
With the help of multi-level gate units, the valuable context information from the encoder can be selectively transmitted to the decoder.
We introduce a "Fold" operation to improve the atrous convolution and form a novel folded atrous convolution.
arXiv Detail & Related papers (2023-03-18T11:26:36Z)
- TransCMD: Cross-Modal Decoder Equipped with Transformer for RGB-D Salient Object Detection [86.94578023985677]
In this work, we rethink this task from the perspective of global information alignment and transformation.
Specifically, the proposed method (TransCMD) cascades several cross-modal integration units to construct a top-down transformer-based information propagation path.
Experimental results on seven RGB-D SOD benchmark datasets demonstrate that a simple two-stream encoder-decoder framework can surpass the state-of-the-art purely CNN-based methods.
arXiv Detail & Related papers (2021-12-04T15:45:34Z)
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [149.78470371525754]
We treat semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer to encode an image as a sequence of patches.
With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR).
SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes.
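The patch-sequence encoding that SETR builds on can be sketched as follows. This is a minimal NumPy illustration of flattening an image into ViT-style patch tokens under our own naming, not SETR's actual code:

```python
import numpy as np

def image_to_patch_sequence(img, patch):
    """Flatten an (H, W, C) image into a sequence of patch vectors,
    each of length patch*patch*C, in row-major patch order."""
    H, W, C = img.shape
    assert H % patch == 0 and W % patch == 0, "image must tile evenly"
    seq = (img.reshape(H // patch, patch, W // patch, patch, C)
              .transpose(0, 2, 1, 3, 4)          # group pixels by patch
              .reshape(-1, patch * patch * C))   # one row per patch
    return seq

# Toy example: a 16x16 RGB image with 4x4 patches -> 16 tokens of dim 48.
img = np.arange(16 * 16 * 3, dtype=float).reshape(16, 16, 3)
tokens = image_to_patch_sequence(img, patch=4)
print(tokens.shape)  # (16, 48)
```

Each row of `tokens` would then be linearly projected and fed to the transformer encoder, so global context is available at every layer.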
arXiv Detail & Related papers (2020-12-31T18:55:57Z)
- Interactive Video Object Segmentation Using Global and Local Transfer Modules [51.93009196085043]
We develop a deep neural network, which consists of an annotation network (A-Net) and a transfer network (T-Net).
Given user scribbles on a frame, A-Net yields a segmentation result based on the encoder-decoder architecture.
We train the entire network in two stages, by emulating user scribbles and employing an auxiliary loss.
arXiv Detail & Related papers (2020-07-16T06:49:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.