Narrowing the semantic gaps in U-Net with learnable skip connections:
The case of medical image segmentation
- URL: http://arxiv.org/abs/2312.15182v1
- Date: Sat, 23 Dec 2023 07:39:42 GMT
- Title: Narrowing the semantic gaps in U-Net with learnable skip connections:
The case of medical image segmentation
- Authors: Haonan Wang, Peng Cao, Xiaoli Liu, Jinzhu Yang, Osmar Zaiane
- Abstract summary: We propose a new segmentation framework, named UDTransNet, to solve three semantic gaps in U-Net.
Specifically, we propose a Dual Attention Transformer (DAT) module for capturing the channel- and spatial-wise relationships, and a Decoder-guided Recalibration Attention (DRA) module for effectively connecting the DAT tokens and the decoder features.
Our UDTransNet produces higher evaluation scores and finer segmentation results with relatively fewer parameters than state-of-the-art segmentation methods on different public datasets.
- Score: 12.812992773512871
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most state-of-the-art methods for medical image segmentation adopt the
encoder-decoder architecture. However, this U-shaped framework still has
limitations in capturing the non-local multi-scale information with a simple
skip connection. To solve the problem, we first explore the potential
weaknesses of skip connections in U-Net on multiple segmentation tasks, and find
that i) not all skip connections are useful, and each skip connection contributes
differently; ii) the optimal combination of skip connections differs depending on
the specific dataset. Based on our findings, we propose a new
segmentation framework, named UDTransNet, to solve three semantic gaps in
U-Net. Specifically, we propose a Dual Attention Transformer (DAT) module for
capturing the channel- and spatial-wise relationships to better fuse the
encoder features, and a Decoder-guided Recalibration Attention (DRA) module for
effectively connecting the DAT tokens and the decoder features to eliminate the
inconsistency. Hence, both modules establish a learnable connection to solve
the semantic gaps between the encoder and the decoder, which leads to a
high-performance segmentation model for medical images. Comprehensive
experimental results indicate that our UDTransNet produces higher evaluation
scores and finer segmentation results with relatively fewer parameters than
state-of-the-art segmentation methods on different public datasets. Code:
https://github.com/McGregorWwww/UDTransNet.
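To make the idea of a "learnable" skip connection concrete, the following is a minimal pure-Python sketch. It is not the DAT or DRA module from the paper (see the authors' repository for those); it only contrasts a plain U-Net skip (channel concatenation) with a simple decoder-guided channel gate. The function names, the `weights` and `bias` parameters, and the list-of-channels feature layout are all assumptions made for illustration.

```python
import math


def global_average_pool(feature):
    """Reduce a feature map (list of channels, each a flat list) to one mean per channel."""
    return [sum(channel) / len(channel) for channel in feature]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def plain_skip(encoder_feat, decoder_feat):
    """Standard U-Net skip: concatenate encoder and decoder channels unchanged."""
    return encoder_feat + decoder_feat


def gated_skip(encoder_feat, decoder_feat, weights, bias):
    """Hypothetical learnable skip (illustration only, not the paper's DRA).

    A per-channel descriptor of the decoder features is mapped through a
    linear layer and a sigmoid to produce one gate per encoder channel.
    Each encoder channel is rescaled by its gate before concatenation.
    """
    descriptor = global_average_pool(decoder_feat)
    gates = [
        sigmoid(sum(w * d for w, d in zip(row, descriptor)) + b)
        for row, b in zip(weights, bias)
    ]
    recalibrated = [
        [gate * value for value in channel]
        for gate, channel in zip(gates, encoder_feat)
    ]
    return recalibrated + decoder_feat, gates


# Toy features: 2 channels, 2 spatial positions each.
encoder_feat = [[1.0, 2.0], [3.0, 4.0]]
decoder_feat = [[0.0, 0.0], [2.0, 2.0]]

# Zero weights and bias give a gate of sigmoid(0) = 0.5 for every channel.
weights = [[0.0, 0.0], [0.0, 0.0]]
bias = [0.0, 0.0]

fused, gates = gated_skip(encoder_feat, decoder_feat, weights, bias)
```

In a trained network the `weights` and `bias` values would be learned, so the decoder could suppress or emphasize individual encoder channels instead of receiving all of them unchanged.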
Related papers
- DDU-Net: A Domain Decomposition-based CNN for High-Resolution Image Segmentation on Multiple GPUs [46.873264197900916]
A domain decomposition-based U-Net architecture is introduced, which partitions input images into non-overlapping patches.
A communication network is added to facilitate inter-patch information exchange to enhance the understanding of spatial context.
Results show that the approach achieves a 2-3% higher intersection over union (IoU) score compared to the same network without inter-patch communication.
arXiv Detail & Related papers (2024-07-31T01:07:21Z)
- FusionU-Net: U-Net with Enhanced Skip Connection for Pathology Image Segmentation [9.70345458475663]
FusionU-Net is based on U-Net structure and incorporates a fusion module to exchange information between different skip connections.
We conducted extensive experiments on multiple pathology image datasets to evaluate our model and found that FusionU-Net achieves better performance compared to other competing methods.
arXiv Detail & Related papers (2023-10-17T02:56:10Z)
- SegNetr: Rethinking the local-global interactions and skip connections in U-shaped networks [1.121518046252855]
U-shaped networks have dominated the field of medical image segmentation due to their simple and easily tuned structure.
We introduce a novel SegNetr block that can perform local-global interactions dynamically at any stage and with only linear complexity.
We validate the effectiveness of SegNetr on four mainstream medical image segmentation datasets, with 59% and 76% fewer parameters and GFLOPs than vanilla U-Net.
arXiv Detail & Related papers (2023-07-06T12:39:06Z)
- Towards Diverse Binary Segmentation via A Simple yet General Gated Network [71.19503376629083]
We propose a simple yet general gated network (GateNet) to tackle binary segmentation tasks.
With the help of multi-level gate units, the valuable context information from the encoder can be selectively transmitted to the decoder.
We introduce a "Fold" operation to improve the atrous convolution and form a novel folded atrous convolution.
arXiv Detail & Related papers (2023-03-18T11:26:36Z)
- DSNet: a simple yet efficient network with dual-stream attention for lesion segmentation [0.0]
We propose a simple yet efficient network DSNet for lesion segmentation.
Our method achieves SOTA performance in terms of mean Dice coefficient (mDice) and mean Intersection over Union (mIoU) with low model complexity and memory consumption.
arXiv Detail & Related papers (2022-11-30T12:48:17Z)
- UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise Perspective with Transformer [12.680709604300038]
We propose a new segmentation framework, named UCTransNet, from the channel perspective with attention mechanism.
The proposed connection consisting of the CCT and CCA is able to replace the original skip connection to solve the semantic gaps for an accurate medical image segmentation.
arXiv Detail & Related papers (2021-09-09T15:18:20Z)
- Specificity-preserving RGB-D Saliency Detection [103.3722116992476]
We propose a specificity-preserving network (SP-Net) for RGB-D saliency detection.
Two modality-specific networks and a shared learning network are adopted to generate individual and shared saliency maps.
Experiments on six benchmark datasets demonstrate that our SP-Net outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2021-08-18T14:14:22Z)
- Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation [87.01669173673288]
We propose an encoder fusion network (EFN), which transforms the visual encoder into a multi-modal feature learning network.
A co-attention mechanism is embedded in the EFN to realize the parallel update of multi-modal features.
The experiment results on four benchmark datasets demonstrate that the proposed approach achieves the state-of-the-art performance without any post-processing.
arXiv Detail & Related papers (2021-05-05T02:27:25Z)
- Beyond Single Stage Encoder-Decoder Networks: Deep Decoders for Semantic Image Segmentation [56.44853893149365]
Single encoder-decoder methodologies for semantic segmentation are reaching their peak in terms of segmentation quality and efficiency per number of layers.
We propose a new architecture based on a decoder which uses a set of shallow networks for capturing more information content.
In order to further improve the architecture we introduce a weight function which aims to re-balance classes to increase the attention of the networks to under-represented objects.
arXiv Detail & Related papers (2020-07-19T18:44:34Z)
- Suppress and Balance: A Simple Gated Network for Salient Object Detection [89.88222217065858]
We propose a simple gated network (GateNet) to solve both issues at once.
With the help of multilevel gate units, the valuable context information from the encoder can be optimally transmitted to the decoder.
In addition, we adopt the atrous spatial pyramid pooling based on the proposed "Fold" operation (Fold-ASPP) to accurately localize salient objects of various scales.
arXiv Detail & Related papers (2020-07-16T02:00:53Z)