SPG-CDENet: Spatial Prior-Guided Cross Dual Encoder Network for Multi-Organ Segmentation
- URL: http://arxiv.org/abs/2510.26390v1
- Date: Thu, 30 Oct 2025 11:33:29 GMT
- Title: SPG-CDENet: Spatial Prior-Guided Cross Dual Encoder Network for Multi-Organ Segmentation
- Authors: Xizhi Tian, Changjun Zhou, Yulin Yang
- Abstract summary: We propose a novel two-stage segmentation paradigm designed to improve multi-organ segmentation accuracy. Our SPG-CDENet consists of two key components: a spatial prior network and a cross dual encoder network. The global encoder captures global semantic features from the entire image, while the local encoder focuses on features from the prior network.
- Score: 5.970991208589063
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multi-organ segmentation is a critical task in computer-aided diagnosis. While recent deep learning methods have achieved remarkable success in image segmentation, huge variations in organ size and shape challenge their effectiveness in multi-organ segmentation. To address these challenges, we propose a Spatial Prior-Guided Cross Dual Encoder Network (SPG-CDENet), a novel two-stage segmentation paradigm designed to improve multi-organ segmentation accuracy. Our SPG-CDENet consists of two key components: a spatial prior network and a cross dual encoder network. The prior network generates coarse localization maps that delineate the approximate ROI, serving as spatial guidance for the dual encoder network. The cross dual encoder network comprises four essential components: a global encoder, a local encoder, a symmetric cross-attention module, and a flow-based decoder. The global encoder captures global semantic features from the entire image, while the local encoder focuses on features from the prior network. To enhance the interaction between the global and local encoders, a symmetric cross-attention module is proposed across all layers of the encoders to fuse and refine features. Furthermore, the flow-based decoder directly propagates high-level semantic features from the final encoder layer to all decoder layers, maximizing feature preservation and utilization. Extensive qualitative and quantitative experiments on two public datasets demonstrate the superior performance of SPG-CDENet compared to existing segmentation methods. Ablation studies further validate the effectiveness of the proposed modules in improving segmentation accuracy.
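The symmetric cross-attention idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the attention formulation, so everything below is an assumption for illustration, not the authors' implementation: it uses standard scaled dot-product attention, identity query/key/value projections (no learned weights), and a residual sum to fuse each stream with what it attends to in the other. The names `cross_attend` and `symmetric_cross_attention` are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, context):
    """Each query token attends over all context tokens.

    Assumption: identity projections, i.e. the tokens themselves serve
    as queries, keys, and values.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        # Scaled dot-product score of this query against every context token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in context]
        weights = softmax(scores)
        # Weighted average of the context tokens.
        out.append([sum(w * v[j] for w, v in zip(weights, context))
                    for j in range(d)])
    return out

def symmetric_cross_attention(global_feats, local_feats):
    """Fuse the two streams in both directions with residual connections."""
    g_from_l = cross_attend(global_feats, local_feats)  # global queries local
    l_from_g = cross_attend(local_feats, global_feats)  # local queries global
    new_g = [[a + b for a, b in zip(g, u)]
             for g, u in zip(global_feats, g_from_l)]
    new_l = [[a + b for a, b in zip(l, u)]
             for l, u in zip(local_feats, l_from_g)]
    return new_g, new_l

# Toy token sets: 3 global tokens and 2 local (ROI) tokens, feature dim 2.
global_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
local_feats = [[0.5, 0.5], [2.0, 0.0]]
new_g, new_l = symmetric_cross_attention(global_feats, local_feats)
```

Each stream keeps its own token count and feature dimension, so a block of this shape could in principle be applied at every encoder layer, as the abstract describes.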
Related papers
- Col-OLHTR: A Novel Framework for Multimodal Online Handwritten Text Recognition [82.88856416080331]
Online Handwritten Text Recognition (OLHTR) has gained considerable attention for its diverse range of applications. Current approaches usually treat OLHTR as a sequence recognition task, employing either a single trajectory or image encoder, or multi-stream encoders. We propose a Collaborative learning-based OLHTR framework, called Col-OLHTR, that learns multimodal features during training while maintaining a single-stream inference process.
arXiv Detail & Related papers (2025-02-10T02:12:24Z)
- MDNet: Multi-Decoder Network for Abdominal CT Organs Segmentation [6.4987174473651725]
We propose MDNet to handle the challenges of heterogeneity in organ shapes, sizes, and complex anatomical relationships.
MDNet is an encoder-decoder network that uses the pre-trained MiT-B2 as the encoder and multiple different decoder networks.
MDNet is more interpretable and robust compared to the other baseline models.
arXiv Detail & Related papers (2024-05-10T01:03:03Z)
- Towards Diverse Binary Segmentation via A Simple yet General Gated Network [71.19503376629083]
We propose a simple yet general gated network (GateNet) to tackle binary segmentation tasks.
With the help of multi-level gate units, the valuable context information from the encoder can be selectively transmitted to the decoder.
We introduce a "Fold" operation to improve the atrous convolution and form a novel folded atrous convolution.
arXiv Detail & Related papers (2023-03-18T11:26:36Z)
- Attention guided global enhancement and local refinement network for semantic segmentation [5.881350024099048]
A lightweight semantic segmentation network is developed using the encoder-decoder architecture.
A Global Enhancement Method is proposed to aggregate global information from high-level feature maps.
A Local Refinement Module is developed by utilizing the decoder features as the semantic guidance.
The two methods are integrated into a Context Fusion Block, and based on that, a novel Attention guided Global enhancement and Local refinement Network (AGLN) is elaborately designed.
arXiv Detail & Related papers (2022-04-09T02:32:24Z)
- Crosslink-Net: Double-branch Encoder Segmentation Network via Fusing Vertical and Horizontal Convolutions [58.71117402626524]
We present a novel double-branch encoder architecture for medical image segmentation.
Our architecture is inspired by two observations: 1) Since the discrimination of features learned via square convolutional kernels needs to be further improved, we propose to utilize non-square vertical and horizontal convolutional kernels.
The experiments validate the effectiveness of our model on four datasets.
arXiv Detail & Related papers (2021-07-24T02:58:32Z)
- A Holistically-Guided Decoder for Deep Representation Learning with Applications to Semantic Segmentation and Object Detection [74.88284082187462]
One common strategy is to adopt dilated convolutions in the backbone networks to extract high-resolution feature maps.
We propose one novel holistically-guided decoder which is introduced to obtain the high-resolution semantic-rich feature maps.
arXiv Detail & Related papers (2020-12-18T10:51:49Z)
- Beyond Single Stage Encoder-Decoder Networks: Deep Decoders for Semantic Image Segmentation [56.44853893149365]
Single encoder-decoder methodologies for semantic segmentation are reaching their peak in terms of segmentation quality and efficiency per number of layers.
We propose a new architecture based on a decoder which uses a set of shallow networks for capturing more information content.
In order to further improve the architecture we introduce a weight function which aims to re-balance classes to increase the attention of the networks to under-represented objects.
arXiv Detail & Related papers (2020-07-19T18:44:34Z)
- Suppress and Balance: A Simple Gated Network for Salient Object Detection [89.88222217065858]
We propose a simple gated network (GateNet) to solve both issues at once.
With the help of multilevel gate units, the valuable context information from the encoder can be optimally transmitted to the decoder.
In addition, we adopt the atrous spatial pyramid pooling based on the proposed "Fold" operation (Fold-ASPP) to accurately localize salient objects of various scales.
arXiv Detail & Related papers (2020-07-16T02:00:53Z)
- Dual Convolutional LSTM Network for Referring Image Segmentation [18.181286443737417]
Referring image segmentation is a problem at the intersection of computer vision and natural language understanding.
We propose a dual convolutional LSTM (ConvLSTM) network to tackle this problem.
arXiv Detail & Related papers (2020-01-30T20:40:18Z)