Medical Image Segmentation via Sparse Coding Decoder
- URL: http://arxiv.org/abs/2310.10957v1
- Date: Tue, 17 Oct 2023 03:08:35 GMT
- Title: Medical Image Segmentation via Sparse Coding Decoder
- Authors: Long Zeng, Kaigui Wu
- Abstract summary: Transformers have achieved significant success in medical image segmentation, owing to its capability to capture long-range dependencies.
Previous works incorporate convolutional layers into the encoder module of transformers, thereby enhancing their ability to learn local relationships among pixels.
However, transformers may suffer from limited generalization capabilities and reduced robustness, attributed to the insufficient spatial recovery ability of their decoders.
- Score: 3.9633192172709975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have achieved significant success in medical image segmentation,
owing to its capability to capture long-range dependencies. Previous works
incorporate convolutional layers into the encoder module of transformers,
thereby enhancing their ability to learn local relationships among pixels.
However, transformers may suffer from limited generalization capabilities and
reduced robustness, attributed to the insufficient spatial recovery ability of
their decoders. To address this issue, A convolution sparse vector coding based
decoder is proposed , namely CAScaded multi-layer Convolutional Sparse vector
Coding DEcoder (CASCSCDE), which represents features extracted by the encoder
using sparse vectors. To prove the effectiveness of our CASCSCDE, The
widely-used TransUNet model is chosen for the demonstration purpose, and the
CASCSCDE is incorporated with TransUNet to establish the TransCASCSCDE
architecture. Our experiments demonstrate that TransUNet with CASCSCDE
significantly enhances performance on the Synapse benchmark, obtaining up to
3.15\% and 1.16\% improvements in DICE and mIoU scores, respectively. CASCSCDE
opens new ways for constructing decoders based on convolutional sparse vector
coding.
Related papers
- Optimizing Medical Image Segmentation with Advanced Decoder Design [0.8402155549849591]
U-Net is widely used in medical image segmentation due to its simple and flexible architecture design.
We propose Swin DER (i.e., Swin UNETR Decoder Enhanced and Refined) by specifically optimizing the design of these three components.
Our model design achieves excellent results, surpassing other state-of-the-art methods on both the Synapse and the MSD brain tumor segmentation task.
arXiv Detail & Related papers (2024-10-05T11:47:13Z) - CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection [1.837431956557716]
Feature pyramids have been widely adopted in convolutional neural networks (CNNs) and transformers for tasks like medical image segmentation and object detection.
We propose a novel decoder block that integrates feature pyramids and transformers.
Our model achieves superior performance in detecting small objects compared to existing methods.
arXiv Detail & Related papers (2024-04-23T18:46:07Z) - MUSTER: A Multi-scale Transformer-based Decoder for Semantic Segmentation [19.83103856355554]
MUSTER is a transformer-based decoder that seamlessly integrates with hierarchical encoders.
MSKA units enable the fusion of multi-scale features from the encoder and decoder, facilitating comprehensive information integration.
On the challenging ADE20K dataset, our best model achieves a single-scale mIoU of 50.23 and a multi-scale mIoU of 51.88.
arXiv Detail & Related papers (2022-11-25T06:51:07Z) - MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z) - Dynamic Neural Representational Decoders for High-Resolution Semantic
Segmentation [98.05643473345474]
We propose a novel decoder, termed dynamic neural representational decoder (NRD)
As each location on the encoder's output corresponds to a local patch of the semantic labels, in this work, we represent these local patches of labels with compact neural networks.
This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.
arXiv Detail & Related papers (2021-07-30T04:50:56Z) - Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is an Unet-like pure Transformer for medical image segmentation.
tokenized image patches are fed into the Transformer-based U-shaped decoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z) - Transformer Meets DCFAM: A Novel Semantic Segmentation Scheme for
Fine-Resolution Remote Sensing Images [6.171417925832851]
We introduce the Swin Transformer as the backbone to fully extract the context information.
We also design a novel decoder named densely connected feature aggregation module (DCFAM) to restore the resolution and generate the segmentation map.
arXiv Detail & Related papers (2021-04-25T11:34:22Z) - UNETR: Transformers for 3D Medical Image Segmentation [8.59571749685388]
We introduce a novel architecture, dubbed as UNEt TRansformers (UNETR), that utilizes a pure transformer as the encoder to learn sequence representations of the input volume.
We have extensively validated the performance of our proposed model across different imaging modalities.
arXiv Detail & Related papers (2021-03-18T20:17:15Z) - TransUNet: Transformers Make Strong Encoders for Medical Image
Segmentation [78.01570371790669]
Medical image segmentation is an essential prerequisite for developing healthcare systems.
On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard.
We propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation.
arXiv Detail & Related papers (2021-02-08T16:10:50Z) - Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective
with Transformers [149.78470371525754]
We treat semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer to encode an image as a sequence of patches.
With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR)
SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes.
arXiv Detail & Related papers (2020-12-31T18:55:57Z) - Modeling Lost Information in Lossy Image Compression [72.69327382643549]
Lossy image compression is one of the most commonly used operators for digital images.
We propose a novel invertible framework called Invertible Lossy Compression (ILC) to largely mitigate the information loss problem.
arXiv Detail & Related papers (2020-06-22T04:04:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.