ScribFormer: Transformer Makes CNN Work Better for Scribble-based
Medical Image Segmentation
- URL: http://arxiv.org/abs/2402.02029v1
- Date: Sat, 3 Feb 2024 04:55:22 GMT
- Title: ScribFormer: Transformer Makes CNN Work Better for Scribble-based
Medical Image Segmentation
- Authors: Zihan Li, Yuan Zheng, Dandan Shan, Shuzhou Yang, Qingde Li, Beizhan
Wang, Yuanting Zhang, Qingqi Hong, Dinggang Shen
- Abstract summary: This paper proposes a new CNN-Transformer hybrid solution for scribble-supervised medical image segmentation called ScribFormer.
The proposed ScribFormer model has a triple-branch structure, i.e., the hybrid of a CNN branch, a Transformer branch, and an attention-guided class activation map (ACAM) branch.
- Score: 43.24187067938417
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most recent scribble-supervised segmentation methods commonly adopt a CNN
framework with an encoder-decoder architecture. Despite its multiple benefits,
this framework can generally capture only short-range feature dependencies,
since convolutional layers have local receptive fields, which makes it
difficult to learn global shape information from the limited supervision
provided by scribble annotations. To address this issue, this paper proposes a
new CNN-Transformer hybrid solution for scribble-supervised medical image
segmentation called ScribFormer. The proposed ScribFormer model has a
triple-branch structure, i.e., the hybrid of a CNN branch, a Transformer
branch, and an attention-guided class activation map (ACAM) branch.
Specifically, the CNN branch collaborates with the Transformer branch to fuse
the local features learned from CNN with the global representations obtained
from Transformer, which can effectively overcome limitations of existing
scribble-supervised segmentation methods. Furthermore, the ACAM branch assists
in unifying the shallow and deep convolution features to further improve the
model's performance. Extensive experiments on two public
datasets and one private dataset show that our ScribFormer has superior
performance over the state-of-the-art scribble-supervised segmentation methods,
and achieves even better results than the fully-supervised segmentation
methods. The code is released at https://github.com/HUANGLIZI/ScribFormer.
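For intuition, the triple-branch idea described above can be sketched as follows. This is a minimal illustration under assumed module choices (a small convolutional stack, a single Transformer encoder layer, additive fusion, and 1x1 segmentation/CAM heads); it is not the authors' released implementation, which is available at the repository linked above.

```python
# Minimal, illustrative sketch of a triple-branch CNN / Transformer / ACAM-style
# design. All widths, depths, and the CAM pooling are assumptions for
# illustration, not the ScribFormer code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TripleBranchSketch(nn.Module):
    def __init__(self, in_ch=1, num_classes=4, width=32):
        super().__init__()
        # CNN branch: local features via stacked convolutions.
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Transformer branch: global dependencies over flattened feature tokens.
        self.transformer = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=width, nhead=4, batch_first=True),
            num_layers=1,
        )
        # ACAM-style branch: class activation maps from the fused features.
        self.cam_head = nn.Conv2d(width, num_classes, 1)
        self.seg_head = nn.Conv2d(width, num_classes, 1)

    def forward(self, x):
        local_feat = self.cnn(x)                        # (B, C, H, W)
        b, c, h, w = local_feat.shape
        tokens = local_feat.flatten(2).transpose(1, 2)  # (B, H*W, C)
        global_feat = self.transformer(tokens)          # global context
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)
        fused = local_feat + global_feat                # fuse local + global
        seg_logits = self.seg_head(fused)               # per-pixel prediction
        cam = self.cam_head(fused)                      # class activation maps
        class_logits = F.adaptive_avg_pool2d(cam, 1).flatten(1)
        return seg_logits, cam, class_logits


if __name__ == "__main__":
    # Small demo input; a real model would tokenize at a coarser resolution.
    seg, cam, cls = TripleBranchSketch()(torch.randn(2, 1, 32, 32))
    print(seg.shape, cam.shape, cls.shape)
```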
Related papers
- SCTNet: Single-Branch CNN with Transformer Semantic Information for
Real-Time Segmentation [46.068509764538085]
SCTNet is a single-branch CNN with transformer semantic information for real-time segmentation.
SCTNet enjoys the rich semantic representations of an inference-free semantic branch while retaining the high efficiency of a lightweight single-branch CNN.
We conduct extensive experiments on Cityscapes, ADE20K, and COCO-Stuff-10K, and the results show that our method achieves new state-of-the-art performance.
arXiv Detail & Related papers (2023-12-28T15:33:16Z)
- CATS v2: Hybrid encoders for robust medical segmentation [12.194439938007672]
Convolutional Neural Networks (CNNs) have exhibited strong performance in medical image segmentation tasks.
However, due to the limited receptive field of the convolution kernel, it is hard for CNNs to fully capture global information.
We propose CATS v2 with hybrid encoders, which better leverage both local and global information.
arXiv Detail & Related papers (2023-08-11T20:21:54Z)
- ConvFormer: Combining CNN and Transformer for Medical Image Segmentation [17.88894109620463]
We propose a hierarchical CNN and Transformer hybrid architecture, called ConvFormer, for medical image segmentation.
Our ConvFormer, trained from scratch, outperforms various CNN- or Transformer-based architectures, achieving state-of-the-art performance.
arXiv Detail & Related papers (2022-11-15T23:11:22Z)
- MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatially detailed features.
Our MISSU outperforms previous state-of-the-art methods.
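As a rough illustration of the self-distillation idea (not MISSU's actual formulation), a deeper prediction can act as a teacher for a shallower auxiliary head of the same network; the temperature and KL form below are assumptions.

```python
# Illustrative self-distillation term: a deeper (teacher) output guides a
# shallower auxiliary (student) head via a temperature-scaled KL divergence.
import torch
import torch.nn.functional as F


def self_distill_loss(student_logits, teacher_logits, T=2.0):
    # Teacher is detached so only the student head receives distillation gradients.
    teacher = F.softmax(teacher_logits.detach() / T, dim=1)
    student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(student, teacher, reduction="batchmean") * (T * T)


if __name__ == "__main__":
    shallow = torch.randn(2, 4, 16, 16, 16, requires_grad=True)  # 3D logits
    deep = torch.randn(2, 4, 16, 16, 16)
    print(self_distill_loss(shallow, deep).item())
```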
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
- SegTransVAE: Hybrid CNN-Transformer with Regularization for medical image segmentation [0.0]
A novel network named SegTransVAE is proposed in this paper.
SegTransVAE is built upon an encoder-decoder architecture, exploiting a Transformer together with a variational autoencoder (VAE) branch added to the network.
Evaluation on various recently introduced datasets shows that SegTransVAE outperforms previous methods in Dice score and 95% Hausdorff distance.
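For reference, the two reported metrics can be computed for binary masks roughly as follows. This generic NumPy/SciPy sketch uses all foreground points rather than extracted surfaces for brevity and is not the paper's evaluation code.

```python
# Illustrative Dice score and 95th-percentile Hausdorff distance (HD95)
# between two binary masks; a generic formulation, not SegTransVAE's code.
import numpy as np
from scipy.spatial.distance import cdist


def dice_score(pred, gt):
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)


def hd95(pred, gt):
    p = np.argwhere(pred)  # all foreground points (surfaces would be tighter)
    g = np.argwhere(gt)
    if len(p) == 0 or len(g) == 0:
        return np.inf
    d = cdist(p, g)
    # Symmetric 95th percentile of directed point-to-set distances.
    return max(np.percentile(d.min(axis=1), 95), np.percentile(d.min(axis=0), 95))


if __name__ == "__main__":
    a = np.zeros((32, 32), bool); a[8:24, 8:24] = True
    b = np.zeros((32, 32), bool); b[10:26, 10:26] = True
    print(dice_score(a, b), hd95(a, b))
```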
arXiv Detail & Related papers (2022-01-21T08:02:55Z)
- Semi-Supervised Medical Image Segmentation via Cross Teaching between CNN and Transformer [11.381487613753004]
We present a framework for semi-supervised medical image segmentation by introducing cross teaching between a CNN and a Transformer.
Notably, this work may be the first attempt to combine CNN and transformer for semi-supervised medical image segmentation and achieve promising results on a public benchmark.
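The cross-teaching idea can be sketched as two networks exchanging hard pseudo-labels on unlabeled images; the loss form below is a simplified assumption, not the paper's exact training recipe.

```python
# Illustrative cross-teaching term: each network is supervised by the other's
# hard pseudo-labels on unlabeled images.
import torch
import torch.nn.functional as F


def cross_teaching_loss(cnn_logits, trans_logits):
    # Pseudo-labels are detached so gradients flow only into the "student" side.
    cnn_pseudo = cnn_logits.argmax(dim=1).detach()      # (B, H, W)
    trans_pseudo = trans_logits.argmax(dim=1).detach()  # (B, H, W)
    loss_cnn = F.cross_entropy(cnn_logits, trans_pseudo)    # Transformer teaches CNN
    loss_trans = F.cross_entropy(trans_logits, cnn_pseudo)  # CNN teaches Transformer
    return loss_cnn + loss_trans


if __name__ == "__main__":
    cnn_out = torch.randn(2, 4, 64, 64, requires_grad=True)
    trans_out = torch.randn(2, 4, 64, 64, requires_grad=True)
    print(cross_teaching_loss(cnn_out, trans_out).item())
```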
arXiv Detail & Related papers (2021-12-09T13:22:38Z)
- HAT: Hierarchical Aggregation Transformers for Person Re-identification [87.02828084991062]
We take advantage of both CNNs and Transformers for image-based person Re-ID with high performance.
This work is the first to take advantage of both CNNs and Transformers for image-based person Re-ID.
arXiv Detail & Related papers (2021-07-13T09:34:54Z)
- Segmenter: Transformer for Semantic Segmentation [79.9887988699159]
We introduce Segmenter, a transformer model for semantic segmentation.
We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation.
It outperforms the state of the art on the challenging ADE20K dataset and performs on par with it on Pascal Context and Cityscapes.
arXiv Detail & Related papers (2021-05-12T13:01:44Z)
- CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation [95.51455777713092]
Convolutional neural networks (CNNs) have been the de facto standard for 3D medical image segmentation.
We propose a novel framework, CoTr, that efficiently bridges a convolutional neural network and a Transformer for accurate 3D medical image segmentation.
arXiv Detail & Related papers (2021-03-04T13:34:22Z)
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [149.78470371525754]
We treat semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer to encode an image as a sequence of patches.
With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR).
SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes.
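The sequence-to-sequence view can be sketched as patch embedding, a pure Transformer encoder, and a simple upsampling decoder; the patch size, depth, and widths below are illustrative assumptions rather than SETR's actual configuration.

```python
# Illustrative sequence-to-sequence segmentation: tokenize the image into
# patches, encode with a pure Transformer, then classify and upsample.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Seq2SeqSegSketch(nn.Module):
    def __init__(self, in_ch=3, num_classes=19, patch=16, dim=256, depth=2):
        super().__init__()
        # Patch embedding: a strided convolution turns the image into tokens.
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=depth,
        )
        # "Simple decoder": 1x1 classification plus bilinear upsampling.
        self.classify = nn.Conv2d(dim, num_classes, 1)

    def forward(self, x):
        b, _, h, w = x.shape
        tokens = self.embed(x)                   # (B, D, H/p, W/p)
        hp, wp = tokens.shape[2:]
        seq = tokens.flatten(2).transpose(1, 2)  # (B, N, D) patch sequence
        seq = self.encoder(seq)                  # global context in every layer
        feat = seq.transpose(1, 2).reshape(b, -1, hp, wp)
        logits = self.classify(feat)
        return F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)


if __name__ == "__main__":
    out = Seq2SeqSegSketch()(torch.randn(1, 3, 224, 224))
    print(out.shape)  # (1, 19, 224, 224)
```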
arXiv Detail & Related papers (2020-12-31T18:55:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.