Cats: Complementary CNN and Transformer Encoders for Segmentation
- URL: http://arxiv.org/abs/2208.11572v1
- Date: Wed, 24 Aug 2022 14:25:11 GMT
- Title: Cats: Complementary CNN and Transformer Encoders for Segmentation
- Authors: Hao Li, Dewei Hu, Han Liu, Jiacheng Wang, Ipek Oguz
- Abstract summary: We propose a model with double encoders for 3D biomedical image segmentation.
We fuse the information from the convolutional encoder and the transformer, and pass it to the decoder to obtain the results.
Compared to the state-of-the-art models with and without transformers on each task, our proposed method obtains higher Dice scores across the board.
- Score: 13.288195115791758
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, deep learning methods have achieved state-of-the-art performance in
many medical image segmentation tasks. Many of these are based on convolutional
neural networks (CNNs). For such methods, the encoder is the key part for
global and local information extraction from input images; the extracted
features are then passed to the decoder for predicting the segmentations. In
contrast, several recent works show a superior performance with the use of
transformers, which can better model long-range spatial dependencies and
capture low-level details. However, a transformer as the sole encoder underperforms
on some tasks, where it cannot efficiently replace the convolution-based
encoder. In this paper, we propose a model with double encoders for 3D
biomedical image segmentation. Our model is a U-shaped CNN augmented with an
independent transformer encoder. We fuse the information from the convolutional
encoder and the transformer, and pass it to the decoder to obtain the results.
We evaluate our method on three public datasets from three different
challenges: BTCV, MoDA and Decathlon. Compared to the state-of-the-art models
with and without transformers on each task, our proposed method obtains higher
Dice scores across the board.
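The dual-encoder design described in the abstract can be sketched in plain Python/NumPy. Everything below is an illustrative assumption, not the authors' implementation: `conv_encoder`, `transformer_encoder`, and `decoder` are stand-ins, and element-wise addition is one possible fusion operator; the paper only states that the two encoders' features are fused and passed to the decoder.

```python
import numpy as np

def downsample(x):
    """Halve each spatial axis by average pooling (assumes even sizes)."""
    d, h, w = x.shape
    return x.reshape(d // 2, 2, h // 2, 2, w // 2, 2).mean(axis=(1, 3, 5))

def conv_encoder(x, levels=3):
    # Hypothetical stand-in for the U-shaped CNN encoder:
    # emits one feature map per resolution level.
    feats = []
    for _ in range(levels):
        feats.append(x)
        x = downsample(x)
    return feats

def transformer_encoder(x, levels=3):
    # Hypothetical stand-in for the independent transformer encoder,
    # producing features at the same resolutions so they can be fused
    # with the CNN features level by level.
    feats = []
    for _ in range(levels):
        feats.append(x)
        x = downsample(x)
    return feats

def decoder(fused):
    # Hypothetical decoder stand-in: upsample coarse-to-fine, add the
    # fused skip features at each level, then threshold to a binary map.
    out = fused[-1]
    for skip in reversed(fused[:-1]):
        out = np.repeat(np.repeat(np.repeat(out, 2, 0), 2, 1), 2, 2)
        out = out + skip
    return (out > out.mean()).astype(np.uint8)

def cats_forward(volume):
    # Fuse CNN and transformer features by element-wise addition
    # (the fusion operator here is an assumption).
    c = conv_encoder(volume)
    t = transformer_encoder(volume)
    fused = [cf + tf for cf, tf in zip(c, t)]
    return decoder(fused)

seg = cats_forward(np.random.rand(16, 16, 16))
print(seg.shape)  # (16, 16, 16)
```

The point of the sketch is the data flow: two independent encoder pyramids over the same input volume, fused per level, with a single decoder consuming the fused features.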
Related papers
- Efficient Transformer Encoders for Mask2Former-style models [57.54752243522298]
ECO-M2F is a strategy to self-select the number of hidden layers in the encoder conditioned on the input image.
The proposed approach reduces expected encoder computational cost while maintaining performance.
It is flexible in architecture configurations, and can be extended beyond the segmentation task to object detection.
arXiv Detail & Related papers (2024-04-23T17:26:34Z) - ParaTransCNN: Parallelized TransCNN Encoder for Medical Image
Segmentation [7.955518153976858]
We propose an advanced 2D feature extraction method by combining the convolutional neural network and Transformer architectures.
Our method is shown with better segmentation accuracy, especially on small organs.
arXiv Detail & Related papers (2024-01-27T05:58:36Z) - CATS v2: Hybrid encoders for robust medical segmentation [12.194439938007672]
Convolutional Neural Networks (CNNs) have exhibited strong performance in medical image segmentation tasks.
However, due to the limited field of view of the convolution kernel, it is hard for CNNs to fully represent global information.
We propose CATS v2 with hybrid encoders, which better leverage both local and global information.
arXiv Detail & Related papers (2023-08-11T20:21:54Z) - ConvTransSeg: A Multi-resolution Convolution-Transformer Network for
Medical Image Segmentation [14.485482467748113]
We propose a hybrid encoder-decoder segmentation model (ConvTransSeg).
It consists of a multi-layer CNN as the encoder for feature learning and the corresponding multi-level Transformer as the decoder for segmentation prediction.
Our method achieves the best performance in terms of Dice coefficient and average symmetric surface distance measures with low model complexity and memory consumption.
arXiv Detail & Related papers (2022-10-13T14:59:23Z) - MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z) - ViTBIS: Vision Transformer for Biomedical Image Segmentation [0.0]
We propose a novel network named Vision Transformer for Biomedical Image Segmentation (ViTBIS).
Our network splits the input feature maps into three parts with $1\times 1$, $3\times 3$ and $5\times 5$ convolutions in both the encoder and decoder.
arXiv Detail & Related papers (2022-01-15T20:44:45Z) - Dynamic Neural Representational Decoders for High-Resolution Semantic
Segmentation [98.05643473345474]
We propose a novel decoder, termed dynamic neural representational decoder (NRD)
As each location on the encoder's output corresponds to a local patch of the semantic labels, in this work, we represent these local patches of labels with compact neural networks.
This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.
arXiv Detail & Related papers (2021-07-30T04:50:56Z) - Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is an Unet-like pure Transformer for medical image segmentation.
Tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z) - UNETR: Transformers for 3D Medical Image Segmentation [8.59571749685388]
We introduce a novel architecture, dubbed as UNEt TRansformers (UNETR), that utilizes a pure transformer as the encoder to learn sequence representations of the input volume.
We have extensively validated the performance of our proposed model across different imaging modalities.
arXiv Detail & Related papers (2021-03-18T20:17:15Z) - CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image
Segmentation [95.51455777713092]
Convolutional neural networks (CNNs) have been the de facto standard for 3D medical image segmentation.
We propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation.
arXiv Detail & Related papers (2021-03-04T13:34:22Z) - Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective
with Transformers [149.78470371525754]
We treat semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer to encode an image as a sequence of patches.
With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR)
SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes.
arXiv Detail & Related papers (2020-12-31T18:55:57Z)
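The SETR entry above treats segmentation as sequence-to-sequence prediction over image patches. The tokenization step it relies on can be sketched as follows; the patch size and layout here are illustrative assumptions, not SETR's exact configuration.

```python
import numpy as np

def image_to_patch_sequence(img, patch=4):
    # Split an H x W x C image into non-overlapping patch x patch tiles
    # and flatten each tile into one token, yielding an
    # (num_tokens, patch*patch*C) sequence a transformer can consume.
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0
    tiles = img.reshape(h // patch, patch, w // patch, patch, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)       # (H/p, W/p, p, p, C)
    return tiles.reshape(-1, patch * patch * c)  # (N, p*p*C)

tokens = image_to_patch_sequence(np.zeros((16, 16, 3)), patch=4)
print(tokens.shape)  # (16, 48)
```

Each row of the result is one patch token; a pure-transformer encoder like SETR's then models global context across all tokens at every layer.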
This list is automatically generated from the titles and abstracts of the papers in this site.