Focal-UNet: UNet-like Focal Modulation for Medical Image Segmentation
- URL: http://arxiv.org/abs/2212.09263v1
- Date: Mon, 19 Dec 2022 06:17:22 GMT
- Title: Focal-UNet: UNet-like Focal Modulation for Medical Image Segmentation
- Authors: MohammadReza Naderi, MohammadHossein Givkashi, Fatemeh Piri, Nader
Karimi, Shadrokh Samavi
- Abstract summary: We propose a new U-shaped architecture for medical image segmentation with the help of the newly introduced focal modulation mechanism.
Due to the ability of the focal module to aggregate local and global features, our model could simultaneously benefit from the wide receptive field of transformers and local viewing of CNNs.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recently, many attempts have been made to construct transformer-based
U-shaped architectures, and new methods have been proposed that outperformed
CNN-based rivals. However, serious problems such as blockiness and cropped
edges in predicted masks remain because of transformers' patch partitioning
operations. In this work, we propose a new U-shaped architecture for medical
image segmentation with the help of the newly introduced focal modulation
mechanism. The proposed architecture has asymmetric depths for the encoder and
decoder. Due to the ability of the focal module to aggregate local and global
features, our model could simultaneously benefit from the wide receptive field of
transformers and the local view of CNNs. This helps the proposed method balance
the local and global feature usage to outperform one of the most powerful
transformer-based U-shaped models called Swin-UNet. We achieved a 1.68% higher
DICE score and a 0.89 better HD metric on the Synapse dataset. Also, with
extremely limited data, we had a 4.25% higher DICE score on the NeoPolyp
dataset. Our implementations are available at:
https://github.com/givkashi/Focal-UNet
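The abstract reports results as a DICE score and an HD (Hausdorff distance) metric. As a minimal sketch of how these two quantities are commonly defined for binary masks, the following pure-Python example may help. It is not the authors' evaluation code: the function names `dice_score` and `hausdorff_distance`, the nested-list mask format, and the toy masks are all assumptions made for illustration. Published evaluation code may also use a percentile variant of the Hausdorff distance, which the abstract does not specify.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1."""
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    intersection = sum(a * b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def hausdorff_distance(pred, target):
    """Symmetric Hausdorff distance between the foreground pixels of two masks."""
    a = [(r, c) for r, row in enumerate(pred) for c, v in enumerate(row) if v]
    b = [(r, c) for r, row in enumerate(target) for c, v in enumerate(row) if v]
    if not a or not b:
        raise ValueError("both masks need at least one foreground pixel")

    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(dist(s, d) for d in dst) for s in src)

    return max(directed(a, b), directed(b, a))


# Hypothetical 3x3 masks, used only to exercise the two functions.
pred = [[0, 1, 1],
        [0, 1, 0],
        [0, 0, 0]]
target = [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]]

print(round(dice_score(pred, target), 4))  # 0.6667
print(hausdorff_distance(pred, target))    # 1.0
```

A higher Dice value means more overlap between prediction and ground truth, while a lower Hausdorff distance means the predicted boundary stays closer to the true one, which is why the abstract describes one metric as "higher" and the other as "better".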
Related papers
- HAFormer: Unleashing the Power of Hierarchy-Aware Features for Lightweight Semantic Segmentation [11.334990474402915]
We introduce HAFormer, a model that combines the hierarchical feature extraction ability of CNNs with the global dependency modeling capability of Transformers.
HAFormer achieves high performance with minimal computational overhead and compact model size.
arXiv Detail & Related papers (2024-07-10T07:53:24Z)
- SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation [53.675725490807615]
We introduce SDPose, a new self-distillation method for improving the performance of small transformer-based models.
SDPose-T obtains 69.7% mAP with 4.4M parameters and 1.8 GFLOPs, while SDPose-S-V2 obtains 73.5% mAP on the MSCOCO validation dataset.
arXiv Detail & Related papers (2024-04-04T15:23:14Z)
- CompletionFormer: Depth Completion with Convolutions and Vision Transformers [0.0]
This paper proposes a Joint Convolutional Attention and Transformer block (JCAT), which deeply couples the convolutional attention layer and Vision Transformer into one block, as the basic unit to construct our depth completion model in a pyramidal structure.
Our CompletionFormer outperforms state-of-the-art CNNs-based methods on the outdoor KITTI Depth Completion benchmark and indoor NYUv2 dataset, achieving significantly higher efficiency (nearly 1/3 FLOPs) compared to pure Transformer-based methods.
arXiv Detail & Related papers (2023-04-25T17:59:47Z)
- MECPformer: Multi-estimations Complementary Patch with CNN-Transformers for Weakly Supervised Semantic Segmentation [8.975330500836057]
We propose a simple yet effective method with a Multi-estimations Complementary Patch (MECP) strategy and an Adaptive Conflict Module (ACM).
In addition, ACM adaptively removes conflicting pixels and exploits the network self-training capability to mine potential target information.
Our MECPformer has reached new state-of-the-art 72.0% mIoU on the PASCAL VOC 2012 and 42.4% on MS COCO 2014 dataset.
arXiv Detail & Related papers (2023-03-19T15:42:45Z)
- Magic ELF: Image Deraining Meets Association Learning and Transformer [63.761812092934576]
This paper aims to unify CNN and Transformer to take advantage of their learning merits for image deraining.
A novel multi-input attention module (MAM) is proposed to associate rain removal and background recovery.
Our proposed method (dubbed as ELF) outperforms the state-of-the-art approach (MPRNet) by 0.25 dB on average.
arXiv Detail & Related papers (2022-07-21T12:50:54Z)
- HiFormer: Hierarchical Multi-scale Representations Using Transformers for Medical Image Segmentation [3.478921293603811]
HiFormer is a novel method that efficiently bridges a CNN and a transformer for medical image segmentation.
To secure a fine fusion of global and local features, we propose a Double-Level Fusion (DLF) module in the skip connection of the encoder-decoder structure.
arXiv Detail & Related papers (2022-07-18T11:30:06Z)
- Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution [64.25751738088015]
Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks.
Transformers that need to incorporate contextual information to extract features dynamically are neglected.
We propose a lightweight Cross-receptive Focused Inference Network (CFIN) that consists of a cascade of CT Blocks mixed with CNN and Transformer.
arXiv Detail & Related papers (2022-07-06T16:32:29Z)
- MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
- SideRT: A Real-time Pure Transformer Architecture for Single Image Depth Estimation [11.513054537848227]
We propose a pure transformer architecture called SideRT that can attain excellent predictions in real-time.
This is the first work to show that transformer-based networks can attain state-of-the-art performance in real-time in the single image depth estimation field.
arXiv Detail & Related papers (2022-04-29T05:46:20Z)
- Adaptive Split-Fusion Transformer [90.04885335911729]
We propose an Adaptive Split-Fusion Transformer (ASF-former) to treat convolutional and attention branches differently with adaptive weights.
Experiments on standard benchmarks, such as ImageNet-1K, show that our ASF-former outperforms its CNN, transformer counterparts, and hybrid pilots in terms of accuracy.
arXiv Detail & Related papers (2022-04-26T10:00:28Z)
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [149.78470371525754]
We treat semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer to encode an image as a sequence of patches.
With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR).
SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes.
arXiv Detail & Related papers (2020-12-31T18:55:57Z)