BEFUnet: A Hybrid CNN-Transformer Architecture for Precise Medical Image
Segmentation
- URL: http://arxiv.org/abs/2402.08793v1
- Date: Tue, 13 Feb 2024 21:03:36 GMT
- Title: BEFUnet: A Hybrid CNN-Transformer Architecture for Precise Medical Image
Segmentation
- Authors: Omid Nejati Manzari, Javad Mirzapour Kaleybar, Hooman Saadat, Shahin
Maleki
- Abstract summary: This paper proposes an innovative U-shaped network called BEFUnet, which enhances the fusion of body and edge information for precise medical image segmentation.
The BEFUnet comprises three main modules, including a novel Local Cross-Attention Feature (LCAF) fusion module, a novel Double-Level Fusion (DLF) module, and a dual-branch encoder.
The LCAF module efficiently fuses edge and body features by selectively performing local cross-attention on features that are spatially close between the two modalities.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The accurate segmentation of medical images is critical for various
healthcare applications. Convolutional neural networks (CNNs), especially Fully
Convolutional Networks (FCNs) like U-Net, have shown remarkable success in
medical image segmentation tasks. However, they have limitations in capturing
global context and long-range relations, especially for objects with
significant variations in shape, scale, and texture. While transformers have
achieved state-of-the-art results in natural language processing and image
recognition, they face challenges in medical image segmentation due to image
locality and translational invariance issues. To address these challenges, this
paper proposes an innovative U-shaped network called BEFUnet, which enhances
the fusion of body and edge information for precise medical image segmentation.
The BEFUnet comprises three main modules, including a novel Local
Cross-Attention Feature (LCAF) fusion module, a novel Double-Level Fusion (DLF)
module, and a dual-branch encoder. The dual-branch encoder consists of an edge
encoder and a body encoder. The edge encoder employs PDC blocks for effective
edge information extraction, while the body encoder uses the Swin Transformer
to capture semantic information with global attention. The LCAF module
efficiently fuses edge and body features by selectively performing local
cross-attention on features that are spatially close between the two
modalities. This local approach significantly reduces computational complexity
compared to global cross-attention while ensuring accurate feature matching.
BEFUnet demonstrates superior performance over existing methods across various
evaluation metrics on medical image segmentation datasets.
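
To make the local cross-attention idea concrete, below is a minimal PyTorch sketch of a window-local cross-attention fusion layer in the spirit of the LCAF module: queries are drawn from the body branch, keys and values from the edge branch, and attention is restricted to non-overlapping spatial windows. The class name, window size, and tensor layout are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of window-local cross-attention between "body" and "edge"
# feature maps, in the spirit of the LCAF module described above (assumed
# names and shapes, not the authors' code).
import torch
import torch.nn as nn


class LocalCrossAttentionFusion(nn.Module):
    def __init__(self, dim: int, window: int = 7, heads: int = 4):
        super().__init__()
        self.window = window
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def _to_windows(self, x: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B * num_windows, window*window, C)
        B, C, H, W = x.shape
        w = self.window
        x = x.view(B, C, H // w, w, W // w, w)
        return x.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C)

    def forward(self, body: torch.Tensor, edge: torch.Tensor) -> torch.Tensor:
        # body, edge: (B, C, H, W); H and W must be divisible by `window`
        B, C, H, W = body.shape
        w = self.window
        q = self.norm_q(self._to_windows(body))    # queries from the body branch
        kv = self.norm_kv(self._to_windows(edge))  # keys/values from the edge branch
        fused, _ = self.attn(q, kv, kv)            # attention stays inside each window
        # fold the windows back into a (B, C, H, W) feature map
        fused = fused.view(B, H // w, W // w, w, w, C)
        fused = fused.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
        return body + fused                        # residual fusion with body features


if __name__ == "__main__":
    lcaf = LocalCrossAttentionFusion(dim=64, window=7, heads=4)
    body = torch.randn(1, 64, 56, 56)   # e.g. Swin-style body features
    edge = torch.randn(1, 64, 56, 56)   # e.g. PDC-style edge features
    print(lcaf(body, edge).shape)       # torch.Size([1, 64, 56, 56])
```

Restricting each query to a w x w window keeps the attention cost roughly O(HW * w^2) instead of the O((HW)^2) of global cross-attention, which is the complexity saving the abstract refers to.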
Related papers
- TransResNet: Integrating the Strengths of ViTs and CNNs for High Resolution Medical Image Segmentation via Feature Grafting [6.987177704136503]
High-resolution images are preferable in medical imaging domain as they significantly improve the diagnostic capability of the underlying method.
Most of the existing deep learning-based techniques for medical image segmentation are optimized for input images having small spatial dimensions and perform poorly on high-resolution images.
We propose a parallel-in-branch architecture called TransResNet, which incorporates Transformer and CNN in a parallel manner to extract features from multi-resolution images independently.
arXiv Detail & Related papers (2024-10-01T18:22:34Z)
- ASSNet: Adaptive Semantic Segmentation Network for Microtumors and Multi-Organ Segmentation [32.74195208408193]
Medical image segmentation is a crucial task in computer vision, supporting clinicians in diagnosis, treatment planning, and disease monitoring.
We propose the Adaptive Semantic Network (ASSNet), a transformer architecture that effectively integrates local and global features for precise medical image segmentation.
Tests on diverse medical image segmentation tasks, including multi-organ, liver tumor, and bladder tumor segmentation, demonstrate that ASSNet achieves state-of-the-art results.
arXiv Detail & Related papers (2024-09-12T06:25:44Z)
- Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
arXiv Detail & Related papers (2023-12-26T12:56:31Z)
- MCPA: Multi-scale Cross Perceptron Attention Network for 2D Medical Image Segmentation [7.720152925974362]
We propose a 2D medical image segmentation model called Multi-scale Cross Perceptron Attention Network (MCPA).
The MCPA consists of three main components: an encoder, a decoder, and a Cross Perceptron.
We evaluate our proposed MCPA model on several publicly available medical image datasets from different tasks and devices.
arXiv Detail & Related papers (2023-07-27T02:18:12Z)
- MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
- Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z)
- Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is an Unet-like pure Transformer for medical image segmentation.
Tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z)
- CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation [95.51455777713092]
Convolutional neural networks (CNNs) have been the de facto standard for 3D medical image segmentation.
We propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation.
arXiv Detail & Related papers (2021-03-04T13:34:22Z)
- TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation [78.01570371790669]
Medical image segmentation is an essential prerequisite for developing healthcare systems.
On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard.
We propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation.
arXiv Detail & Related papers (2021-02-08T16:10:50Z)
- Boundary-aware Context Neural Network for Medical Image Segmentation [15.585851505721433]
Medical image segmentation can provide a reliable basis for further clinical analysis and disease diagnosis.
Most existing CNN-based methods produce unsatisfactory segmentation masks without accurate object boundaries.
In this paper, we formulate a boundary-aware context neural network (BA-Net) for 2D medical image segmentation.
arXiv Detail & Related papers (2020-05-03T02:35:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.