LAPFormer: A Light and Accurate Polyp Segmentation Transformer
- URL: http://arxiv.org/abs/2210.04393v1
- Date: Mon, 10 Oct 2022 01:52:30 GMT
- Title: LAPFormer: A Light and Accurate Polyp Segmentation Transformer
- Authors: Mai Nguyen, Tung Thanh Bui, Quan Van Nguyen, Thanh Tung Nguyen, Toan Van Pham
- Abstract summary: We propose a new model with an encoder-decoder architecture named LAPFormer, which uses a hierarchical Transformer encoder to better extract global features.
Our proposed decoder contains a progressive feature fusion module designed to fuse features from upper and lower scales.
We test our model on five popular benchmark datasets for polyp segmentation.
- Score: 6.352264764099531
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Polyp segmentation remains a difficult problem due to the large variety of polyp shapes and of scanning and labeling modalities, which prevents deep learning models from generalizing well to unseen data. Transformer-based approaches, however, have recently achieved remarkable results, extracting global context better than CNN-based architectures and thereby generalizing better. To leverage this strength of Transformers, we propose a new encoder-decoder model named LAPFormer, which uses a hierarchical Transformer encoder to better extract global features and combines it with our novel CNN (Convolutional Neural Network) decoder for capturing the local appearance of polyps. The proposed decoder contains a progressive feature fusion module designed to fuse features from upper and lower scales and to make multi-scale features more correlated. In addition, we use a feature refinement module and a feature selection module for feature processing. We evaluate our model on five popular benchmark datasets for polyp segmentation: Kvasir, CVC-ClinicDB, CVC-ColonDB, CVC-T, and ETIS-Larib.
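The abstract describes the decoder only at a high level. As a rough illustration of the progressive fusion idea (combining an upper-scale, coarser feature map with a lower-scale, finer one from the hierarchical encoder), here is a minimal PyTorch sketch; the block name, channel sizes, and upsample-concatenate-convolve design are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): one progressive fusion step that
# upsamples a deeper, lower-resolution feature map and fuses it with a
# shallower, higher-resolution one from the hierarchical Transformer encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProgressiveFusionBlock(nn.Module):
    """Hypothetical fusion block: upsample -> concatenate -> 3x3 conv."""

    def __init__(self, upper_ch: int, lower_ch: int, out_ch: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(upper_ch + lower_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, upper: torch.Tensor, lower: torch.Tensor) -> torch.Tensor:
        # Bring the coarse (upper-scale) map to the finer (lower-scale) resolution.
        upper = F.interpolate(upper, size=lower.shape[-2:], mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([upper, lower], dim=1))


if __name__ == "__main__":
    # Feature maps at two adjacent encoder scales (channel counts are assumptions).
    deep = torch.randn(1, 320, 16, 16)     # coarser scale
    shallow = torch.randn(1, 128, 32, 32)  # finer scale
    block = ProgressiveFusionBlock(upper_ch=320, lower_ch=128, out_ch=128)
    print(block(deep, shallow).shape)      # torch.Size([1, 128, 32, 32])
```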
Related papers
- CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection [1.837431956557716]
Feature pyramids have been widely adopted in convolutional neural networks (CNNs) and transformers for tasks like medical image segmentation and object detection.
We propose a novel decoder block that integrates feature pyramids and transformers.
Our model achieves superior performance in detecting small objects compared to existing methods.
arXiv Detail & Related papers (2024-04-23T18:46:07Z)
- Multi-Layer Dense Attention Decoder for Polyp Segmentation [10.141956829529859]
We propose a novel decoder architecture aimed at hierarchically aggregating locally enhanced multi-level dense features.
Specifically, we introduce a novel module named Dense Attention Gate (DAG), which adaptively fuses all previous layers' features to establish local feature relations among all layers.
Our experiments and comparisons with nine competing segmentation models demonstrate that the proposed architecture achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-03-27T01:15:05Z)
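One plausible reading of the Dense Attention Gate idea above, sketched in PyTorch under the assumption that "adaptively fusing all previous layers' features" means resizing earlier features to a common resolution and combining them with learned per-location weights; the names and layout below are illustrative, not the paper's module.

```python
# Hedged sketch of a dense attention-style gate: all earlier features are
# resized to the current resolution, projected to a common width, and combined
# with learned per-location weights (softmax over the feature sources).
import torch
import torch.nn as nn
import torch.nn.functional as F


class DenseAttentionGateSketch(nn.Module):
    def __init__(self, in_channels: list[int], out_ch: int):
        super().__init__()
        self.projs = nn.ModuleList(
            [nn.Conv2d(c, out_ch, kernel_size=1) for c in in_channels]
        )
        # One attention logit per feature source at every spatial location.
        self.attn = nn.Conv2d(out_ch * len(in_channels), len(in_channels), kernel_size=1)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        target = feats[-1].shape[-2:]  # resolution of the current (last) feature
        aligned = [
            F.interpolate(p(f), size=target, mode="bilinear", align_corners=False)
            for p, f in zip(self.projs, feats)
        ]
        weights = torch.softmax(self.attn(torch.cat(aligned, dim=1)), dim=1)
        # Weighted sum over the feature sources.
        return sum(weights[:, i : i + 1] * aligned[i] for i in range(len(aligned)))


if __name__ == "__main__":
    feats = [torch.randn(1, 64, 64, 64), torch.randn(1, 128, 32, 32), torch.randn(1, 320, 16, 16)]
    gate = DenseAttentionGateSketch([64, 128, 320], out_ch=128)
    print(gate(feats).shape)  # torch.Size([1, 128, 16, 16])
```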
- RaBiT: An Efficient Transformer using Bidirectional Feature Pyramid Network with Reverse Attention for Colon Polyp Segmentation [0.0]
This paper introduces RaBiT, an encoder-decoder model that incorporates a lightweight Transformer-based architecture in the encoder.
Our method demonstrates high generalization capability in cross-dataset experiments, even when the training and test sets have different characteristics.
arXiv Detail & Related papers (2023-07-12T19:25:10Z)
- Lesion-aware Dynamic Kernel for Polyp Segmentation [49.63274623103663]
We propose a lesion-aware dynamic network (LDNet) for polyp segmentation.
It is a traditional U-shaped encoder-decoder structure incorporating a dynamic kernel generation and updating scheme.
This simple but effective scheme endows our model with powerful segmentation performance and generalization capability.
arXiv Detail & Related papers (2023-01-12T09:53:57Z)
- MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
- ColonFormer: An Efficient Transformer based Method for Colon Polyp Segmentation [1.181206257787103]
ColonFormer is an encoder-decoder architecture with the capability of modeling long-range semantic information.
Our ColonFormer achieves state-of-the-art performance on all benchmark datasets.
arXiv Detail & Related papers (2022-05-17T16:34:04Z)
- Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers [124.01928050651466]
We propose a new type of polyp segmentation method, named Polyp-PVT.
The proposed model effectively suppresses noise in the features and significantly improves their expressive capabilities.
arXiv Detail & Related papers (2021-08-16T07:09:06Z)
- Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is an Unet-like pure Transformer for medical image segmentation.
Tokenized image patches are fed into a Transformer-based U-shaped Encoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z)
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [149.78470371525754]
We treat semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer to encode an image as a sequence of patches.
With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR).
SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes.
arXiv Detail & Related papers (2020-12-31T18:55:57Z)
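To make the sequence-to-sequence formulation above concrete, the following sketch embeds non-overlapping patches, runs a standard Transformer encoder over the patch sequence, and reshapes the result back into a feature map for a simple upsampling head. Layer counts and sizes are illustrative assumptions, and positional embeddings are omitted; this is not SETR's actual configuration.

```python
# Hedged sketch of the sequence-to-sequence view of segmentation: patchify,
# encode the patch sequence with a Transformer, reshape, and upsample to logits.
# (Positional embeddings are omitted for brevity.)
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchSequenceSegmenterSketch(nn.Module):
    def __init__(self, in_ch=3, dim=256, patch=16, num_classes=2, depth=4, heads=8):
        super().__init__()
        # Non-overlapping patch embedding via a strided convolution.
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Conv2d(dim, num_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        tokens = self.embed(x)                   # (B, dim, H/p, W/p)
        gh, gw = tokens.shape[-2:]
        seq = tokens.flatten(2).transpose(1, 2)  # (B, N, dim) patch sequence
        seq = self.encoder(seq)                  # global context in every layer
        fmap = seq.transpose(1, 2).reshape(b, -1, gh, gw)
        logits = self.head(fmap)
        return F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)


if __name__ == "__main__":
    model = PatchSequenceSegmenterSketch()
    print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 2, 224, 224])
```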
- PSConv: Squeezing Feature Pyramid into One Compact Poly-Scale Convolutional Layer [76.44375136492827]
Convolutional Neural Networks (CNNs) are often scale-sensitive.
We address this limitation by exploiting multi-scale features at a finer granularity.
The proposed convolution operation, named Poly-Scale Convolution (PSConv), mixes up a spectrum of dilation rates.
arXiv Detail & Related papers (2020-07-13T05:14:11Z)
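As a rough illustration of mixing a spectrum of dilation rates inside a single convolutional layer, the sketch below assigns a different dilation to each output-channel group and concatenates the results. This approximates the idea only at group granularity; PSConv itself interleaves dilations at a finer per-kernel granularity.

```python
# Hedged sketch of a poly-scale convolution: different output-channel groups use
# different dilation rates, so one layer mixes several receptive-field sizes.
import torch
import torch.nn as nn


class MixedDilationConvSketch(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        assert out_ch % len(dilations) == 0, "out_ch must split evenly across dilations"
        group_ch = out_ch // len(dilations)
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, group_ch, kernel_size=3, padding=d, dilation=d)
             for d in dilations]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Same spatial size on every branch (padding == dilation for 3x3 kernels).
        return torch.cat([branch(x) for branch in self.branches], dim=1)


if __name__ == "__main__":
    conv = MixedDilationConvSketch(in_ch=64, out_ch=64)
    print(conv(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```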