Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers
- URL: http://arxiv.org/abs/2108.06932v8
- Date: Mon, 19 Feb 2024 13:02:26 GMT
- Title: Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers
- Authors: Bo Dong, Wenhai Wang, Deng-Ping Fan, Jinpeng Li, Huazhu Fu, Ling Shao
- Abstract summary: We propose a new polyp segmentation method, named Polyp-PVT, which effectively suppresses noise in the features and significantly improves their expressive capabilities.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most polyp segmentation methods use CNNs as their backbone, leading to two
key issues when exchanging information between the encoder and decoder: 1)
taking into account the differences in contribution between different-level
features and 2) designing an effective mechanism for fusing these features.
Unlike existing CNN-based methods, we adopt a transformer encoder, which learns
more powerful and robust representations. In addition, considering the image
acquisition influence and elusive properties of polyps, we introduce three
standard modules, including a cascaded fusion module (CFM), a camouflage
identification module (CIM), and a similarity aggregation module (SAM). Among
these, the CFM is used to collect the semantic and location information of
polyps from high-level features; the CIM is applied to capture polyp
information disguised in low-level features; and the SAM extends the pixel
features of the polyp area with high-level semantic position information to the
entire polyp area, thereby effectively fusing cross-level features. The
proposed model, named Polyp-PVT, effectively suppresses noise in the features
and significantly improves their expressive capabilities. Extensive experiments
on five widely adopted datasets show that the proposed model is more robust to
various challenging situations (e.g., appearance changes, small objects,
rotation) than existing representative methods. The proposed model is available
at https://github.com/DengPingFan/Polyp-PVT.
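The three-module data flow described in the abstract (CFM on high-level features, CIM on low-level features, SAM fusing the two) can be sketched at a very high level. This is a hypothetical toy illustration of the fusion order only, not the actual Polyp-PVT implementation: the real modules operate on multi-channel feature maps with learned parameters, whereas every function body below is placeholder arithmetic over flat lists so the sketch runs with no deep-learning libraries.

```python
# Toy sketch of the Polyp-PVT fusion order described in the abstract.
# All operations here are illustrative placeholders, NOT the paper's modules.

def cfm(high_level_feats):
    """Cascaded fusion (placeholder): merge several high-level feature
    vectors into one by elementwise averaging."""
    n = len(high_level_feats)
    return [sum(vals) / n for vals in zip(*high_level_feats)]

def cim(low_level_feat):
    """Camouflage identification (placeholder): pass strong responses
    through and damp weak/negative ones, mimicking a gating step."""
    return [v if v >= 0 else 0.1 * v for v in low_level_feat]

def sam(fused_high, gated_low):
    """Similarity aggregation (placeholder): propagate high-level semantic
    cues into the low-level map via elementwise product."""
    return [h * l for h, l in zip(fused_high, gated_low)]

# Toy forward pass: two "high-level" feature vectors and one "low-level" one.
high = [[0.2, 0.8, 0.5], [0.4, 0.6, 0.7]]
low = [0.9, -0.3, 0.5]
out = sam(cfm(high), cim(low))
```

The point of the sketch is the ordering: high-level features are fused first (CFM), low-level features are filtered for camouflaged cues (CIM), and only then are the two streams combined (SAM) to cover the whole polyp area.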
Related papers
- Dual-scale Enhanced and Cross-generative Consistency Learning for
Semi-supervised Polyp Segmentation [52.06525450636897]
Automatic polyp segmentation plays a crucial role in the early diagnosis and treatment of colorectal cancer.
Existing methods rely heavily on fully supervised training, which requires a large amount of labeled data with time-consuming pixel-wise annotations.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised polyp segmentation (DEC-Seg) from colonoscopy images.
arXiv Detail & Related papers (2023-12-26T12:56:31Z) - M3FPolypSegNet: Segmentation Network with Multi-frequency Feature Fusion
for Polyp Localization in Colonoscopy Images [1.389360509566256]
Multi-Frequency Feature Fusion Polyp Network (M3FPolypSegNet) was proposed to decompose the input image into low/high/full-frequency components.
We used three independent multi-frequency encoders to map multiple input images into a high-dimensional feature space.
We designed three multi-task learning objectives (i.e., region, edge, and distance) in four decoder blocks to learn the structural characteristics of the region.
arXiv Detail & Related papers (2023-10-09T09:01:53Z) - FLDNet: A Foreground-Aware Network for Polyp Segmentation Leveraging
Long-Distance Dependencies [1.7623838912231695]
We propose FLDNet, a Transformer-based neural network that captures long-distance dependencies for accurate polyp segmentation.
Our proposed method, FLDNet, was evaluated using seven metrics on common datasets and demonstrated superiority over state-of-the-art methods on widely-used evaluation measures.
arXiv Detail & Related papers (2023-09-12T06:32:42Z) - RaBiT: An Efficient Transformer using Bidirectional Feature Pyramid
Network with Reverse Attention for Colon Polyp Segmentation [0.0]
This paper introduces RaBiT, an encoder-decoder model that incorporates a lightweight Transformer-based architecture in the encoder.
Our method demonstrates high generalization capability in cross-dataset experiments, even when the training and test sets have different characteristics.
arXiv Detail & Related papers (2023-07-12T19:25:10Z) - Lesion-aware Dynamic Kernel for Polyp Segmentation [49.63274623103663]
We propose a lesion-aware dynamic network (LDNet) for polyp segmentation.
It is a traditional U-shaped encoder-decoder structure combined with a dynamic kernel generation and updating scheme.
This simple but effective scheme endows our model with powerful segmentation performance and generalization capability.
arXiv Detail & Related papers (2023-01-12T09:53:57Z) - LAPFormer: A Light and Accurate Polyp Segmentation Transformer [6.352264764099531]
We propose a new model with encoder-decoder architecture named LAPFormer, which uses a hierarchical Transformer encoder to better extract global features.
Our proposed decoder contains a progressive feature fusion module designed for fusing features from upper and lower scales.
We test our model on five popular benchmark datasets for polyp segmentation.
arXiv Detail & Related papers (2022-10-10T01:52:30Z) - SIM-Trans: Structure Information Modeling Transformer for Fine-grained
Visual Categorization [59.732036564862796]
We propose the Structure Information Modeling Transformer (SIM-Trans) to incorporate object structure information into transformer for enhancing discriminative representation learning.
The proposed two modules are light-weighted and can be plugged into any transformer network and trained end-to-end easily.
Experiments and analyses demonstrate that the proposed SIM-Trans achieves state-of-the-art performance on fine-grained visual categorization benchmarks.
arXiv Detail & Related papers (2022-08-31T03:00:07Z) - Automatic Polyp Segmentation via Multi-scale Subtraction Network [100.94922587360871]
In clinical practice, precise polyp segmentation provides important information in the early detection of colorectal cancer.
Most existing methods are based on a U-shaped structure and use element-wise addition or concatenation to fuse different-level features progressively in the decoder.
We propose a multi-scale subtraction network (MSNet) to segment polyp from colonoscopy image.
arXiv Detail & Related papers (2021-08-11T07:54:07Z) - Segment as Points for Efficient Online Multi-Object Tracking and
Segmentation [66.03023110058464]
We propose a highly effective method for learning instance embeddings based on segments by converting the compact image representation to an unordered 2D point cloud representation.
Our method generates a new tracking-by-points paradigm where discriminative instance embeddings are learned from randomly selected points rather than images.
The resulting online MOTS framework, named PointTrack, surpasses all the state-of-the-art methods by large margins.
arXiv Detail & Related papers (2020-07-03T08:29:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.