Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers
- URL: http://arxiv.org/abs/2108.06932v8
- Date: Mon, 19 Feb 2024 13:02:26 GMT
- Title: Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers
- Authors: Bo Dong, Wenhai Wang, Deng-Ping Fan, Jinpeng Li, Huazhu Fu, Ling Shao
- Abstract summary: We propose a new type of polyp segmentation method, named Polyp-PVT, which effectively suppresses noise in the features and significantly improves their expressive capability.
- Score: 124.01928050651466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most polyp segmentation methods use CNNs as their backbone, leading to two
key issues when exchanging information between the encoder and decoder: 1)
taking into account the differences in contribution between different-level
features and 2) designing an effective mechanism for fusing these features.
Unlike existing CNN-based methods, we adopt a transformer encoder, which learns
more powerful and robust representations. In addition, considering the image
acquisition influence and elusive properties of polyps, we introduce three
standard modules, including a cascaded fusion module (CFM), a camouflage
identification module (CIM), and a similarity aggregation module (SAM). Among
these, the CFM is used to collect the semantic and location information of
polyps from high-level features; the CIM is applied to capture polyp
information disguised in low-level features; and the SAM extends the pixel
features of the polyp area with high-level semantic position information to the
entire polyp area, thereby effectively fusing cross-level features. The
proposed model, named Polyp-PVT, effectively suppresses noise in the features
and significantly improves their expressive capability. Extensive experiments
on five widely adopted datasets show that the proposed model is more robust to
various challenging situations (e.g., appearance changes, small objects,
rotation) than existing representative methods. The proposed model is available
at https://github.com/DengPingFan/Polyp-PVT.
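The three-module fusion described in the abstract can be illustrated with a toy sketch. This is a minimal plain-Python illustration, not the paper's actual formulation: the function names mirror the module names, but the averaging, sigmoid gating, and product-based similarity below are illustrative assumptions, and feature maps are modeled as flat lists of floats rather than convolutional tensors.

```python
import math

def cfm(high_feats):
    """Cascaded fusion (CFM sketch): progressively merge high-level
    feature maps to collect semantic and location cues."""
    fused = high_feats[0]
    for f in high_feats[1:]:
        fused = [(a + b) / 2 for a, b in zip(fused, f)]
    return fused

def cim(low_feat):
    """Camouflage identification (CIM sketch): gate low-level features
    with a sigmoid attention map to suppress background noise."""
    gate = [1 / (1 + math.exp(-x)) for x in low_feat]
    return [g * x for g, x in zip(gate, low_feat)]

def sam(high_sem, low_feat):
    """Similarity aggregation (SAM sketch): spread high-level semantics
    to low-level positions in proportion to their agreement (here a
    simple element-wise product stands in for attention similarity)."""
    return [h * l for h, l in zip(high_sem, low_feat)]

high = [[0.2, 0.8], [0.4, 0.6]]  # two high-level feature maps
low = [0.5, -1.0]                # one noisy low-level feature map

semantic = cfm(high)             # collected semantic/location cues
denoised = cim(low)              # gated low-level features
out = sam(semantic, denoised)    # fused cross-level prediction features
```

The point of the sketch is the division of labor: high-level maps are fused first, low-level maps are denoised independently, and only then are the two streams combined.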
Related papers
- HiFiSeg: High-Frequency Information Enhanced Polyp Segmentation with Global-Local Vision Transformer [5.96521715927858]
HiFiSeg is a novel network for colon polyp segmentation that enhances high-frequency information processing.
GLIM employs a parallel structure to fuse global and local information at multiple scales, effectively capturing fine-grained features.
SAM selectively integrates boundary details from low-level features with semantic information from high-level features, significantly improving the model's ability to accurately detect and segment polyps.
arXiv Detail & Related papers (2024-10-03T14:36:22Z) - ASPS: Augmented Segment Anything Model for Polyp Segmentation [77.25557224490075]
The Segment Anything Model (SAM) has introduced unprecedented potential for polyp segmentation.
SAM's Transformer-based structure prioritizes global and low-frequency information.
CFA integrates a trainable CNN encoder branch with a frozen ViT encoder, enabling the integration of domain-specific knowledge.
arXiv Detail & Related papers (2024-06-30T14:55:32Z) - FLDNet: A Foreground-Aware Network for Polyp Segmentation Leveraging Long-Distance Dependencies [1.7623838912231695]
We propose FLDNet, a Transformer-based neural network that captures long-distance dependencies for accurate polyp segmentation.
Our proposed method, FLDNet, was evaluated with seven metrics on common datasets and outperformed state-of-the-art methods on widely used evaluation measures.
arXiv Detail & Related papers (2023-09-12T06:32:42Z) - RaBiT: An Efficient Transformer using Bidirectional Feature Pyramid Network with Reverse Attention for Colon Polyp Segmentation [0.0]
This paper introduces RaBiT, an encoder-decoder model that incorporates a lightweight Transformer-based architecture in the encoder.
Our method demonstrates high generalization capability in cross-dataset experiments, even when the training and test sets have different characteristics.
arXiv Detail & Related papers (2023-07-12T19:25:10Z) - Lesion-aware Dynamic Kernel for Polyp Segmentation [49.63274623103663]
We propose a lesion-aware dynamic network (LDNet) for polyp segmentation.
It is a traditional u-shape encoder-decoder structure incorporated with a dynamic kernel generation and updating scheme.
This simple but effective scheme endows our model with powerful segmentation performance and generalization capability.
arXiv Detail & Related papers (2023-01-12T09:53:57Z) - LAPFormer: A Light and Accurate Polyp Segmentation Transformer [6.352264764099531]
We propose a new model with encoder-decoder architecture named LAPFormer, which uses a hierarchical Transformer encoder to better extract global features.
Our proposed decoder contains a progressive feature fusion module designed to fuse features from upper and lower scales.
We test our model on five popular benchmark datasets for polyp segmentation.
arXiv Detail & Related papers (2022-10-10T01:52:30Z) - SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization [59.732036564862796]
We propose the Structure Information Modeling Transformer (SIM-Trans) to incorporate object structure information into transformer for enhancing discriminative representation learning.
The proposed two modules are light-weighted and can be plugged into any transformer network and trained end-to-end easily.
Experiments and analyses demonstrate that the proposed SIM-Trans achieves state-of-the-art performance on fine-grained visual categorization benchmarks.
arXiv Detail & Related papers (2022-08-31T03:00:07Z) - Automatic Polyp Segmentation via Multi-scale Subtraction Network [100.94922587360871]
In clinical practice, precise polyp segmentation provides important information in the early detection of colorectal cancer.
Most existing methods are based on U-shape structure and use element-wise addition or concatenation to fuse different level features progressively in decoder.
We propose a multi-scale subtraction network (MSNet) to segment polyps from colonoscopy images.
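MSNet's core idea, replacing additive fusion with subtraction between adjacent feature levels, can be shown in a toy sketch. This is a minimal plain-Python illustration under stated assumptions: feature maps are modeled as flat lists of floats, and the helper names are hypothetical, not the paper's API.

```python
def subtraction_unit(f_a, f_b):
    """Element-wise absolute difference between two adjacent-level
    feature maps. Subtraction highlights where the levels disagree
    (often edges and small structures), whereas element-wise addition
    or concatenation tends to blur such complementary detail."""
    return [abs(a - b) for a, b in zip(f_a, f_b)]

def multi_scale_subtraction(features):
    """Aggregate the pairwise difference maps across all adjacent
    levels by element-wise summation."""
    diffs = [subtraction_unit(features[i], features[i + 1])
             for i in range(len(features) - 1)]
    return [sum(vals) for vals in zip(*diffs)]

low  = [0.1, 0.9, 0.2]   # low-level feature map (fine detail)
mid  = [0.1, 0.5, 0.6]
high = [0.1, 0.2, 0.7]   # high-level feature map (semantics)
fused = multi_scale_subtraction([low, mid, high])
```

Positions where all levels agree (the first element) contribute nothing, while positions where levels diverge are amplified, which is the complementary-information signal MSNet exploits.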
arXiv Detail & Related papers (2021-08-11T07:54:07Z) - Segment as Points for Efficient Online Multi-Object Tracking and Segmentation [66.03023110058464]
We propose a highly effective method for learning instance embeddings based on segments by converting the compact image representation to an unordered 2D point cloud representation.
Our method generates a new tracking-by-points paradigm where discriminative instance embeddings are learned from randomly selected points rather than images.
The resulting online MOTS framework, named PointTrack, surpasses all the state-of-the-art methods by large margins.
arXiv Detail & Related papers (2020-07-03T08:29:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.