Cooperation Learning Enhanced Colonic Polyp Segmentation Based on
Transformer-CNN Fusion
- URL: http://arxiv.org/abs/2301.06892v1
- Date: Tue, 17 Jan 2023 13:58:17 GMT
- Title: Cooperation Learning Enhanced Colonic Polyp Segmentation Based on
Transformer-CNN Fusion
- Authors: Yuanyuan Wang, Zhaohong Deng, Qiongdan Lou, Shudong Hu, Kup-sze Choi,
Shitong Wang
- Abstract summary: We propose a hybrid network called Fusion-Transformer-HardNetMSEG (i.e., Fu-TransHNet) in this study.
Fu-TransHNet fuses deep learning branches with different mechanisms and is enhanced with multi-view collaborative learning techniques.
Experimental results showed that the Fu-TransHNet network was superior to the existing methods on five widely used benchmark datasets.
- Score: 21.6402447417878
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional segmentation methods for colonic polyps are mainly designed based
on low-level features and cannot accurately locate small colonic polyps. Although
existing deep learning methods improve segmentation accuracy, their performance is
still unsatisfactory. To address these challenges, we propose a hybrid network called
Fusion-Transformer-HardNetMSEG (i.e., Fu-TransHNet) in this study. Fu-TransHNet fuses
deep learning branches with different mechanisms and is enhanced with multi-view
collaborative learning techniques. First, Fu-TransHNet uses a Transformer branch and a
CNN branch to perform global feature learning and local feature learning, respectively.
Second, a fusion module is designed to integrate the features from the two branches.
The fusion module consists of two parts: 1) the Global-Local Feature Fusion (GLFF) part
and 2) the Dense Fusion of Multi-scale features (DFM) part. The former compensates for
the feature information missing from the two branches at the same scale; the latter
enhances the feature representation. Third, the two branches and the fusion module use
multi-view cooperative learning to obtain respective weights denoting their importance,
and the final decision is made comprehensively from the weighted outputs. Experimental
results showed that the Fu-TransHNet network was superior to existing methods on five
widely used benchmark datasets. In particular, on the ETIS-LaribPolypDB dataset, which
contains many small-target colonic polyps, the mDice obtained by Fu-TransHNet was 12.4%
and 6.2% higher than the state-of-the-art methods HardNet-MSEG and TransFuse-s,
respectively.
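To make the described architecture concrete, below is a minimal PyTorch-style sketch of the two-branch idea: a global (Transformer-like) branch and a local (CNN) branch, a same-scale fusion module in the spirit of GLFF, and learned per-view weights that combine the predictions. All class names, layer choices, and hyperparameters here are illustrative assumptions; this is not the authors' released Fu-TransHNet implementation.

```python
# Hedged sketch of a Transformer-CNN two-branch segmentation model with
# same-scale fusion and weighted multi-view decision. Encoders are stand-ins.
import torch
import torch.nn as nn


class GLFFLike(nn.Module):
    """Fuse same-scale global and local feature maps (illustrative, not the paper's GLFF)."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, global_feat, local_feat):
        # Concatenate along channels; the conv compensates what either branch misses.
        return self.fuse(torch.cat([global_feat, local_feat], dim=1))


class TwoBranchFusionSeg(nn.Module):
    """Global branch + local branch + fusion, combined by learned view weights."""
    def __init__(self, channels=64):
        super().__init__()
        # Stand-ins for the Transformer encoder and the HarDNet-style CNN encoder.
        self.global_branch = nn.Conv2d(3, channels, kernel_size=7, padding=3)
        self.local_branch = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.fusion = GLFFLike(channels)
        # One segmentation head per "view": global, local, fused.
        self.heads = nn.ModuleList(nn.Conv2d(channels, 1, kernel_size=1) for _ in range(3))
        # Learnable importance weights, normalized with softmax at decision time.
        self.view_logits = nn.Parameter(torch.zeros(3))

    def forward(self, x):
        g = self.global_branch(x)   # global-context features
        l = self.local_branch(x)    # local-detail features
        f = self.fusion(g, l)       # same-scale fusion
        preds = [head(feat) for head, feat in zip(self.heads, (g, l, f))]
        w = torch.softmax(self.view_logits, dim=0)
        # Weighted combination of the three views gives the final mask logits.
        return sum(wi * p for wi, p in zip(w, preds))


if __name__ == "__main__":
    model = TwoBranchFusionSeg()
    mask_logits = model(torch.randn(1, 3, 352, 352))
    print(mask_logits.shape)  # torch.Size([1, 1, 352, 352])
```

In the paper, the per-view weights come from multi-view cooperative learning rather than a simple learnable softmax, and DFM additionally fuses features across scales; the sketch only shows the single-scale weighting pattern.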
Related papers
- CTRL-F: Pairing Convolution with Transformer for Image Classification via Multi-Level Feature Cross-Attention and Representation Learning Fusion [0.0]
We present a novel lightweight hybrid network that pairs Convolution with Transformers.
We fuse the local responses acquired from the convolution path with the global responses acquired from the MFCA module.
Experiments demonstrate that our variants achieve state-of-the-art performance, whether trained from scratch on large data or in a low-data regime.
arXiv Detail & Related papers (2024-07-09T08:47:13Z) - Fusion-Mamba for Cross-modality Object Detection [63.56296480951342]
Fusing information from different modalities effectively improves object detection performance.
We design a Fusion-Mamba block (FMB) to map cross-modal features into a hidden state space for interaction.
Our proposed approach outperforms state-of-the-art methods in mAP by 5.9% on the M3FD and 4.9% on the FLIR-Aligned datasets.
arXiv Detail & Related papers (2024-04-14T05:28:46Z) - FusionMamba: Efficient Remote Sensing Image Fusion with State Space Model [35.57157248152558]
Current deep learning (DL) methods typically employ convolutional neural networks (CNNs) or Transformers for feature extraction and information integration.
We propose FusionMamba, an innovative method for efficient remote sensing image fusion.
arXiv Detail & Related papers (2024-04-11T17:29:56Z) - Towards Cooperative Federated Learning over Heterogeneous Edge/Fog
Networks [49.19502459827366]
Federated learning (FL) has been promoted as a popular technique for training machine learning (ML) models over edge/fog networks.
Traditional implementations of FL have largely neglected the potential for inter-network cooperation.
We advocate for cooperative federated learning (CFL), a cooperative edge/fog ML paradigm built on device-to-device (D2D) and device-to-server (D2S) interactions.
arXiv Detail & Related papers (2023-03-15T04:41:36Z) - Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
Key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that the transformer is a uniform operation that is highly effective in both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z) - LATFormer: Locality-Aware Point-View Fusion Transformer for 3D Shape
Recognition [38.540048855119004]
We propose a novel Locality-Aware Point-View Fusion Transformer (LATFormer) for 3D shape retrieval and classification.
The core component of LATFormer is a module named Locality-Aware Fusion (LAF) which integrates the local features of correlated regions across the two modalities.
In our LATFormer, we utilize the LAF module to fuse the multi-scale features of the two modalities both bidirectionally and hierarchically to obtain more informative features.
arXiv Detail & Related papers (2021-09-03T03:23:27Z) - MBDF-Net: Multi-Branch Deep Fusion Network for 3D Object Detection [17.295359521427073]
We propose a Multi-Branch Deep Fusion Network (MBDF-Net) for 3D object detection.
In the first stage, our multi-branch feature extraction network utilizes Adaptive Attention Fusion modules to produce cross-modal fusion features from single-modal semantic features.
In the second stage, we use a region-of-interest (RoI)-pooled fusion module to generate enhanced local features for refinement.
arXiv Detail & Related papers (2021-08-29T15:40:15Z) - Image Fusion Transformer [75.71025138448287]
In image fusion, images obtained from different sensors are fused to generate a single image with enhanced information.
In recent years, state-of-the-art methods have adopted Convolutional Neural Networks (CNNs) to encode meaningful features for image fusion.
We propose a novel Image Fusion Transformer (IFT) where we develop a transformer-based multi-scale fusion strategy.
arXiv Detail & Related papers (2021-07-19T16:42:49Z) - EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation in many applications, such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z) - Efficient Human Pose Estimation by Learning Deeply Aggregated
Representations [67.24496300046255]
We propose an efficient human pose estimation network (DANet) by learning deeply aggregated representations.
Our networks achieve comparable or even better accuracy with much lower model complexity.
arXiv Detail & Related papers (2020-12-13T10:58:07Z)