Related papers: MECPformer: Multi-estimations Complementary Patch with CNN-Transformers for Weakly Supervised Semantic Segmentation

MECPformer: Multi-estimations Complementary Patch with CNN-Transformers for Weakly Supervised Semantic Segmentation

URL: http://arxiv.org/abs/2303.10689v1
Date: Sun, 19 Mar 2023 15:42:45 GMT
Title: MECPformer: Multi-estimations Complementary Patch with CNN-Transformers for Weakly Supervised Semantic Segmentation
Authors: Chunmeng Liu, Guangyao Li, Yao Shen, Ruiqi Wang
Abstract summary: We propose a simple yet effective method with Multi-estimations Complementary Patch (MECP) strategy and Adaptive Conflict Module (ACM) In addition, ACM adaptively removes conflicting pixels and exploits the network self-training capability to mine potential target information. Our MECPformer has reached new state-of-the-art 72.0% mIoU on the PASCAL VOC 2012 and 42.4% on MS COCO 2014 dataset.
Score: 8.975330500836057
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The initial seed based on the convolutional neural network (CNN) for weakly supervised semantic segmentation always highlights the most discriminative regions but fails to identify the global target information. Methods based on transformers have been proposed successively benefiting from the advantage of capturing long-range feature representations. However, we observe a flaw regardless of the gifts based on the transformer. Given a class, the initial seeds generated based on the transformer may invade regions belonging to other classes. Inspired by the mentioned issues, we devise a simple yet effective method with Multi-estimations Complementary Patch (MECP) strategy and Adaptive Conflict Module (ACM), dubbed MECPformer. Given an image, we manipulate it with the MECP strategy at different epochs, and the network mines and deeply fuses the semantic information at different levels. In addition, ACM adaptively removes conflicting pixels and exploits the network self-training capability to mine potential target information. Without bells and whistles, our MECPformer has reached new state-of-the-art 72.0% mIoU on the PASCAL VOC 2012 and 42.4% on MS COCO 2014 dataset. The code is available at https://github.com/ChunmengLiu1/MECPformer.

Related papers

Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions. We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training. Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
ConvFormer: Plug-and-Play CNN-Style Transformers for Improving Medical Image Segmentation [10.727162449071155]
We build CNN-style Transformers (ConvFormer) to promote better attention convergence and thus better segmentation performance. In contrast to positional embedding and tokenization, ConvFormer adopts 2D convolution and max-pooling for both position information preservation and feature size reduction.
arXiv Detail & Related papers (2023-09-09T02:18:17Z)
SwinV2DNet: Pyramid and Self-Supervision Compounded Feature Learning for Remote Sensing Images Change Detection [12.727650696327878]
We propose an end-to-end compounded dense network SwinV2DNet to inherit advantages of transformer and CNN. It captures the change relationship features through the densely connected Swin V2 backbone. It provides the low-level pre-changed and post-changed features through a CNN branch.
arXiv Detail & Related papers (2023-08-22T03:31:52Z)
Focal-UNet: UNet-like Focal Modulation for Medical Image Segmentation [8.75217589103206]
We propose a new U-shaped architecture for medical image segmentation with the help of the newly introduced focal modulation mechanism. Due to the ability of the focal module to aggregate local and global features, our model could simultaneously benefit the wide receptive field of transformers.
arXiv Detail & Related papers (2022-12-19T06:17:22Z)
Max Pooling with Vision Transformers reconciles class and shape in weakly supervised semantic segmentation [0.0]
This work proposes a new WSSS method dubbed ViT-PCM (ViT Patch-Class Mapping), not based on CAM. Our model outperforms the state-of-the-art on baseline pseudo-masks (BPM), where we achieve $69.3%$ mIoU on PascalVOC 2012 $val$ set.
arXiv Detail & Related papers (2022-10-31T15:32:23Z)
Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution [64.25751738088015]
Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks. Transformers that need to incorporate contextual information to extract features dynamically are neglected. We propose a lightweight Cross-receptive Focused Inference Network (CFIN) that consists of a cascade of CT Blocks mixed with CNN and Transformer.
arXiv Detail & Related papers (2022-07-06T16:32:29Z)
MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation. It simultaneously learns global semantic information and local spatial-detailed features. Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
TransCAM: Transformer Attention-based CAM Refinement for Weakly Supervised Semantic Segmentation [19.333543299407832]
We propose TransCAM, a Conformer-based solution to weakly supervised semantic segmentation. We show that TransCAM achieves a new state-of-the-art performance of 69.3% and 69.6% on the respective PASCAL VOC 2012 validation and test sets.
arXiv Detail & Related papers (2022-03-14T16:17:18Z)
HAT: Hierarchical Aggregation Transformers for Person Re-identification [87.02828084991062]
We take advantages of both CNNs and Transformers for image-based person Re-ID with high performance. Work is the first to take advantages of both CNNs and Transformers for image-based person Re-ID.
arXiv Detail & Related papers (2021-07-13T09:34:54Z)
Less is More: Pay Less Attention in Vision Transformers [61.05787583247392]
Less attention vIsion Transformer builds upon the fact that convolutions, fully-connected layers, and self-attentions have almost equivalent mathematical expressions for processing image patch sequences. The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation.
arXiv Detail & Related papers (2021-05-29T05:26:07Z)
Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is an Unet-like pure Transformer for medical image segmentation. tokenized image patches are fed into the Transformer-based U-shaped decoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.