MCIBI++: Soft Mining Contextual Information Beyond Image for Semantic
Segmentation
- URL: http://arxiv.org/abs/2209.04471v1
- Date: Fri, 9 Sep 2022 18:03:52 GMT
- Title: MCIBI++: Soft Mining Contextual Information Beyond Image for Semantic
Segmentation
- Authors: Zhenchao Jin, Dongdong Yu, Zehuan Yuan, Lequan Yu
- Abstract summary: We propose a novel soft mining contextual information beyond image paradigm named MCIBI++.
We generate a class probability distribution for each pixel representation and conduct the dataset-level context aggregation.
In the inference phase, we additionally design a coarse-to-fine iterative inference strategy to further boost the segmentation results.
- Score: 29.458735435545048
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Co-occurrent visual patterns make context aggregation an essential
paradigm for semantic segmentation. Existing studies focus on modeling the
contexts within an image while neglecting the valuable semantics of the
corresponding category beyond the image. To this end, we propose a novel soft
mining contextual information beyond image paradigm named MCIBI++ to further
boost the pixel-level representations. Specifically, we first set up a
dynamically updated memory module to store the dataset-level distribution
information of various categories and then leverage the information to yield
the dataset-level category representations during network forward. After that,
we generate a class probability distribution for each pixel representation and
conduct the dataset-level context aggregation with the class probability
distribution as weights. Finally, the original pixel representations are
augmented with the aggregated dataset-level and the conventional image-level
contextual information. Moreover, in the inference phase, we additionally
design a coarse-to-fine iterative inference strategy to further boost the
segmentation results. MCIBI++ can be effortlessly incorporated into the
existing segmentation frameworks and bring consistent performance improvements.
Also, MCIBI++ can be extended into the video semantic segmentation framework
with considerable improvements over the baseline. Equipped with MCIBI++, we
achieved state-of-the-art performance on seven challenging image and video
semantic segmentation benchmarks.
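The pipeline the abstract describes (a dataset-level memory of category representations, a per-pixel class probability distribution, and probability-weighted context aggregation) can be sketched roughly as below. This is an illustrative numpy sketch, not the paper's implementation: the function names, feature shapes, concatenation-based augmentation, and momentum update rule are all assumptions.

```python
import numpy as np

def soft_mine_dataset_context(pixel_feats, memory, temperature=1.0):
    """Augment pixel representations with dataset-level context.

    pixel_feats: (N, C) pixel representations.
    memory: (K, C) dataset-level category representations,
            one vector per class, maintained by a memory module.
    Returns (N, 2C): each pixel feature concatenated with its
    probability-weighted dataset-level context.
    """
    # Class probability distribution per pixel: softmax over similarity.
    logits = pixel_feats @ memory.T / temperature        # (N, K)
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)            # rows sum to 1
    # Dataset-level context aggregation, weighted by class probabilities.
    context = probs @ memory                             # (N, C)
    return np.concatenate([pixel_feats, context], axis=1)

def update_memory(memory, feats, labels, momentum=0.9):
    """Dynamically update the per-class memory with batch statistics
    (a simple momentum rule, assumed here for illustration)."""
    new_mem = memory.copy()
    for k in range(memory.shape[0]):
        mask = labels == k
        if mask.any():
            new_mem[k] = momentum * memory[k] \
                         + (1 - momentum) * feats[mask].mean(axis=0)
    return new_mem
```

In a real network the memory would be updated during training from ground-truth-labeled features, and the augmented representation would also include the conventional image-level context before the final classifier.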
Related papers
- A Lightweight Clustering Framework for Unsupervised Semantic
Segmentation [28.907274978550493]
Unsupervised semantic segmentation aims to categorize each pixel in an image into a corresponding class without the use of annotated data.
We propose a lightweight clustering framework for unsupervised semantic segmentation.
Our framework achieves state-of-the-art results on PASCAL VOC and MS COCO datasets.
arXiv Detail & Related papers (2023-11-30T15:33:42Z)
- Open-world Semantic Segmentation via Contrasting and Clustering
Vision-Language Embedding [95.78002228538841]
We propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without any effort on dense annotations.
Our method can directly segment objects of arbitrary categories, outperforming zero-shot segmentation methods that require data labeling on three benchmark datasets.
arXiv Detail & Related papers (2022-07-18T09:20:04Z)
- CRIS: CLIP-Driven Referring Image Segmentation [71.56466057776086]
We propose an end-to-end CLIP-Driven Referring Image Segmentation framework (CRIS).
CRIS resorts to vision-language decoding and contrastive learning for achieving the text-to-pixel alignment.
Our proposed framework significantly outperforms the state-of-the-art methods without any post-processing.
arXiv Detail & Related papers (2021-11-30T07:29:08Z)
- A Pixel-Level Meta-Learner for Weakly Supervised Few-Shot Semantic
Segmentation [40.27705176115985]
Few-shot semantic segmentation addresses the learning task in which only a few images with ground-truth pixel-level labels are available for the novel classes of interest.
We propose a novel meta-learning framework, which predicts pseudo pixel-level segmentation masks from a limited amount of data and their semantic labels.
Our proposed learning model can be viewed as a pixel-level meta-learner.
arXiv Detail & Related papers (2021-11-02T08:28:11Z)
- Maximize the Exploration of Congeneric Semantics for Weakly Supervised
Semantic Segmentation [27.155133686127474]
We construct a graph neural network (P-GNN) based on the self-detected patches from different images that contain the same class labels.
We conduct experiments on the popular PASCAL VOC 2012 benchmark, and our model yields state-of-the-art performance.
arXiv Detail & Related papers (2021-10-08T08:59:16Z)
- InfoSeg: Unsupervised Semantic Image Segmentation with Mutual
Information Maximization [0.0]
We propose a novel method for unsupervised semantic image segmentation based on mutual information maximization between local and global high-level image features.
In the first step, we segment images based on local and global features.
In the second step, we maximize the mutual information between local features and high-level features of their respective class.
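The two steps above can be sketched in a simplified numpy form. This is an illustrative reading of the summary, not InfoSeg's actual objective: the assignment rule, the similarity-based surrogate score, and all names and shapes are assumptions.

```python
import numpy as np

def infoseg_step(local_feats, class_feats):
    """A toy version of the two-step scheme.

    local_feats: (N, D) per-pixel local features.
    class_feats: (K, D) high-level features, one per class.
    Step 1: segment by assigning each pixel to its most similar class.
    Step 2: score to maximize -- the mean similarity between each local
    feature and the high-level feature of its assigned class, a simple
    surrogate for the mutual-information objective.
    """
    sim = local_feats @ class_feats.T                    # (N, K)
    assignment = sim.argmax(axis=1)                      # step 1: segmentation
    score = sim[np.arange(len(sim)), assignment].mean()  # step 2: objective
    return assignment, score
```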
arXiv Detail & Related papers (2021-10-07T14:01:42Z)
- ISNet: Integrate Image-Level and Semantic-Level Context for Semantic
Segmentation [64.56511597220837]
Co-occurrent visual patterns make aggregating contextual information a common paradigm to enhance the pixel representation for semantic image segmentation.
Existing approaches focus on modeling the context from the perspective of the whole image, i.e., aggregating the image-level contextual information.
This paper proposes to augment the pixel representations by aggregating the image-level and semantic-level contextual information.
arXiv Detail & Related papers (2021-08-27T16:38:22Z)
- Mining Contextual Information Beyond Image for Semantic Segmentation [37.783233906684444]
The paper studies the context aggregation problem in semantic image segmentation.
It proposes to mine the contextual information beyond individual images to further augment the pixel representations.
The proposed method could be effortlessly incorporated into existing segmentation frameworks.
arXiv Detail & Related papers (2021-08-26T14:34:23Z)
- Segmenter: Transformer for Semantic Segmentation [79.9887988699159]
We introduce Segmenter, a transformer model for semantic segmentation.
We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation.
It outperforms the state of the art on the challenging ADE20K dataset and performs on-par on Pascal Context and Cityscapes.
arXiv Detail & Related papers (2021-05-12T13:01:44Z)
- Exploring Cross-Image Pixel Contrast for Semantic Segmentation [130.22216825377618]
We propose a pixel-wise contrastive framework for semantic segmentation in the fully supervised setting.
The core idea is to enforce pixel embeddings belonging to the same semantic class to be more similar than embeddings from different classes.
Our method can be effortlessly incorporated into existing segmentation frameworks without extra overhead during testing.
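The core idea (same-class pixel embeddings pulled together, different-class embeddings pushed apart) can be sketched as a supervised InfoNCE-style loss. This is a minimal numpy sketch under assumed shapes and a fixed temperature, not the paper's exact formulation:

```python
import numpy as np

def pixel_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised pixel-wise contrastive loss (simplified sketch).

    embeddings: (N, D) pixel embeddings (L2-normalised internally).
    labels: (N,) integer class labels.
    For each anchor pixel, pixels with the same label are positives
    and all other pixels are negatives.
    """
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = emb @ emb.T / temperature                  # (N, N) similarities
    np.fill_diagonal(sim, -np.inf)                   # exclude self-pairs
    sim = sim - sim.max(axis=1, keepdims=True)       # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos_mask = labels[:, None] == labels[None, :]
    np.fill_diagonal(pos_mask, False)
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0                           # skip anchors w/o positives
    # Average log-probability of the positives per anchor, negated.
    loss = -np.where(pos_mask, log_prob, 0.0).sum(axis=1)[valid] / pos_counts[valid]
    return loss.mean()
```

The loss is lower when embeddings cluster by class, which is exactly the property being enforced; in practice such a term would be added to the usual per-pixel cross-entropy during training and dropped at test time, matching the "no extra overhead during testing" claim.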
arXiv Detail & Related papers (2021-01-28T11:35:32Z)
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.