Mining Contextual Information Beyond Image for Semantic Segmentation
- URL: http://arxiv.org/abs/2108.11819v1
- Date: Thu, 26 Aug 2021 14:34:23 GMT
- Title: Mining Contextual Information Beyond Image for Semantic Segmentation
- Authors: Zhenchao Jin, Tao Gong, Dongdong Yu, Qi Chu, Jian Wang, Changhu Wang,
Jie Shao
- Abstract summary: The paper studies the context aggregation problem in semantic image segmentation.
It proposes to mine the contextual information beyond individual images to further augment the pixel representations.
The proposed method can be effortlessly incorporated into existing segmentation frameworks.
- Score: 37.783233906684444
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper studies the context aggregation problem in semantic image
segmentation. Existing research focuses on improving pixel representations by
aggregating the contextual information within individual images. Though
impressive, these methods neglect the representations of same-class pixels
beyond the input image. To address this, this paper proposes to mine the
contextual information beyond individual images to further augment the pixel
representations. We first set up a feature memory module, which is updated
dynamically during training, to store the dataset-level representations of the
various categories. Then, we learn the class probability distribution of each
pixel representation under the supervision of the ground-truth segmentation.
Finally, the representation of each pixel is augmented by aggregating the
dataset-level representations according to its class probability distribution.
Furthermore, by utilizing the stored dataset-level representations, we also
propose a representation consistent learning strategy that helps the
classification head better address intra-class compactness and inter-class
dispersion. The proposed method can be effortlessly incorporated into existing
segmentation frameworks (e.g., FCN, PSPNet, OCRNet and DeepLabV3) and brings
consistent performance improvements. Mining contextual information beyond the
image allows us to report state-of-the-art performance on various benchmarks:
ADE20K, LIP, Cityscapes and COCO-Stuff.
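The pipeline described in the abstract (a dynamically updated class memory, per-pixel class distributions, and distribution-weighted aggregation) can be condensed into a short sketch. The PyTorch code below is a minimal illustration under assumed shapes; the module and method names (DatasetLevelContext, update_memory) are hypothetical and not the authors' released implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DatasetLevelContext(nn.Module):
    """Minimal sketch of dataset-level context aggregation (MCIBI-style).

    A memory bank stores one representative vector per category; each pixel
    reads from the memory using its predicted class distribution, and the
    read-out is fused back into the original pixel representation.
    """

    def __init__(self, num_classes: int, feat_dim: int, momentum: float = 0.9):
        super().__init__()
        # Dataset-level memory: one feat_dim vector per category, updated
        # with a moving average during training (no gradients flow into it).
        self.register_buffer("memory", torch.zeros(num_classes, feat_dim))
        self.momentum = momentum
        # Per-pixel classifier supplying the class probability distribution,
        # supervised by the ground-truth segmentation.
        self.classifier = nn.Conv2d(feat_dim, num_classes, kernel_size=1)
        self.fuse = nn.Conv2d(2 * feat_dim, feat_dim, kernel_size=1)

    def forward(self, feats: torch.Tensor):
        # feats: (B, C, H, W) pixel representations from the backbone.
        logits = self.classifier(feats)                       # (B, K, H, W)
        probs = F.softmax(logits, dim=1)
        b, k, h, w = probs.shape
        # Read the memory: per-pixel weighted sum of the K class vectors.
        flat = probs.permute(0, 2, 3, 1).reshape(-1, k)       # (BHW, K)
        context = (flat @ self.memory).view(b, h, w, -1).permute(0, 3, 1, 2)
        # Augment each pixel representation with the dataset-level context.
        return self.fuse(torch.cat([feats, context], dim=1)), logits

    @torch.no_grad()
    def update_memory(self, feats: torch.Tensor, labels: torch.Tensor):
        # labels: (B, H, W) ground truth downsampled to the feature
        # resolution; 255 marks ignored pixels.
        b, c, h, w = feats.shape
        flat_feats = feats.permute(0, 2, 3, 1).reshape(-1, c)
        flat_labels = labels.reshape(-1)
        for cls in flat_labels.unique():
            if cls.item() == 255:
                continue
            cls_mean = flat_feats[flat_labels == cls].mean(dim=0)
            self.memory[cls] = (self.momentum * self.memory[cls]
                                + (1.0 - self.momentum) * cls_mean)
```

Plausibly, the representation consistent learning strategy mentioned in the abstract compares pixel features against these stored class vectors (pulling each pixel toward its own class entry and away from the others), which is how the classification head gains access to dataset-level statistics rather than only the current mini-batch.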
Related papers
- Pixel-Level Clustering Network for Unsupervised Image Segmentation [3.69853388955692]
We present a pixel-level clustering framework for segmenting images into regions without using ground truth annotations.
We also propose a training strategy that utilizes intra-consistency within each superpixel, inter-similarity/dissimilarity between neighboring superpixels, and structural similarity between images.
arXiv Detail & Related papers (2023-10-24T23:06:29Z)
- Class-level Multiple Distributions Representation are Necessary for Semantic Segmentation [9.796689408601775]
We introduce, for the first time, the use of multiple distributions to describe intra-class variations.
We also propose a class multiple distributions consistency strategy to construct discriminative multiple distribution representations of embedded pixels.
Our approach can be seamlessly integrated into popular segmentation frameworks (FCN, PSPNet, CCNet) and achieves 5.61%/1.75%/0.75% mIoU improvements on ADE20K, respectively.
arXiv Detail & Related papers (2023-03-14T16:10:36Z)
- MCIBI++: Soft Mining Contextual Information Beyond Image for Semantic Segmentation [29.458735435545048]
We propose a novel paradigm, named MCIBI++, for soft mining of contextual information beyond the image.
We generate a class probability distribution for each pixel representation and perform dataset-level context aggregation accordingly.
In the inference phase, we additionally design a coarse-to-fine iterative inference strategy to further boost the segmentation results (see the sketch after this entry).
arXiv Detail & Related papers (2022-09-09T18:03:52Z)
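The coarse-to-fine iterative inference named in the MCIBI++ entry above is not detailed in the summary; one plausible reading, reusing the hypothetical DatasetLevelContext module from the sketch near the top of this page, is to re-read the memory with each successively refined class distribution:

```python
import torch

@torch.no_grad()
def iterative_inference(model, image: torch.Tensor, num_iters: int = 2):
    """Hypothetical coarse-to-fine loop; `model` is assumed to expose a
    backbone and a DatasetLevelContext head as in the earlier sketch."""
    feats = model.backbone(image)                     # (B, C, H, W)
    logits = model.context.classifier(feats)          # coarse prediction
    b, c, h, w = feats.shape
    for _ in range(num_iters):
        probs = logits.softmax(dim=1)                 # refined class distribution
        flat = probs.permute(0, 2, 3, 1).reshape(-1, probs.size(1))
        context = (flat @ model.context.memory).view(b, h, w, c).permute(0, 3, 1, 2)
        feats_aug = model.context.fuse(torch.cat([feats, context], dim=1))
        logits = model.context.classifier(feats_aug)  # finer prediction
    return logits
```

This illustrates the general idea only; the actual MCIBI++ inference procedure may differ.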
- Distilling Ensemble of Explanations for Weakly-Supervised Pre-Training of Image Segmentation Models [54.49581189337848]
We propose a method to enable end-to-end pre-training of image segmentation models on classification datasets.
The proposed method leverages a weighted segmentation learning procedure to pre-train the segmentation network en masse.
Experimental results show that, with ImageNet accompanied by PSSL as the source dataset, the proposed end-to-end pre-training strategy successfully boosts the performance of various segmentation models.
arXiv Detail & Related papers (2022-07-04T13:02:32Z)
- CRIS: CLIP-Driven Referring Image Segmentation [71.56466057776086]
We propose an end-to-end CLIP-Driven Referring Image Segmentation framework (CRIS).
CRIS resorts to vision-language decoding and contrastive learning to achieve text-to-pixel alignment.
Our proposed framework significantly outperforms previous state-of-the-art methods without any post-processing.
arXiv Detail & Related papers (2021-11-30T07:29:08Z)
- ISNet: Integrate Image-Level and Semantic-Level Context for Semantic Segmentation [64.56511597220837]
Co-occurring visual patterns make aggregating contextual information a common paradigm for enhancing pixel representations in semantic image segmentation.
Existing approaches focus on modeling the context from the perspective of the whole image, i.e., aggregating image-level contextual information.
This paper proposes to augment the pixel representations by aggregating both image-level and semantic-level contextual information.
arXiv Detail & Related papers (2021-08-27T16:38:22Z)
- Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision [57.031588264841]
We leverage a noisy dataset of over one billion image alt-text pairs, obtained without expensive filtering or post-processing steps.
A simple dual-encoder architecture learns to align visual and language representations of the image and text pairs using a contrastive loss.
We show that the scale of our corpus can make up for its noise and lead to state-of-the-art representations even with such a simple learning scheme (see the sketch after this entry).
arXiv Detail & Related papers (2021-02-11T10:08:12Z)
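The dual-encoder alignment summarized above is, at its core, a symmetric contrastive (InfoNCE-style) objective over paired image and text embeddings. The following PyTorch sketch illustrates that objective under assumed shapes; the function name and temperature value are hypothetical, not taken from the paper:

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(image_emb: torch.Tensor,
                               text_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of paired image/text embeddings.

    image_emb, text_emb: (B, D) outputs of two independent encoders
    (the dual-encoder design); row i of each tensor is a matched pair.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature   # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Matched pairs sit on the diagonal; all other pairs act as negatives.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

The batch itself supplies the negatives, and the summary's claim is that corpus scale compensates for the noise in the alt-text pairs.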
- Exploring Cross-Image Pixel Contrast for Semantic Segmentation [130.22216825377618]
We propose a pixel-wise contrastive framework for semantic segmentation in the fully supervised setting.
The core idea is to enforce pixel embeddings belonging to the same semantic class to be more similar than embeddings from different classes (see the sketch after this entry).
Our method can be effortlessly incorporated into existing segmentation frameworks without extra overhead during testing.
arXiv Detail & Related papers (2021-01-28T11:35:32Z)
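The cross-image idea above amounts to a supervised contrastive loss computed over pixel embeddings pooled from all images in a batch. Below is a minimal PyTorch sketch; the sampling cap, temperature, and function name are illustrative assumptions rather than the paper's actual design:

```python
import torch
import torch.nn.functional as F

def pixel_contrast_loss(embeddings: torch.Tensor,
                        labels: torch.Tensor,
                        temperature: float = 0.1,
                        max_samples: int = 1024) -> torch.Tensor:
    """Supervised pixel-wise contrastive loss over a batch of images.

    embeddings: (N, D) pixel embeddings gathered across the batch,
    labels: (N,) their ground-truth classes. Same-class pixels are
    positives for each other regardless of the image they came from.
    """
    if embeddings.size(0) > max_samples:       # keep the O(N^2) matrix small
        idx = torch.randperm(embeddings.size(0))[:max_samples]
        embeddings, labels = embeddings[idx], labels[idx]
    emb = F.normalize(embeddings, dim=-1)
    sim = emb @ emb.t() / temperature          # (N, N) similarities
    self_mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    # Log-probability of each pair, excluding self-similarity from the denominator.
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(self_mask, float("-inf")), dim=1, keepdim=True)
    # Average the log-probability over each anchor's positives.
    mean_pos = (log_prob * pos_mask).sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)
    # Only anchors with at least one positive contribute to the loss.
    return -mean_pos[pos_mask.any(dim=1)].mean()
```

Because the loss is used only during training, it adds no cost at test time, consistent with the entry's claim of no extra overhead during testing.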