High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation
- URL: http://arxiv.org/abs/2304.02621v3
- Date: Fri, 9 Feb 2024 14:05:35 GMT
- Title: High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation
- Authors: Arvi Jonnarth, Yushan Zhang, Michael Felsberg
- Abstract summary: Image-level weakly-supervised semantic segmentation (WSSS) reduces the usually vast data annotation cost by using surrogate segmentation masks during training.
Our work is based on two techniques for improving CAMs: importance sampling, which is a substitute for GAP, and the feature similarity loss.
We reformulate both techniques based on binomial posteriors of multiple independent binary problems.
This has two benefits: their performance is improved and they become more general, resulting in an add-on method that can boost virtually any WSSS method.
- Score: 17.804090651425955
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Image-level weakly-supervised semantic segmentation (WSSS) reduces the
usually vast data annotation cost by using surrogate segmentation masks during
training. The typical approach involves training an image classification
network using global average pooling (GAP) on convolutional feature maps. This
enables the estimation of object locations based on class activation maps
(CAMs), which identify the importance of image regions. The CAMs are then used
to generate pseudo-labels, in the form of segmentation masks, to supervise a
segmentation model in the absence of pixel-level ground truth. Our work is
based on two techniques for improving CAMs: importance sampling, which is a
substitute for GAP, and the feature similarity loss, which utilizes a heuristic
that object contours almost always align with color edges in images. However,
both are based on the multinomial posterior with softmax, and implicitly assume
that classes are mutually exclusive, which turns out to be suboptimal in our
experiments. Thus, we reformulate both techniques based on binomial posteriors
of multiple independent binary problems. This has two benefits: their
performance is improved and they become more general, resulting in an add-on
method that can boost virtually any WSSS method. This is demonstrated on a wide
variety of baselines on the PASCAL VOC dataset, improving the region similarity
and contour quality of all implemented state-of-the-art methods. Experiments on
the MS COCO dataset further show that our proposed add-on is well-suited for
large-scale settings. Our code implementation is available at
https://github.com/arvijj/hfpl.
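
The contrast the abstract draws between the multinomial (softmax) posterior and independent binomial (sigmoid) posteriors can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: the function names, the tiny 2x2 feature maps, and the bias-free linear classifier are all assumptions made for illustration. It computes CAMs from feature maps, pools them with GAP into image-level logits, and compares the two posteriors for an image that contains both classes.

```python
import math

def class_activation_maps(features, weights):
    """Hypothetical helper: features is a list of C maps (H x W nested lists),
    weights is a K x C matrix of a bias-free linear classifier.
    cam[k][i][j] = sum over c of weights[k][c] * features[c][i][j]."""
    h, w = len(features[0]), len(features[0][0])
    channels = range(len(features))
    return [[[sum(wk[c] * features[c][i][j] for c in channels)
              for j in range(w)] for i in range(h)] for wk in weights]

def gap_logits(cams):
    """Global average pooling: the mean of each CAM is the image-level logit."""
    return [sum(sum(row) for row in cam) / (len(cam) * len(cam[0])) for cam in cams]

def softmax(logits):
    """Multinomial posterior: the class probabilities compete and sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Binomial posterior of one independent binary (present/absent) problem."""
    return 1.0 / (1.0 + math.exp(-z))

# Toy image with two objects: channel 0 fires on the top row, channel 1 on the bottom.
features = [[[4.0, 4.0], [0.0, 0.0]],
            [[0.0, 0.0], [4.0, 4.0]]]
weights = [[1.5, 0.0],   # class 0 reads channel 0
           [0.0, 1.5]]   # class 1 reads channel 1

cams = class_activation_maps(features, weights)
logits = gap_logits(cams)                      # [3.0, 3.0]
multinomial = softmax(logits)                  # [0.5, 0.5]
binomial = [sigmoid(z) for z in logits]        # both above 0.95
print(logits, multinomial, binomial)
```

In this toy case both classes are strongly present, yet softmax can assign each only 0.5 because it treats them as mutually exclusive, while the per-class sigmoids report both as present with probability about 0.95.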
Related papers
- SSA-Seg: Semantic and Spatial Adaptive Pixel-level Classifier for Semantic Segmentation [11.176993272867396]
In this paper, we propose a novel Semantic and Spatial Adaptive Classifier (SSA-Seg) to address the challenges of semantic segmentation.
Specifically, we employ the coarse masks obtained from the fixed prototypes as a guide to adjust the fixed prototype towards the center of the semantic and spatial domains in the test image.
Results show that the proposed SSA-Seg significantly improves the segmentation performance of the baseline models with only a minimal increase in computational cost.
arXiv Detail & Related papers (2024-05-10T15:14:23Z)
- A Lightweight Clustering Framework for Unsupervised Semantic Segmentation [28.907274978550493]
Unsupervised semantic segmentation aims to categorize each pixel in an image into a corresponding class without the use of annotated data.
We propose a lightweight clustering framework for unsupervised semantic segmentation.
Our framework achieves state-of-the-art results on PASCAL VOC and MS COCO datasets.
arXiv Detail & Related papers (2023-11-30T15:33:42Z)
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z)
- Deep Neural Networks Fused with Textures for Image Classification [20.58839604333332]
Fine-grained image classification is a challenging task in computer vision.
We propose a fusion approach to address FGIC by combining global texture with local patch-based information.
Our method has attained better classification accuracy over existing methods with notable margins.
arXiv Detail & Related papers (2023-08-03T15:21:08Z)
- CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation [73.89509052503222]
This paper presents a simple but performant semi-supervised semantic segmentation approach, called CorrMatch.
We observe that the correlation maps not only enable clustering pixels of the same category easily but also contain good shape information.
We propose to conduct pixel propagation by modeling the pairwise similarities of pixels to spread the high-confidence pixels and dig out more.
Then, we perform region propagation to enhance the pseudo labels with accurate class-agnostic masks extracted from the correlation maps.
arXiv Detail & Related papers (2023-06-07T10:02:29Z)
- Learning Implicit Feature Alignment Function for Semantic Segmentation [51.36809814890326]
Implicit Feature Alignment function (IFA) is inspired by the rapidly expanding topic of implicit neural representations.
We show that IFA implicitly aligns the feature maps at different levels and is capable of producing segmentation maps in arbitrary resolutions.
Our method can be combined with various architectures to improve them, and it achieves a state-of-the-art accuracy trade-off on common benchmarks.
arXiv Detail & Related papers (2022-06-17T09:40:14Z)
- Few-shot semantic segmentation via mask aggregation [5.886986014593717]
Few-shot semantic segmentation aims to recognize novel classes with only very few labelled data.
Previous works have typically regarded it as a pixel-wise classification problem.
We introduce a mask-based classification method for addressing this problem.
arXiv Detail & Related papers (2022-02-15T07:13:09Z)
- Weakly-Supervised Semantic Segmentation with Visual Words Learning and Hybrid Pooling [38.336345235423586]
Weakly-Supervised Semantic Segmentation (WSSS) methods with image-level labels generally train a classification network to generate the Class Activation Maps (CAMs) as the initial coarse segmentation labels.
These two problems are attributed to the sole image-level supervision and aggregation of global information when training the classification networks.
In this work, we propose the visual words learning module and hybrid pooling approach, and incorporate them in the classification network to mitigate the above problems.
arXiv Detail & Related papers (2022-02-10T03:19:08Z)
- Semantic Distribution-aware Contrastive Adaptation for Semantic Segmentation [50.621269117524925]
Domain adaptive semantic segmentation refers to making predictions on a certain target domain with only annotations of a specific source domain.
We present a semantic distribution-aware contrastive adaptation algorithm that enables pixel-wise representation alignment.
We evaluate SDCA on multiple benchmarks, achieving considerable improvements over existing algorithms.
arXiv Detail & Related papers (2021-05-11T13:21:25Z)
- Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z)
- Seed the Views: Hierarchical Semantic Alignment for Contrastive Representation Learning [116.91819311885166]
We propose a hierarchical semantic alignment strategy that expands the views generated by a single image to Cross-samples and Multi-level representation.
Our method, termed CsMl, has the ability to integrate multi-level visual representations across samples in a robust way.
arXiv Detail & Related papers (2020-12-04T17:26:24Z)