Weakly-Supervised Semantic Segmentation with Visual Words Learning and
Hybrid Pooling
- URL: http://arxiv.org/abs/2202.04812v1
- Date: Thu, 10 Feb 2022 03:19:08 GMT
- Title: Weakly-Supervised Semantic Segmentation with Visual Words Learning and
Hybrid Pooling
- Authors: Lixiang Ru and Bo Du and Yibing Zhan and Chen Wu
- Abstract summary: Weakly-Supervised Semantic Segmentation (WSSS) methods with image-level labels generally train a classification network to generate the Class Activation Maps (CAMs) as the initial coarse segmentation labels.
These two problems are attributed to the sole image-level supervision and aggregation of global information when training the classification networks.
In this work, we propose the visual words learning module and hybrid pooling approach, and incorporate them in the classification network to mitigate the above problems.
- Score: 38.336345235423586
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Weakly-Supervised Semantic Segmentation (WSSS) methods with image-level
labels generally train a classification network to generate the Class
Activation Maps (CAMs) as the initial coarse segmentation labels. However,
current WSSS methods still perform far from satisfactorily because their
adopted CAMs 1) typically focus on partial discriminative object regions and 2)
usually contain useless background regions. These two problems are attributed
to the sole image-level supervision and aggregation of global information when
training the classification networks. In this work, we propose a visual words
learning module and a hybrid pooling approach, and incorporate them into the
classification network to mitigate these problems. In the visual words
learning module, we counter the first problem by requiring the classification
network to learn fine-grained visual word labels, so that larger object extents
can be discovered. Specifically, the visual words are learned with a codebook,
which can be updated via two proposed strategies, i.e., a learning-based
strategy and a memory-bank strategy. The second drawback of CAMs is alleviated
with the proposed hybrid pooling, which combines global average and local
discriminative information to simultaneously ensure object completeness and
reduce background regions. We evaluated our method on the PASCAL VOC 2012 and
MS COCO 2014 datasets. Without any extra saliency prior, our method achieved
70.6% and 70.7% mIoU on the $val$ and $test$ sets of the PASCAL VOC dataset,
respectively, and 36.2% mIoU on the $val$ set of the MS COCO dataset,
significantly surpassing state-of-the-art WSSS methods.
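To make the two proposed components more concrete, below is a minimal PyTorch-style sketch of how a visual-word codebook with a memory-bank (EMA) update and a hybrid pooling head could sit on top of a classification backbone. The names (VisualWordCodebook, hybrid_pool), the cosine-similarity word assignment, the momentum value, and the way global average pooling is mixed with a locally max-pooled term are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VisualWordCodebook(nn.Module):
    """Assigns each spatial feature to its nearest visual word and keeps the
    codebook fresh with a memory-bank style EMA update.
    Illustrative sketch; the paper's exact update rules may differ."""

    def __init__(self, num_words: int = 256, dim: int = 2048, momentum: float = 0.9):
        super().__init__()
        self.momentum = momentum
        # Codebook of visual-word prototypes, kept outside the optimizer.
        self.register_buffer("codebook", F.normalize(torch.randn(num_words, dim), dim=1))

    @torch.no_grad()
    def _memory_bank_update(self, feats: torch.Tensor, assign: torch.Tensor):
        # feats: (N, dim) flattened spatial features, assign: (N,) word indices.
        for w in assign.unique():
            mean_feat = F.normalize(feats[assign == w].mean(dim=0), dim=0)
            self.codebook[w] = F.normalize(
                self.momentum * self.codebook[w] + (1 - self.momentum) * mean_feat, dim=0
            )

    def forward(self, feature_map: torch.Tensor):
        # feature_map: (B, C, H, W) from the classification backbone.
        b, c, h, w = feature_map.shape
        feats = F.normalize(feature_map.permute(0, 2, 3, 1).reshape(-1, c), dim=1)
        sim = feats @ self.codebook.t()      # cosine similarity to each word
        assign = sim.argmax(dim=1)           # hard visual-word label per location
        if self.training:
            self._memory_bank_update(feats, assign)
        # The word labels can supervise an auxiliary per-location word classifier,
        # pushing the network to activate on more object parts than the CAM alone.
        return assign.view(b, h, w), sim.view(b, h, w, -1)


def hybrid_pool(feature_map: torch.Tensor, alpha: float = 0.5, kernel: int = 3) -> torch.Tensor:
    """Mix global average pooling (object completeness) with a locally
    max-pooled, i.e. discriminative, term (background suppression).
    The mixing weight `alpha` and kernel size are assumptions."""
    gap = feature_map.mean(dim=(2, 3))                                   # (B, C)
    local = F.max_pool2d(feature_map, kernel, stride=1, padding=kernel // 2)
    lap = local.mean(dim=(2, 3))                                         # average of local maxima
    return alpha * gap + (1 - alpha) * lap


if __name__ == "__main__":
    feats = torch.randn(2, 2048, 28, 28)      # dummy backbone features
    codebook = VisualWordCodebook()
    word_labels, _ = codebook(feats)
    pooled = hybrid_pool(feats)
    print(word_labels.shape, pooled.shape)    # (2, 28, 28) and (2, 2048)
```

In the full method, the pooled vector would feed the image-level classification loss while the visual-word assignments would supervise an auxiliary word-classification head; the paper's alternative learning-based strategy would presumably update the codebook by back-propagation rather than by the EMA rule sketched above.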
Related papers
- DIAL: Dense Image-text ALignment for Weakly Supervised Semantic Segmentation [8.422110274212503]
Weakly supervised semantic segmentation approaches typically rely on class activation maps (CAMs) for initial seed generation.
We introduce DALNet, which leverages text embeddings to enhance the comprehensive understanding and precise localization of objects across different levels of granularity.
In particular, our approach allows for a more efficient end-to-end process as a single-stage method.
arXiv Detail & Related papers (2024-09-24T06:51:49Z)
- Enhancing Visual Continual Learning with Language-Guided Supervision [76.38481740848434]
Continual learning aims to empower models to learn new tasks without forgetting previously acquired knowledge.
We argue that the scarce semantic information conveyed by the one-hot labels hampers the effective knowledge transfer across tasks.
Specifically, we use PLMs to generate semantic targets for each class, which are frozen and serve as supervision signals.
arXiv Detail & Related papers (2024-03-24T12:41:58Z)
- Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation [79.05949524349005]
We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from saliency maps.
We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps.
arXiv Detail & Related papers (2024-03-02T10:03:21Z)
- TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training [29.431698321195814]
Contrastive Language-Image Pre-training (CLIP) has demonstrated impressive capabilities in open-vocabulary classification.
CLIP shows poor performance on multi-label datasets because the global feature tends to be dominated by the most prominent class.
We propose a local-to-global framework to obtain image tags.
arXiv Detail & Related papers (2023-12-20T08:15:40Z)
- Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task.
We propose a co-training-based framework that encourages clustering consistency.
Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv Detail & Related papers (2023-10-30T00:32:47Z)
- High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation [17.804090651425955]
Image-level weakly-supervised segmentation (WSSS) reduces the usually vast data annotation cost by using surrogate segmentation masks during training.
Our work is based on two techniques for improving CAMs: importance sampling, which is a substitute for GAP (sketched below), and the feature similarity loss.
We reformulate both techniques based on binomial posteriors of multiple independent binary problems.
This has two benefits: their performance is improved and they become more general, resulting in an add-on method that can boost virtually any WSSS method.
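The importance-sampling pooling mentioned above can be pictured as replacing uniform spatial averaging with an activation-weighted expectation over the class activation map. The following is a minimal PyTorch-style sketch under that assumption; `importance_pool` is a hypothetical helper name, and the paper's binomial-posterior reformulation is not reproduced here.

```python
import torch
import torch.nn.functional as F


def importance_pool(cam: torch.Tensor) -> torch.Tensor:
    """Pool class activation maps (B, K, H, W) into class scores (B, K)
    by weighting each location with its spatially softmaxed activation,
    instead of uniform global average pooling."""
    b, k, h, w = cam.shape
    flat = cam.view(b, k, h * w)
    weights = F.softmax(flat, dim=-1)       # importance of each spatial location
    return (weights * flat).sum(dim=-1)     # expectation under those weights


scores = importance_pool(torch.randn(2, 20, 32, 32))  # (2, 20) image-level class scores
```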
arXiv Detail & Related papers (2023-04-05T17:43:57Z)
- Learning to Discover and Detect Objects [43.52208526783969]
We tackle the problem of novel class discovery, detection, and localization (NCDL).
In this setting, we assume a source dataset with labels for objects of commonly observed classes.
By training our detection network with this objective in an end-to-end manner, it learns to classify all region proposals for a large variety of classes.
arXiv Detail & Related papers (2022-10-19T17:59:55Z)
- L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation [67.26984058377435]
We present L2G, a simple online local-to-global knowledge transfer framework for high-quality object attention mining.
Our framework guides the global network to learn the rich object detail knowledge captured from a global view.
Experiments show that our method attains 72.1% and 44.2% mIoU scores on the validation sets of PASCAL VOC 2012 and MS COCO 2014, respectively.
arXiv Detail & Related papers (2022-04-07T04:31:32Z)
- Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation [32.76127086403596]
We propose Contrastive learning for Class-agnostic Activation Map (C$^2$AM) generation using unlabeled image data.
We form the positive and negative pairs based on the above relations and force the network to disentangle foreground and background.
As the network is guided to discriminate cross-image foreground-background, the class-agnostic activation maps learned by our approach generate more complete object regions.
arXiv Detail & Related papers (2022-03-25T08:46:24Z) - Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets a new state of the art in all these settings, demonstrating its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.