L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly
Supervised Semantic Segmentation
- URL: http://arxiv.org/abs/2204.03206v1
- Date: Thu, 7 Apr 2022 04:31:32 GMT
- Authors: Peng-Tao Jiang, Yuqi Yang, Qibin Hou, Yunchao Wei
- Abstract summary: We present L2G, a simple online local-to-global knowledge transfer framework for high-quality object attention mining.
Our framework guides the global network to absorb, from a global view, the rich object-detail knowledge captured by the local network.
Experiments show that our method attains 72.1% and 44.2% mIoU on the validation sets of PASCAL VOC 2012 and MS COCO 2014, respectively.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Mining precise class-aware attention maps, a.k.a. class activation maps, is
essential for weakly supervised semantic segmentation. In this paper, we
present L2G, a simple online local-to-global knowledge transfer framework for
high-quality object attention mining. We observe that classification models can
discover object regions with more details when replacing the input image with
its local patches. Taking this into account, we first leverage a local
classification network to extract attentions from multiple local patches
randomly cropped from the input image. Then, we utilize a global network to
learn complementary attention knowledge across multiple local attention maps
online. Our framework guides the global network to absorb, from a global view,
the rich object-detail knowledge captured locally, and thereby produces
high-quality
attention maps that can be directly used as pseudo annotations for semantic
segmentation networks. Experiments show that our method attains 72.1% and 44.2%
mIoU scores on the validation sets of PASCAL VOC 2012 and MS COCO 2014,
respectively, setting new state-of-the-art records. Code is available at
https://github.com/PengtaoJiang/L2G.
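To make the pipeline concrete, here is a minimal PyTorch sketch of the local-to-global transfer idea. It is not the authors' implementation (see the repository above for that): the network names, the paste-back averaging of patch attentions, and the MSE distillation loss are all illustrative assumptions.
```python
# Minimal sketch of L2G-style local-to-global attention transfer.
# Illustrative only -- see https://github.com/PengtaoJiang/L2G for the
# authors' actual code. `local_net`/`global_net` are assumed to map an
# image batch to per-class attention maps (CAMs).
import torch
import torch.nn.functional as F

def random_patch(img, size):
    """Crop a random square patch, returning it with its top-left corner."""
    _, _, h, w = img.shape
    y = torch.randint(0, h - size + 1, (1,)).item()
    x = torch.randint(0, w - size + 1, (1,)).item()
    return img[:, :, y:y + size, x:x + size], y, x

def l2g_step(local_net, global_net, img, num_patches=4, patch_size=224):
    """One training step: local CAMs act as online supervision for the
    global network over the image regions that the patches cover."""
    _, _, h, w = img.shape
    global_cam = global_net(img)                        # (B, C, h', w')
    global_cam = F.interpolate(global_cam, (h, w),
                               mode='bilinear', align_corners=False)
    target = torch.zeros_like(global_cam)
    count = torch.zeros_like(global_cam)
    with torch.no_grad():                               # local net = teacher
        for _ in range(num_patches):
            patch, y, x = random_patch(img, patch_size)
            cam = local_net(patch)
            cam = F.interpolate(cam, (patch_size, patch_size),
                                mode='bilinear', align_corners=False)
            target[:, :, y:y + patch_size, x:x + patch_size] += cam
            count[:, :, y:y + patch_size, x:x + patch_size] += 1
    covered = count > 0                 # supervise only patch-covered pixels
    return F.mse_loss(global_cam[covered],
                      (target / count.clamp(min=1))[covered])
```
The point the sketch captures is the division of labor: the local network, seeing only patches, discovers richer object details, and the global network aggregates that complementary knowledge into a single full-image attention map usable as a pseudo label.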
Related papers
- A Self-Training Framework Based on Multi-Scale Attention Fusion for Weakly Supervised Semantic Segmentation [7.36778096476552]
We propose a self-training method that utilizes fused multi-scale class-aware attention maps.
We collect information from attention maps of different scales and obtain multi-scale attention maps.
We then apply denoising and reactivation strategies to enhance the potential regions and reduce noisy areas (a toy fusion sketch follows this entry).
arXiv Detail & Related papers (2023-05-10T02:16:12Z)
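A minimal NumPy sketch of the fusion-plus-denoising step described in the entry above, assuming each per-scale attention map has already been resized to a common resolution; the averaging and the two thresholds are illustrative guesses rather than the paper's actual strategy.
```python
# Toy multi-scale attention fusion with threshold-based denoising /
# reactivation (illustrative; thresholds are assumptions).
import numpy as np

def fuse_multiscale_cams(cams, low=0.1, high=0.9):
    """cams: list of (H, W) attention maps, one per input scale."""
    fused = np.mean(np.stack(cams, axis=0), axis=0)   # average across scales
    fused = (fused - fused.min()) / (np.ptp(fused) + 1e-8)
    fused[fused < low] = 0.0    # denoise: suppress weak, noisy responses
    fused[fused > high] = 1.0   # reactivate: commit to confident regions
    return fused
```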
- Learning to Discover and Detect Objects [43.52208526783969]
We tackle the problem of novel class discovery, detection, and localization (NCDL).
In this setting, we assume a source dataset with labels for objects of commonly observed classes.
By training our detection network with this objective in an end-to-end manner, it learns to classify all region proposals for a large variety of classes.
arXiv Detail & Related papers (2022-10-19T17:59:55Z)
- Cross-layer Attention Network for Fine-grained Visual Categorization [12.249254142531381]
Learning discriminative representations for subtle localized details plays a significant role in Fine-grained Visual Categorization (FGVC).
We build a mutual refinement mechanism between the mid-level feature maps and the top-level feature map with our proposed Cross-layer Attention Network (CLAN).
Experimental results show our approach achieves state-of-the-art performance on three publicly available fine-grained recognition datasets (a toy sketch of the cross-layer refinement idea follows this entry).
arXiv Detail & Related papers (2022-10-17T06:57:51Z)
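One plausible, deliberately simple reading of the mutual refinement described in the entry above, sketched in PyTorch with channel gating in one direction and spatial gating in the other; this is an illustrative assumption, not CLAN's published architecture.
```python
# Hypothetical cross-layer mutual refinement between a detailed mid-level
# map and a semantic top-level map (not the CLAN authors' design).
import torch
import torch.nn.functional as F

def cross_layer_refine(mid, top):
    """mid: (B, C, Hm, Wm), top: (B, C, Ht, Wt), channel counts matched."""
    # The semantic top layer gates mid-level channels...
    gate = torch.sigmoid(top.mean(dim=(2, 3), keepdim=True))        # (B, C, 1, 1)
    mid_refined = mid * gate
    # ...and the refined mid layer spatially sharpens the top layer.
    spatial = torch.sigmoid(mid_refined.mean(dim=1, keepdim=True))  # (B, 1, Hm, Wm)
    spatial = F.interpolate(spatial, top.shape[2:],
                            mode='bilinear', align_corners=False)
    return mid_refined, top * spatial
```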
- Weakly-Supervised Semantic Segmentation with Visual Words Learning and Hybrid Pooling [38.336345235423586]
Weakly-Supervised Semantic Segmentation (WSSS) methods with image-level labels generally train a classification network to generate class activation maps (CAMs) as the initial coarse segmentation labels.
However, such CAMs tend to cover only the most discriminative object regions while also wrongly activating background; both problems are attributed to the sole image-level supervision and the aggregation of global information when training the classification networks.
In this work, we propose a visual words learning module and a hybrid pooling approach, and incorporate them into the classification network to mitigate the above problems (hybrid pooling is sketched after this entry).
arXiv Detail & Related papers (2022-02-10T03:19:08Z)
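"Hybrid pooling" is sketched here under one common interpretation, blending global average pooling (which tends to over-expand activations) with global max pooling (which tends to under-cover objects); both the blend weight and the interpretation itself are assumptions, not the paper's definition.
```python
# Hypothetical hybrid pooling: a weighted mix of GAP and GMP.
import torch

def hybrid_pool(feat, alpha=0.5):
    """feat: (B, C, H, W) -> (B, C) pre-classifier scores."""
    gap = feat.mean(dim=(2, 3))   # smooth context, over-expands regions
    gmp = feat.amax(dim=(2, 3))   # sharp peaks, under-covers objects
    return alpha * gap + (1 - alpha) * gmp  # alpha is an assumed weight
```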
- Region Semantically Aligned Network for Zero-Shot Learning [18.18665627472823]
We propose a Region Semantically Aligned Network (RSAN) which maps local features of unseen classes to their semantic attributes.
We obtain each attribute from a specific region of the output and exploit these attributes for recognition.
Experiments on several standard ZSL datasets reveal the benefit of the proposed RSAN method, outperforming state-of-the-art methods (a toy region-to-attribute sketch follows this entry).
arXiv Detail & Related papers (2021-10-14T03:23:40Z)
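A hedged sketch of the region-to-attribute alignment described in the entry above: every attribute is scored at each spatial location and read out from its best-matching region, after which classes are ranked by attribute similarity. The module names and the max readout are illustrative, not RSAN's exact design.
```python
# Hypothetical region-to-attribute head for attribute-based ZSL.
import torch
import torch.nn as nn

class RegionAttributeHead(nn.Module):
    def __init__(self, channels, num_attributes):
        super().__init__()
        # A 1x1 conv scores every attribute at every spatial location.
        self.attr_conv = nn.Conv2d(channels, num_attributes, kernel_size=1)

    def forward(self, feat):                            # feat: (B, C, H, W)
        attr_maps = self.attr_conv(feat)                # (B, A, H, W)
        # Each attribute is read out from its best-matching region.
        return attr_maps.flatten(2).amax(dim=-1)        # (B, A)

def classify(attr_scores, class_attrs):
    """Rank classes (including unseen ones) by attribute similarity.
    attr_scores: (B, A); class_attrs: (num_classes, A)."""
    return attr_scores @ class_attrs.t()                # (B, num_classes)
```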
- Discriminative Region-based Multi-Label Zero-Shot Learning [145.0952336375342]
Multi-label zero-shot learning (ZSL) is a more realistic counterpart of standard single-label ZSL.
We propose an alternative approach to region-based discriminability-preserving ZSL.
arXiv Detail & Related papers (2021-08-20T17:56:47Z)
- Goal-Oriented Gaze Estimation for Zero-Shot Learning [62.52340838817908]
We introduce a novel goal-oriented gaze estimation module (GEM) to improve the discriminative attribute localization.
We aim to predict the actual human gaze location to get the visual attention regions for recognizing a novel object guided by attribute description.
This work implies the promising benefits of collecting human gaze datasets and developing automatic gaze estimation algorithms for high-level computer vision tasks.
arXiv Detail & Related papers (2021-03-05T02:14:57Z)
- PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation [87.50205728818601]
We propose a Prior-Guided Local (PGL) self-supervised model that learns the region-wise local consistency in the latent feature space.
Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information (a toy consistency-loss sketch follows this entry).
arXiv Detail & Related papers (2020-11-25T11:03:11Z)
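A toy version of the region-wise local consistency idea above, assuming the two augmented views are already spatially aligned so that grid cells correspond (the actual PGL model handles alignment via the augmentation prior); the grid pooling and cosine loss are illustrative.
```python
# Toy region-wise consistency loss between two aligned augmented views.
import torch
import torch.nn.functional as F

def local_consistency_loss(feat_a, feat_b, grid=4):
    """feat_a, feat_b: (B, C, H, W) latent maps of two aligned views."""
    ra = F.adaptive_avg_pool2d(feat_a, grid).flatten(2)  # (B, C, grid*grid)
    rb = F.adaptive_avg_pool2d(feat_b, grid).flatten(2)
    ra = F.normalize(ra, dim=1)                          # unit-norm channels
    rb = F.normalize(rb, dim=1)
    # Negative cosine similarity, averaged over regions: matching regions
    # of the two views are pulled together region by region.
    return -(ra * rb).sum(dim=1).mean()
```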
- Inter-Image Communication for Weakly Supervised Localization [77.2171924626778]
Weakly supervised localization aims at finding target object regions using only image-level supervision.
We propose to leverage pixel-level similarities across different objects for learning more accurate object locations.
Our method achieves a Top-1 localization error rate of 45.17% on the ILSVRC validation set (a toy cross-image similarity sketch follows this entry).
arXiv Detail & Related papers (2020-08-12T04:14:11Z)
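A toy rendering of the cross-image pixel-similarity idea above: between two images that share an image-level label, the most mutually similar pixel embeddings are treated as likely object pixels and pulled together. The top-k heuristic is an assumption, not the paper's actual objective.
```python
# Hypothetical cross-image pixel similarity loss (top-k heuristic assumed).
import torch

def cross_image_pixel_loss(feat1, feat2, top_k=128):
    """feat1, feat2: (C, N) L2-normalized pixel embeddings from two images
    sharing the same image-level label; assumes N*N >= top_k pairs."""
    sim = feat1.t() @ feat2            # (N, N) pairwise cosine similarity
    # The most similar cross-image pairs likely belong to the shared object;
    # push their similarity toward 1 and leave the rest untouched.
    top = sim.flatten().topk(top_k).values
    return (1.0 - top).mean()
```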
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets new state-of-the-art results in all these settings, demonstrating its efficacy and generalizability (a toy co-attention sketch follows this entry).
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
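A generic co-attention sketch matching the description above: each image's features are re-estimated from the other image through a softmax affinity, so that semantics common to both images are emphasized. It is illustrative, not the paper's exact module.
```python
# Generic cross-image co-attention between two feature maps (illustrative).
import torch

def co_attend(fa, fb):
    """fa, fb: (B, C, H, W) features of two images with shared classes;
    same spatial size is assumed for simplicity."""
    b, c, h, w = fa.shape
    qa, qb = fa.flatten(2), fb.flatten(2)                # (B, C, N)
    raw = qa.transpose(1, 2) @ qb / c ** 0.5             # (B, Na, Nb) affinity
    att_ab = torch.softmax(raw, dim=-1)                  # attend over fb
    att_ba = torch.softmax(raw.transpose(1, 2), dim=-1)  # attend over fa
    # Re-estimate each image's features from the other via the affinity;
    # shared semantics survive, image-specific clutter is suppressed.
    fa_from_b = (att_ab @ qb.transpose(1, 2)).transpose(1, 2).view(b, c, h, w)
    fb_from_a = (att_ba @ qa.transpose(1, 2)).transpose(1, 2).view(b, c, h, w)
    return fa_from_b, fb_from_a
```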
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.