Background-aware Classification Activation Map for Weakly Supervised
Object Localization
- URL: http://arxiv.org/abs/2112.14379v1
- Date: Wed, 29 Dec 2021 03:12:09 GMT
- Title: Background-aware Classification Activation Map for Weakly Supervised
Object Localization
- Authors: Lei Zhu, Qi She, Qian Chen, Xiangxi Meng, Mufeng Geng, Lujia Jin, Zhe
Jiang, Bin Qiu, Yunfei You, Yibao Zhang, Qiushi Ren, Yanye Lu
- Abstract summary: We propose a background-aware classification activation map (B-CAM) to simultaneously learn localization scores of both object and background.
Our B-CAM can be trained in an end-to-end manner based on a proposed stagger classification loss.
Experiments show that our B-CAM outperforms one-stage WSOL methods on the CUB-200, OpenImages and VOC2012 datasets.
- Score: 14.646874544729426
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly supervised object localization (WSOL) relaxes the requirement of dense
annotations for object localization by using image-level classification masks
to supervise its learning process. However, current WSOL methods suffer from
excessive activation of background locations and need post-processing to obtain
the localization mask. This paper attributes these issues to the unawareness of
background cues and proposes the background-aware classification activation map
(B-CAM) to simultaneously learn localization scores of both object and
background with only image-level labels. In our B-CAM, two image-level
features, aggregated by pixel-level features of potential background and object
locations, are used to purify the object feature from the object-related
background and to represent the feature of the pure-background sample,
respectively. Then based on these two features, both the object classifier and
the background classifier are learned to determine the binary object
localization mask. Our B-CAM can be trained in an end-to-end manner based on a
proposed stagger classification loss, which not only improves object
localization but also suppresses the background activation. Experiments show
that our B-CAM outperforms one-stage WSOL methods on the CUB-200, OpenImages
and VOC2012 datasets.
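The abstract describes B-CAM only at the architectural level, so the following is a minimal PyTorch-style sketch of that reading rather than the authors' implementation: the class name BCAMSketch, the sigmoid foreground score, the weighted-average pooling, and the 0.5 mask threshold are assumptions made for illustration, and the proposed stagger classification loss is deliberately not reproduced.

    import torch
    import torch.nn as nn

    class BCAMSketch(nn.Module):
        """Aggregates pixel features into an object feature and a background
        feature, then classifies each with its own head (hedged sketch)."""

        def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
            super().__init__()
            self.backbone = backbone                   # any CNN returning (B, C, H, W) feature maps
            self.fg_score = nn.Conv2d(feat_dim, 1, 1)  # per-location object-vs-background score (assumed)
            self.obj_classifier = nn.Linear(feat_dim, num_classes)
            self.bkg_classifier = nn.Linear(feat_dim, num_classes)

        def forward(self, images: torch.Tensor):
            feats = self.backbone(images)              # (B, C, H, W)
            attn = torch.sigmoid(self.fg_score(feats)) # (B, 1, H, W): probability a location is object

            # Aggregate pixel-level features into two image-level features: one
            # weighted toward potential object locations, one toward background.
            obj_feat = (feats * attn).sum(dim=(2, 3)) / attn.sum(dim=(2, 3)).clamp_min(1e-6)
            bkg_feat = (feats * (1 - attn)).sum(dim=(2, 3)) / (1 - attn).sum(dim=(2, 3)).clamp_min(1e-6)

            obj_logits = self.obj_classifier(obj_feat) # should respond to the image-level class
            bkg_logits = self.bkg_classifier(bkg_feat) # should respond weakly to object classes
            mask = (attn > 0.5).float()                # binary object localization mask (threshold assumed)
            return obj_logits, bkg_logits, mask

In the paper the two classifiers are trained jointly with the stagger classification loss; a plain cross-entropy on obj_logits would be the obvious placeholder if one wanted to experiment with this sketch.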
Related papers
- Background Activation Suppression for Weakly Supervised Object
Localization and Semantic Segmentation [84.62067728093358]
Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels.
A new paradigm has emerged that generates a foreground prediction map to achieve pixel-level localization.
This paper presents two astonishing experimental observations on the object localization learning process.
arXiv Detail & Related papers (2023-09-22T15:44:10Z) - Rethinking the Localization in Weakly Supervised Object Localization [51.29084037301646]
Weakly supervised object localization (WSOL) is one of the most popular and challenging tasks in computer vision.
Recently, dividing WSOL into two parts (class-agnostic object localization and object classification) has become the state-of-the-art pipeline for this task.
We propose to replace SCR with a binary-class detector (BCD) for localizing multiple objects, where the detector is trained by discriminating the foreground and background.
arXiv Detail & Related papers (2023-08-11T14:38:51Z) - Spatial-Aware Token for Weakly Supervised Object Localization [137.0570026552845]
We propose a task-specific spatial-aware token (SAT) to condition localization in a weakly supervised manner.
Experiments show that the proposed SAT achieves state-of-the-art performance on both CUB-200 and ImageNet, with 98.45% and 73.13% GT-known Loc., respectively.
arXiv Detail & Related papers (2023-03-18T15:38:17Z) - Boosting Few-shot Fine-grained Recognition with Background Suppression
and Foreground Alignment [53.401889855278704]
Few-shot fine-grained recognition (FS-FGR) aims to recognize novel fine-grained categories with the help of limited available samples.
We propose a two-stage background suppression and foreground alignment framework, which is composed of a background activation suppression (BAS) module, a foreground object alignment (FOA) module, and a local-to-local (L2L) similarity metric.
Experiments conducted on multiple popular fine-grained benchmarks demonstrate that our method outperforms the existing state-of-the-art by a large margin.
arXiv Detail & Related papers (2022-10-04T07:54:40Z) - Re-Attention Transformer for Weakly Supervised Object Localization [45.417606565085116]
We present a re-attention mechanism termed token refinement transformer (TRT) that captures the object-level semantics to guide the localization well.
Specifically, TRT introduces a novel module named token priority scoring module (TPSM) to suppress the effects of background noise while focusing on the target object.
arXiv Detail & Related papers (2022-08-03T04:34:28Z) - Contrastive learning of Class-agnostic Activation Map for Weakly
Supervised Object Localization and Semantic Segmentation [32.76127086403596]
We propose Contrastive learning for Class-agnostic Activation Map (C$^2$AM) generation using unlabeled image data.
We form the positive and negative pairs based on the above relations and force the network to disentangle foreground and background.
As the network is guided to discriminate cross-image foreground-background, the class-agnostic activation maps learned by our approach generate more complete object regions.
arXiv Detail & Related papers (2022-03-25T08:46:24Z) - Shallow Feature Matters for Weakly Supervised Object Localization [35.478997006168484]
Weakly supervised object localization (WSOL) aims to localize objects by only utilizing image-level labels.
Previous CAM-based methods did not take full advantage of the shallow features, despite their importance for WSOL.
In this paper, we propose a simple but effective Shallow feature-aware Pseudo supervised Object localization (SPOL) model for accurate WSOL.
arXiv Detail & Related papers (2021-08-02T13:16:48Z) - Rectifying the Shortcut Learning of Background: Shared Object
Concentration for Few-Shot Image Recognition [101.59989523028264]
Few-shot image classification aims to utilize pretrained knowledge learned from a large-scale dataset to tackle a series of downstream classification tasks.
We propose COSOC, a novel Few-Shot Learning framework, to automatically figure out foreground objects at both the pretraining and evaluation stages.
arXiv Detail & Related papers (2021-07-16T07:46:41Z)