In-sample Contrastive Learning and Consistent Attention for Weakly
Supervised Object Localization
- URL: http://arxiv.org/abs/2009.12063v1
- Date: Fri, 25 Sep 2020 07:24:46 GMT
- Title: In-sample Contrastive Learning and Consistent Attention for Weakly
Supervised Object Localization
- Authors: Minsong Ki, Youngjung Uh, Wonyoung Lee, Hyeran Byun
- Abstract summary: Weakly supervised object localization (WSOL) aims to localize the target object using only image-level supervision.
Recent methods encourage the model to activate feature maps over the entire object by dropping the most discriminative parts.
We consider the background as an important cue that guides the feature activation to cover the sophisticated object region.
- Score: 18.971497314227275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly supervised object localization (WSOL) aims to localize the target
object using only image-level supervision. Recent methods encourage the model to
activate feature maps over the entire object by dropping the most discriminative
parts. However, they are likely to induce excessive extension into the background,
which leads to over-estimated localization. In this paper, we consider the
background as an important cue that guides the feature activation to cover the
sophisticated object region and propose a contrastive attention loss. The loss
promotes similarity between the foreground and its dropped version, and
dissimilarity between the dropped version and the background. Furthermore, we
propose a foreground consistency loss that penalizes earlier layers for producing
noisy attention, taking the later layer as a reference to provide them with a
sense of backgroundness. It guides the early layers to activate on objects rather
than on locally distinctive backgrounds so that their attention becomes similar to
that of the later layer. To better optimize the above losses, we replace
channel-pooled attention with non-local attention blocks, yielding enhanced
attention maps that account for spatial similarity. Last but not least, we propose
to drop background regions in addition to the most discriminative region. Our
method achieves state-of-the-art performance on the CUB-200-2011 and ImageNet
benchmark datasets in terms of top-1 localization accuracy and MaxBoxAccV2, and we
provide a detailed analysis of the individual components. The code will be
publicly available online for reproducibility.
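To make the two proposed losses concrete, the following is a minimal, illustrative PyTorch sketch of a contrastive attention loss and a foreground consistency loss as described in the abstract. The mask thresholds, the masked average pooling used to get region embeddings, and all function names are assumptions for illustration, not the authors' released implementation.

```python
# Illustrative sketch only: thresholds, pooling, and names are assumptions,
# not the paper's reference code.
import torch
import torch.nn.functional as F


def masked_pool(feat, mask, eps=1e-6):
    """Average-pool features (B, C, H, W) over a binary spatial mask (B, 1, H, W)."""
    return (feat * mask).sum(dim=(2, 3)) / (mask.sum(dim=(2, 3)) + eps)


def contrastive_attention_loss(feat, attn, fg_thresh=0.5, drop_thresh=0.8):
    """Pull the dropped foreground toward the full foreground embedding and
    push it away from the background embedding (cosine similarity)."""
    fg_mask = (attn > fg_thresh).float()                   # foreground region
    drop_mask = fg_mask * (attn <= drop_thresh).float()    # foreground with the most
                                                           # discriminative part erased
    bg_mask = 1.0 - fg_mask                                # background region

    fg = masked_pool(feat, fg_mask)
    dropped = masked_pool(feat, drop_mask)
    bg = masked_pool(feat, bg_mask)

    sim_pos = F.cosine_similarity(dropped, fg, dim=1)      # encourage high
    sim_neg = F.cosine_similarity(dropped, bg, dim=1)      # encourage low
    return (1.0 - sim_pos).mean() + sim_neg.clamp(min=0).mean()


def foreground_consistency_loss(early_attn, late_attn):
    """Penalize early-layer attention that deviates from the (detached)
    later-layer attention used as the reference."""
    late_ref = late_attn.detach()
    if early_attn.shape[-2:] != late_ref.shape[-2:]:
        late_ref = F.interpolate(late_ref, size=early_attn.shape[-2:],
                                 mode="bilinear", align_corners=False)
    return F.mse_loss(early_attn, late_ref)
```

In a full training loop, both terms would presumably be added to the usual classification loss with weighting coefficients; which layers supply the attention maps and how the maps are computed (e.g., via the paper's non-local attention blocks) follow the architecture choices described in the paper.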
Related papers
- Background Activation Suppression for Weakly Supervised Object
Localization and Semantic Segmentation [84.62067728093358]
Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels.
New paradigm has emerged by generating a foreground prediction map to achieve pixel-level localization.
This paper presents two astonishing experimental observations on the object localization learning process.
arXiv Detail & Related papers (2023-09-22T15:44:10Z)
- Rethinking the Localization in Weakly Supervised Object Localization [51.29084037301646]
Weakly supervised object localization (WSOL) is one of the most popular and challenging tasks in computer vision.
Recently, dividing WSOL into two parts (class-agnostic object localization and object classification) has become the state-of-the-art pipeline for this task.
We propose to replace SCR with a binary-class detector (BCD) for localizing multiple objects, where the detector is trained by discriminating the foreground and background.
arXiv Detail & Related papers (2023-08-11T14:38:51Z) - Re-Attention Transformer for Weakly Supervised Object Localization [45.417606565085116]
We present a re-attention mechanism termed token refinement transformer (TRT) that captures the object-level semantics to guide the localization well.
Specifically, TRT introduces a novel module named token priority scoring module (TPSM) to suppress the effects of background noise while focusing on the target object.
arXiv Detail & Related papers (2022-08-03T04:34:28Z) - Self-Supervised Video Object Segmentation via Cutout Prediction and
Tagging [117.73967303377381]
We propose a novel self-supervised Video Object Segmentation (VOS) approach that strives to achieve better object-background discriminability.
Our approach is based on a discriminative learning loss formulation that takes into account both object and background information.
Our proposed approach, CT-VOS, achieves state-of-the-art results on two challenging benchmarks: DAVIS-2017 and Youtube-VOS.
arXiv Detail & Related papers (2022-04-22T17:53:27Z) - Anti-Adversarially Manipulated Attributions for Weakly Supervised
Semantic Segmentation and Object Localization [31.69344455448125]
We present an attribution map of an image that is manipulated to increase the classification score produced by a classifier before the final softmax or sigmoid layer.
This manipulation is realized in an anti-adversarial manner, so that the original image is perturbed along pixel gradients in directions opposite to those used in an adversarial attack.
In addition, we introduce a new regularization procedure that inhibits the incorrect attribution of regions unrelated to the target object and the excessive concentration of attributions on a small region of the target object.
arXiv Detail & Related papers (2022-04-11T06:18:02Z) - Contrastive learning of Class-agnostic Activation Map for Weakly
Supervised Object Localization and Semantic Segmentation [32.76127086403596]
We propose Contrastive learning for Class-agnostic Activation Map (C$^2$AM) generation using unlabeled image data.
We form the positive and negative pairs based on the above relations and force the network to disentangle foreground and background.
As the network is guided to discriminate cross-image foreground-background, the class-agnostic activation maps learned by our approach generate more complete object regions.
arXiv Detail & Related papers (2022-03-25T08:46:24Z) - Location-Free Camouflage Generation Network [82.74353843283407]
Camouflage is a common visual phenomenon, which refers to hiding foreground objects within background images so that they are briefly invisible to the human eye.
This paper proposes a novel Location-free Camouflage Generation Network (LCG-Net) that fuses high-level features of the foreground and background images and generates the result in a single inference.
Experiments show that our method achieves results as satisfactory as the state of the art in single-appearance regions, although they are less likely to be completely invisible, and far exceeds state-of-the-art quality in multi-appearance regions.
arXiv Detail & Related papers (2022-03-18T10:33:40Z) - CAMERAS: Enhanced Resolution And Sanity preserving Class Activation
Mapping for image saliency [61.40511574314069]
Backpropagation image saliency aims at explaining model predictions by estimating model-centric importance of individual pixels in the input.
We propose CAMERAS, a technique to compute high-fidelity backpropagation saliency maps without requiring any external priors.
arXiv Detail & Related papers (2021-06-20T08:20:56Z) - Coarse- and Fine-grained Attention Network with Background-aware Loss
for Crowd Density Map Estimation [2.690502103971799]
CFANet is a novel method for generating high-quality crowd density maps and people count estimation.
We devise a coarse-to-fine progressive attention mechanism by integrating a Crowd Region Recognizer (CRR) and a Density Level Estimator (DLE) branch.
Our method can not only outperform previous state-of-the-art methods in terms of count accuracy but also improve the image quality of density maps as well as reduce the false recognition ratio.
arXiv Detail & Related papers (2020-11-07T08:05:54Z) - Rethinking Localization Map: Towards Accurate Object Perception with
Self-Enhancement Maps [78.2581910688094]
This work introduces a novel self-enhancement method to harvest accurate object localization maps and object boundaries with only category labels as supervision.
In particular, the proposed Self-Enhancement Maps achieve the state-of-the-art localization accuracy of 54.88% on ILSVRC.
arXiv Detail & Related papers (2020-06-09T12:35:55Z)