CREAM: Weakly Supervised Object Localization via Class RE-Activation
Mapping
- URL: http://arxiv.org/abs/2205.13922v1
- Date: Fri, 27 May 2022 11:57:41 GMT
- Title: CREAM: Weakly Supervised Object Localization via Class RE-Activation
Mapping
- Authors: Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng, Rui-Wei Zhao, Tao Zhang,
Xuequan Lu, Shang Gao
- Abstract summary: Class RE-Activation Mapping (CREAM) is a clustering-based approach to boost the activation values of the integral object regions.
CREAM achieves the state-of-the-art performance on CUB, ILSVRC and OpenImages benchmark datasets.
- Score: 18.67907876709536
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Weakly Supervised Object Localization (WSOL) aims to localize objects with
image-level supervision. Existing works mainly rely on Class Activation Mapping
(CAM) derived from a classification model. However, CAM-based methods usually
focus on the most discriminative parts of an object (i.e., incomplete
localization problem). In this paper, we empirically prove that this problem is
associated with the mixup of the activation values between less discriminative
foreground regions and the background. To address it, we propose Class
RE-Activation Mapping (CREAM), a novel clustering-based approach to boost the
activation values of the integral object regions. To this end, we introduce
class-specific foreground and background context embeddings as cluster
centroids. A CAM-guided momentum preservation strategy is developed to learn
the context embeddings during training. At the inference stage, the
re-activation mapping is formulated as a parameter estimation problem under
Gaussian Mixture Model, which can be solved by deriving an unsupervised
Expectation-Maximization based soft-clustering algorithm. By simply integrating
CREAM into various WSOL approaches, our method significantly improves their
performance. CREAM achieves the state-of-the-art performance on CUB, ILSVRC and
OpenImages benchmark datasets. Code will be available at
https://github.com/Jazzcharles/CREAM.
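The abstract's two mechanisms, a momentum-updated context embedding and EM-based soft clustering under a Gaussian Mixture Model, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`momentum_update`, `em_reactivation`), the momentum value, the fixed shared isotropic variance, and the number of iterations are all assumptions made for illustration. The paper's actual similarity measure and update rules may differ.

```python
import numpy as np


def momentum_update(embedding, feature_mean, momentum=0.9):
    """Exponential moving average of a context embedding (hypothetical helper).

    `feature_mean` stands in for the mean feature of the region that the CAM
    marks as foreground (or background) in the current batch.
    """
    return momentum * embedding + (1.0 - momentum) * feature_mean


def em_reactivation(features, fg_init, bg_init, n_iters=5, var=1.0):
    """Soft-cluster pixel features into foreground/background with EM.

    features: (N, D) array, one feature vector per spatial location.
    fg_init, bg_init: (D,) initial centroids (the learned context embeddings).
    var: assumed fixed, shared isotropic variance of both Gaussian components.
    Returns the (N,) foreground posterior and the (2, D) final centroids.
    """
    centroids = np.stack([fg_init, bg_init]).astype(float)  # (2, D)
    priors = np.array([0.5, 0.5])
    eps = 1e-12
    for _ in range(n_iters):
        # E-step: posterior responsibility of each component for each pixel.
        sqdist = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        logits = -0.5 * sqdist / var + np.log(priors + eps)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)  # (N, 2)
        # M-step: re-estimate mixing weights and centroids.
        weight = resp.sum(axis=0)  # (2,)
        priors = weight / features.shape[0]
        centroids = (resp.T @ features) / (weight[:, None] + eps)
    return resp[:, 0], centroids
```

Reshaping the returned foreground posterior back to the feature map's height and width gives a map that plays the role of the re-activated localization map in this sketch.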
Related papers
- ResCLIP: Residual Attention for Training-free Dense Vision-language Inference [27.551367463011008]
Cross-correlation of self-attention in CLIP's non-final layers also exhibits localization properties.
We propose the Residual Cross-correlation Self-attention (RCS) module, which leverages the cross-correlation self-attention from intermediate layers to remold the attention in the final block.
The RCS module effectively reorganizes spatial information, unleashing the localization potential within CLIP for dense vision-language inference.
arXiv Detail & Related papers (2024-11-24T14:14:14Z)
- Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation [84.62067728093358]
Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels.
A new paradigm has emerged that generates a foreground prediction map to achieve pixel-level localization.
This paper presents two astonishing experimental observations on the object localization learning process.
arXiv Detail & Related papers (2023-09-22T15:44:10Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Refine and Represent: Region-to-Object Representation Learning [55.70715883351945]
We present Region-to-Object Representation Learning (R2O) which unifies region-based and object-centric pretraining.
R2O operates by training an encoder to dynamically refine region-based segments into object-centric masks.
After pretraining on ImageNet, R2O models are able to surpass existing state-of-the-art in unsupervised object segmentation.
arXiv Detail & Related papers (2022-08-25T01:44:28Z)
- Bagging Regional Classification Activation Maps for Weakly Supervised Object Localization [11.25759292976175]
BagCAMs is a plug-and-play mechanism to better project a well-trained classifier for the localization task.
Our BagCAMs adopts a proposed regional localizer generation strategy to define a set of regional localizers.
Experiments indicate that adopting our proposed BagCAMs can improve the performance of baseline WSOL methods.
arXiv Detail & Related papers (2022-07-16T03:03:01Z)
- Sparse Instance Activation for Real-Time Instance Segmentation [72.23597664935684]
We propose a conceptually novel, efficient, and fully convolutional framework for real-time instance segmentation.
SparseInst has extremely fast inference speed and achieves 40 FPS and 37.9 AP on the COCO benchmark.
arXiv Detail & Related papers (2022-03-24T03:15:39Z)
- Background Activation Suppression for Weakly Supervised Object Localization [11.31345656299108]
We argue for using activation value to achieve more efficient learning.
In this paper, we propose a Background Activation Suppression (BAS) method.
BAS achieves significant and consistent improvement over the baseline methods on the CUB-200-2011 and ILSVRC datasets.
arXiv Detail & Related papers (2021-12-01T15:53:40Z)
- Unveiling the Potential of Structure-Preserving for Weakly Supervised Object Localization [71.79436685992128]
We propose a two-stage approach, termed structure-preserving activation (SPA), towards fully leveraging the structure information incorporated in convolutional features for WSOL.
In the first stage, a restricted activation module (RAM) is designed to alleviate the structure-missing issue caused by the classification network.
In the second stage, we propose a post-process approach, termed self-correlation map generating (SCG) module to obtain structure-preserving localization maps.
arXiv Detail & Related papers (2021-03-08T03:04:14Z)
- Pairwise Similarity Knowledge Transfer for Weakly Supervised Object Localization [53.99850033746663]
We study the problem of learning a localization model on target classes with weakly supervised image labels.
In this work, we argue that learning only an objectness function is a weak form of knowledge transfer.
Experiments on the COCO and ILSVRC 2013 detection datasets show that the performance of the localization model improves significantly with the inclusion of a pairwise similarity function.
arXiv Detail & Related papers (2020-03-18T17:53:33Z)