Inter-Image Communication for Weakly Supervised Localization
- URL: http://arxiv.org/abs/2008.05096v1
- Date: Wed, 12 Aug 2020 04:14:11 GMT
- Title: Inter-Image Communication for Weakly Supervised Localization
- Authors: Xiaolin Zhang, Yunchao Wei, Yi Yang
- Abstract summary: Weakly supervised localization aims at finding target object regions using only image-level supervision.
We propose to leverage pixel-level similarities across different objects for learning more accurate object locations.
Our method achieves the Top-1 localization error rate of 45.17% on the ILSVRC validation set.
- Score: 77.2171924626778
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly supervised localization aims at finding target object regions using
only image-level supervision. However, localization maps extracted from
classification networks are often not accurate due to the lack of fine
pixel-level supervision. In this paper, we propose to leverage pixel-level
similarities across different objects for learning more accurate object
locations in a complementary way. Particularly, two kinds of constraints are
proposed to prompt the consistency of object features within the same
categories. The first constraint is to learn the stochastic feature consistency
among discriminative pixels that are randomly sampled from different images
within a batch. The discriminative information embedded in one image can be
leveraged to benefit its counterpart with inter-image communication. The second
constraint is to learn the global consistency of object features throughout the
entire dataset. We learn a feature center for each category and realize the
global feature consistency by forcing the object features to approach
class-specific centers. The global centers are actively updated with the
training process. The two constraints can benefit each other to learn
consistent pixel-level features within the same categories, and finally improve
the quality of localization maps. We conduct extensive experiments on two
popular benchmarks, i.e., ILSVRC and CUB-200-2011. Our method achieves the
Top-1 localization error rate of 45.17% on the ILSVRC validation set,
surpassing the current state-of-the-art method by a large margin. The code is
available at https://github.com/xiaomengyc/I2C.
Related papers
- Keypoint-Augmented Self-Supervised Learning for Medical Image
Segmentation with Limited Annotation [21.203307064937142]
We present a keypointaugmented fusion layer that extracts representations preserving both short- and long-range self-attention.
In particular, we augment the CNN feature map at multiple scales by incorporating an additional input that learns long-range spatial selfattention.
Our method further outperforms existing SSL methods by producing more robust self-attention.
arXiv Detail & Related papers (2023-10-02T22:31:30Z) - High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation [17.804090651425955]
Image-level weakly-supervised segmentation (WSSS) reduces the usually vast data annotation cost by surrogate segmentation masks during training.
Our work is based on two techniques for improving CAMs; importance sampling, which is a substitute for GAP, and the feature similarity loss.
We reformulate both techniques based on binomial posteriors of multiple independent binary problems.
This has two benefits; their performance is improved and they become more general, resulting in an add-on method that can boost virtually any WSSS method.
arXiv Detail & Related papers (2023-04-05T17:43:57Z) - VICRegL: Self-Supervised Learning of Local Visual Features [34.92750644059916]
This paper explores the fundamental trade-off between learning local and global features.
A new method called VICRegL is proposed that learns good global and local features simultaneously.
We demonstrate strong performance on linear classification and segmentation transfer tasks.
arXiv Detail & Related papers (2022-10-04T12:54:25Z) - DenseGAP: Graph-Structured Dense Correspondence Learning with Anchor
Points [15.953570826460869]
Establishing dense correspondence between two images is a fundamental computer vision problem.
We introduce DenseGAP, a new solution for efficient Dense correspondence learning with a Graph-structured neural network conditioned on Anchor Points.
Our method advances the state-of-the-art of correspondence learning on most benchmarks.
arXiv Detail & Related papers (2021-12-13T18:59:30Z) - Discriminative Region-based Multi-Label Zero-Shot Learning [145.0952336375342]
Multi-label zero-shot learning (ZSL) is a more realistic counter-part of standard single-label ZSL.
We propose an alternate approach towards region-based discriminability-preserving ZSL.
arXiv Detail & Related papers (2021-08-20T17:56:47Z) - Re-rank Coarse Classification with Local Region Enhanced Features for
Fine-Grained Image Recognition [22.83821575990778]
We re-rank the TopN classification results by using the local region enhanced embedding features to improve the Top1 accuracy.
To learn more effective semantic global features, we design a multi-level loss over an automatically constructed hierarchical category structure.
Our method achieves state-of-the-art performance on three benchmarks: CUB-200-2011, Stanford Cars, and FGVC Aircraft.
arXiv Detail & Related papers (2021-02-19T11:30:25Z) - DetCo: Unsupervised Contrastive Learning for Object Detection [64.22416613061888]
Unsupervised contrastive learning achieves great success in learning image representations with CNN.
We present a novel contrastive learning approach, named DetCo, which fully explores the contrasts between global image and local image patches.
DetCo consistently outperforms supervised method by 1.6/1.2/1.0 AP on Mask RCNN-C4/FPN/RetinaNet with 1x schedule.
arXiv Detail & Related papers (2021-02-09T12:47:20Z) - Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets new state-of-the-arts on all these settings, demonstrating well its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z) - High-Order Information Matters: Learning Relation and Topology for
Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms state-of-the-art by6.5%mAP scores on Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z) - Improving Few-shot Learning by Spatially-aware Matching and
CrossTransformer [116.46533207849619]
We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.