Region Similarity Representation Learning
- URL: http://arxiv.org/abs/2103.12902v1
- Date: Wed, 24 Mar 2021 00:42:37 GMT
- Title: Region Similarity Representation Learning
- Authors: Tete Xiao, Colorado J Reed, Xiaolong Wang, Kurt Keutzer, Trevor
Darrell
- Abstract summary: Region Similarity Representation Learning (ReSim) is a new approach to self-supervised representation learning for localization-based tasks.
ReSim learns both regional representations for localization as well as semantic image-level representations.
We show how ReSim learns representations which significantly improve the localization and classification performance compared to a competitive MoCo-v2 baseline.
- Score: 94.88055458257081
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Region Similarity Representation Learning (ReSim), a new approach
to self-supervised representation learning for localization-based tasks such as
object detection and segmentation. While existing work has largely focused on
solely learning global representations for an entire image, ReSim learns both
regional representations for localization as well as semantic image-level
representations. ReSim operates by sliding a fixed-sized window across the
overlapping area between two views (e.g., image crops), aligning these areas
with their corresponding convolutional feature map regions, and then maximizing
the feature similarity across views. As a result, ReSim learns spatially and
semantically consistent feature representations throughout the convolutional
feature maps of a neural network. A shift or scale of an image region, e.g., a
shift or scale of an object, has a corresponding change in the feature maps;
this allows downstream tasks to leverage these representations for
localization. Through object detection, instance segmentation, and dense pose
estimation experiments, we illustrate how ReSim learns representations which
significantly improve the localization and classification performance compared
to a competitive MoCo-v2 baseline: $+2.7$ AP$^{\text{bb}}_{75}$ VOC, $+1.1$
AP$^{\text{bb}}_{75}$ COCO, and $+1.9$ AP$^{\text{mk}}$ Cityscapes. Code and
pre-trained models are released at: \url{https://github.com/Tete-Xiao/ReSim}
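To make the mechanism concrete, here is a minimal, hypothetical PyTorch sketch of the sliding-window alignment described in the abstract. The window size, the pre-computed overlap coordinates, and the cosine loss are illustrative assumptions, not the released implementation; see the repository above for the authors' code.

```python
# Minimal sketch of ReSim's region-alignment idea (not the authors' code).
# Assumptions: the two augmented views share a known overlapping box, already
# mapped into each view's feature-map coordinates; the two overlaps are the
# same size (ReSim handles scale changes via region alignment, which this
# sketch omits); similarity is maximized with a simple cosine loss.
import torch
import torch.nn.functional as F

def region_similarity_loss(feat1, feat2, overlap1, overlap2, window=3):
    """feat1, feat2: (C, H, W) feature maps of the two views.
    overlap1, overlap2: (x0, y0, x1, y1) of the shared image region in each
    feature map's integer coordinates."""
    x0a, y0a, x1a, y1a = overlap1
    x0b, y0b, x1b, y1b = overlap2
    # Simplifying assumption: equal-sized overlaps in both views.
    assert (y1a - y0a) == (y1b - y0b) and (x1a - x0a) == (x1b - x0b)
    losses = []
    # Slide a fixed-size window across the overlap in view 1 ...
    for dy in range((y1a - y0a) - window + 1):
        for dx in range((x1a - x0a) - window + 1):
            r1 = feat1[:, y0a + dy : y0a + dy + window,
                          x0a + dx : x0a + dx + window]
            # ... and take the spatially corresponding window in view 2.
            r2 = feat2[:, y0b + dy : y0b + dy + window,
                          x0b + dx : x0b + dx + window]
            # Maximize cosine similarity between corresponding regions.
            losses.append(1 - F.cosine_similarity(
                r1.flatten(), r2.flatten(), dim=0))
    return torch.stack(losses).mean()
```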
Related papers
- High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation [17.804090651425955]
Image-level weakly-supervised segmentation (WSSS) reduces the usually vast data-annotation cost by generating surrogate segmentation masks during training.
Our work is based on two techniques for improving CAMs: importance sampling, a substitute for global average pooling (GAP), and a feature similarity loss.
We reformulate both techniques based on binomial posteriors of multiple independent binary problems.
This has two benefits: their performance is improved and they become more general, resulting in an add-on method that can boost virtually any WSSS method.
arXiv Detail & Related papers (2023-04-05T17:43:57Z)
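As a rough illustration of the sampling idea in the summary above, the following hypothetical sketch scores each class from pixels drawn according to independent per-class (binomial) posteriors rather than global average pooling; all names and details here are assumptions, not the paper's implementation.

```python
# Illustrative sketch only: independent sigmoid ("binomial") posteriors over
# CAM pixels, with importance sampling replacing global average pooling (GAP).
import torch

def importance_sampled_scores(cams, n_samples=64):
    """cams: (K, H, W) class activation maps for K classes."""
    K, H, W = cams.shape
    # One independent binary problem per class (sigmoid, not softmax).
    probs = torch.sigmoid(cams).reshape(K, -1)
    weights = probs / probs.sum(dim=1, keepdim=True)
    # Sample pixel locations in proportion to each class's activation ...
    idx = torch.multinomial(weights, n_samples, replacement=True)
    # ... and score each class from its sampled pixels instead of
    # averaging over all locations as GAP would.
    return torch.gather(probs, 1, idx).mean(dim=1)  # (K,)
```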
- VICRegL: Self-Supervised Learning of Local Visual Features [34.92750644059916]
This paper explores the fundamental trade-off between learning local and global features.
A new method called VICRegL is proposed that learns good global and local features simultaneously.
We demonstrate strong performance on linear classification and segmentation transfer tasks.
arXiv Detail & Related papers (2022-10-04T12:54:25Z)
- Refine and Represent: Region-to-Object Representation Learning [55.70715883351945]
We present Region-to-Object Representation Learning (R2O) which unifies region-based and object-centric pretraining.
R2O operates by training an encoder to dynamically refine region-based segments into object-centric masks.
After pretraining on ImageNet, R2O models are able to surpass the existing state of the art in unsupervised object segmentation.
arXiv Detail & Related papers (2022-08-25T01:44:28Z)
- Attribute Prototype Network for Any-Shot Learning [113.50220968583353]
We argue that an image representation with integrated attribute localization ability would be beneficial for any-shot, i.e. zero-shot and few-shot, image classification tasks.
We propose a novel representation learning framework that jointly learns global and local features using only class-level attributes.
arXiv Detail & Related papers (2022-04-04T02:25:40Z)
- RegionCL: Can Simple Region Swapping Contribute to Contrastive Learning? [76.16156833138038]
We propose a simple yet effective pretext task called Region Contrastive Learning (RegionCL).
Specifically, given two different images, we randomly crop a region of the same size from each image and swap the regions, composing two new images together with the remaining regions.
RegionCL exploits these abundant pairs and helps the model distinguish region features from both the canvas and paste views.
arXiv Detail & Related papers (2021-11-24T07:19:46Z)
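A minimal sketch of the swap operation described above, assuming square regions and PyTorch image tensors; the region size and random placement are illustrative choices, not the authors' settings.

```python
# Hypothetical sketch of the RegionCL region swap (not the authors' code).
import torch

def region_swap(img_a, img_b, size=96):
    """img_a, img_b: (C, H, W) tensors of the same shape. Crop a size x size
    region at a random location in each image and swap them, returning two
    composite images that each mix a 'canvas' and a 'paste' view."""
    _, h, w = img_a.shape
    ya = torch.randint(0, h - size + 1, (1,)).item()
    xa = torch.randint(0, w - size + 1, (1,)).item()
    yb = torch.randint(0, h - size + 1, (1,)).item()
    xb = torch.randint(0, w - size + 1, (1,)).item()
    new_a, new_b = img_a.clone(), img_b.clone()
    # Paste b's region into a, and a's region into b.
    new_a[:, ya:ya+size, xa:xa+size] = img_b[:, yb:yb+size, xb:xb+size]
    new_b[:, yb:yb+size, xb:xb+size] = img_a[:, ya:ya+size, xa:xa+size]
    return new_a, new_b
```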
- Remote Sensing Images Semantic Segmentation with General Remote Sensing Vision Model via a Self-Supervised Contrastive Learning Method [13.479068312825781]
We propose Global style and Local matching Contrastive Learning Network (GLCNet) for remote sensing semantic segmentation.
Specifically, the global style contrastive module is used to better learn an image-level representation.
The local features matching contrastive module is designed to learn representations of local regions, which benefits semantic segmentation.
arXiv Detail & Related papers (2021-06-20T03:03:40Z)
- Seed the Views: Hierarchical Semantic Alignment for Contrastive Representation Learning [116.91819311885166]
We propose a hierarchical semantic alignment strategy that expands the views generated by a single image to cross-sample and multi-level representations.
Our method, termed CsMl, can integrate multi-level visual representations across samples in a robust way.
arXiv Detail & Related papers (2020-12-04T17:26:24Z)
- Unsupervised Learning of Dense Visual Representations [14.329781842154281]
We propose View-Agnostic Dense Representation (VADeR) for unsupervised learning of dense representations.
VADeR learns pixelwise representations by forcing local features to remain constant over different viewing conditions.
Our method outperforms ImageNet supervised pretraining in multiple dense prediction tasks.
arXiv Detail & Related papers (2020-11-11T01:28:11Z)
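As a loose illustration of the pixelwise-consistency idea above, here is a hypothetical InfoNCE-style loss over matched pixel features; the matching step, normalization, and temperature are assumptions, not VADeR's actual formulation.

```python
# Sketch of a dense contrastive loss in the spirit of VADeR (not its code):
# features of the same underlying pixel seen in two views are pulled
# together, with other pixels in the batch acting as negatives.
import torch
import torch.nn.functional as F

def dense_nce_loss(f1, f2, tau=0.07):
    """f1, f2: (N, C) L2-normalized features of N matched pixel pairs,
    where f1[i] and f2[i] come from the same image location in two views."""
    logits = f1 @ f2.t() / tau                         # (N, N) similarities
    targets = torch.arange(f1.size(0), device=f1.device)  # diagonal positives
    return F.cross_entropy(logits, targets)
```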
- Inter-Image Communication for Weakly Supervised Localization [77.2171924626778]
Weakly supervised localization aims at finding target object regions using only image-level supervision.
We propose to leverage pixel-level similarities across different objects for learning more accurate object locations.
Our method achieves a Top-1 localization error rate of 45.17% on the ILSVRC validation set.
arXiv Detail & Related papers (2020-08-12T04:14:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.