RegionCL: Can Simple Region Swapping Contribute to Contrastive Learning?
- URL: http://arxiv.org/abs/2111.12309v1
- Date: Wed, 24 Nov 2021 07:19:46 GMT
- Title: RegionCL: Can Simple Region Swapping Contribute to Contrastive Learning?
- Authors: Yufei Xu, Qiming Zhang, Jing Zhang, Dacheng Tao
- Abstract summary: We propose a simple yet effective pretext task called Region Contrastive Learning (RegionCL).
Specifically, given two different images, we randomly crop a same-sized region from each image and swap them to compose two new images together with the leftover regions.
RegionCL exploits these abundant pairs and helps the model distinguish region features from both the canvas and paste views.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised methods (SSL) have achieved significant success via
maximizing the mutual information between two augmented views, where cropping
is a popular augmentation technique. Cropped regions are widely used to
construct positive pairs, while the leftover regions after cropping have rarely
been explored in existing methods, although the two together constitute the same
image instance and both contribute to describing the category. In this
paper, we make the first attempt to demonstrate the importance of both regions
in cropping from a complete perspective and propose a simple yet effective
pretext task called Region Contrastive Learning (RegionCL). Specifically, given
two different images, we randomly crop a region (called the paste view) from
each image with the same size and swap them to compose two new images together
with the left regions (called the canvas view), respectively. Then, contrastive
pairs can be efficiently constructed according to the following simple
criteria, i.e., each view is (1) positive with views augmented from the same
original image and (2) negative with views augmented from other images. With
minor modifications to popular SSL methods, RegionCL exploits these abundant
pairs and helps the model distinguish the region features of both canvas and
paste views, thereby learning better visual representations. Experiments on
ImageNet, MS COCO, and Cityscapes demonstrate that RegionCL improves MoCo v2,
DenseCL, and SimSiam by large margins and achieves state-of-the-art performance
on classification, detection, and segmentation tasks. The code will be
available at https://github.com/Annbless/RegionCL.git.
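The swapping operation described in the abstract can be sketched as follows. This is a minimal illustration of the idea, not the authors' released implementation: the function name `region_swap` is hypothetical, and swapping the region at the same spatial location in both images is an assumption made for simplicity.

```python
import numpy as np


def region_swap(img_a, img_b, size, rng=None):
    """Swap a randomly placed, same-sized region between two images.

    Each output is a composite: the "canvas view" keeps the rest of its
    original image, while the pasted rectangle (the "paste view") comes
    from the other image. Assumes both images share the same H x W.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = img_a.shape[:2]
    rh, rw = size
    # Sample a valid top-left corner for the region (high is exclusive).
    top = rng.integers(0, h - rh + 1)
    left = rng.integers(0, w - rw + 1)

    new_a = img_a.copy()
    new_b = img_b.copy()
    # Exchange the rectangles between the two images.
    new_a[top:top + rh, left:left + rw] = img_b[top:top + rh, left:left + rw]
    new_b[top:top + rh, left:left + rw] = img_a[top:top + rh, left:left + rw]
    # Contrastive pairs then follow the paper's criteria: the paste view in
    # new_a and the canvas view of new_b (both derived from img_b) form a
    # positive pair; views derived from different original images are negatives.
    return new_a, new_b
```

Per the abstract, each composite image thus yields two labeled views from a single forward pass, which is how the extra positive and negative pairs are obtained at little additional cost.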
Related papers
- CLIM: Contrastive Language-Image Mosaic for Region Representation [58.05870131126816]
Contrastive Language-Image Mosaic (CLIM) is a novel approach for aligning region and text representations.
CLIM consistently improves different open-vocabulary object detection methods.
It can effectively enhance the region representation of vision-language models.
arXiv Detail & Related papers (2023-12-18T17:39:47Z)
- Saliency Guided Contrastive Learning on Scene Images [71.07412958621052]
We leverage the saliency map derived from the model's output during learning to highlight discriminative regions and guide the whole contrastive learning.
Our method significantly improves self-supervised learning on scene images by +1.1, +4.3, and +2.2 Top-1 accuracy on ImageNet linear evaluation and on semi-supervised learning with 1% and 10% ImageNet labels, respectively.
arXiv Detail & Related papers (2023-02-22T15:54:07Z)
- Region Embedding with Intra and Inter-View Contrastive Learning [29.141194278469417]
Unsupervised region representation learning aims to extract dense and effective features from unlabeled urban data.
Motivated by the success of contrastive learning for representation learning, we propose to leverage it for multi-view region representation learning.
We design an intra-view contrastive learning module that helps learn distinguishable region embeddings, and an inter-view contrastive learning module that serves as a soft co-regularizer.
arXiv Detail & Related papers (2022-11-15T10:57:20Z)
- Dense Siamese Network [86.23741104851383]
We present Dense Siamese Network (DenseSiam), a simple unsupervised learning framework for dense prediction tasks.
It learns visual representations by maximizing the similarity between two views of one image with two types of consistency, i.e., pixel consistency and region consistency.
It surpasses state-of-the-art segmentation methods by 2.1 mIoU with 28% of the training cost.
arXiv Detail & Related papers (2022-03-21T15:55:23Z)
- RegionCLIP: Region-based Language-Image Pretraining [94.29924084715316]
Contrastive language-image pretraining (CLIP) using image-text pairs has achieved impressive results on image classification.
We propose a new method called RegionCLIP that significantly extends CLIP to learn region-level visual representations.
Our method significantly outperforms the state of the art by 3.8 AP50 and 2.2 AP for novel categories on COCO and LVIS datasets.
arXiv Detail & Related papers (2021-12-16T18:39:36Z)
- Region Similarity Representation Learning [94.88055458257081]
Region Similarity Representation Learning (ReSim) is a new approach to self-supervised representation learning for localization-based tasks.
ReSim learns both regional representations for localization as well as semantic image-level representations.
We show that ReSim learns representations which significantly improve localization and classification performance compared to a competitive MoCo v2 baseline.
arXiv Detail & Related papers (2021-03-24T00:42:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.