Crafting Better Contrastive Views for Siamese Representation Learning
- URL: http://arxiv.org/abs/2202.03278v1
- Date: Mon, 7 Feb 2022 15:09:00 GMT
- Title: Crafting Better Contrastive Views for Siamese Representation Learning
- Authors: Xiangyu Peng, Kai Wang, Zheng Zhu, Yang You
- Abstract summary: We propose ContrastiveCrop, which effectively generates better crops for Siamese representation learning.
A semantic-aware object localization strategy is proposed within the training process in a fully unsupervised manner.
As a plug-and-play and framework-agnostic module, ContrastiveCrop consistently improves SimCLR, MoCo, BYOL, and SimSiam by 0.4% to 2.0% in classification accuracy.
- Score: 20.552194081238248
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent self-supervised contrastive learning methods greatly benefit from the
Siamese structure that aims at minimizing distances between positive pairs. For
high-performance Siamese representation learning, one of the keys is to design
good contrastive pairs. Most previous works simply apply random sampling to
make different crops of the same image, which overlooks semantic information
and may thus degrade the quality of views. In this work, we propose
ContrastiveCrop, which effectively generates better crops for Siamese
representation learning. First, a semantic-aware object localization strategy
is proposed within the training process in a fully unsupervised manner. This
guides us to generate contrastive views which could avoid most false positives
(i.e., object vs. background). Moreover, we empirically find that views with
similar appearances are trivial for Siamese model training. Thus, a
center-suppressed sampling scheme is further designed to enlarge the variance of
crops. Remarkably, our method carefully constructs positive pairs for
contrastive learning with negligible extra training overhead. As a
plug-and-play and framework-agnostic module, ContrastiveCrop consistently
improves SimCLR, MoCo, BYOL, SimSiam by 0.4% ~ 2.0% classification accuracy on
CIFAR-10, CIFAR-100, Tiny ImageNet and STL-10. Superior results are also
achieved on downstream detection and segmentation tasks when pre-trained on
ImageNet-1K.
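To make the two ideas above concrete, the following is a minimal, hypothetical sketch of how a crop could be sampled inside a semantic bounding box while suppressing its central region. It is not the authors' implementation: the function name, the default parameters, and the use of a U-shaped Beta distribution to push crop centers toward the box borders are assumptions made for exposition.

```python
import math
import random
import torch

def center_suppressed_crop(box, scale=(0.2, 1.0), ratio=(3/4, 4/3), alpha=0.6):
    """Sample one crop (left, top, right, bottom) in normalized [0, 1] coordinates.

    `box` is a semantic bounding box (x0, y0, x1, y1) in [0, 1], e.g. obtained
    by thresholding an aggregated feature heatmap during training. Crop centers
    are drawn inside the box from a U-shaped Beta(alpha, alpha) distribution
    (alpha < 1), which suppresses the central region and enlarges the variance
    between the two crops of an image.
    """
    x0, y0, x1, y1 = box
    # crop area and aspect ratio, as in a standard RandomResizedCrop
    area = random.uniform(*scale)
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    ar = math.exp(random.uniform(*log_ratio))
    h = min(math.sqrt(area / ar), 1.0)   # keep the crop inside the unit image;
    w = min(math.sqrt(area * ar), 1.0)   # a full implementation would resample
    # center-suppressed sampling: Beta(alpha, alpha) concentrates mass near 0
    # and 1, so crop centers land near the box borders rather than its middle
    beta = torch.distributions.Beta(alpha, alpha)
    cx = x0 + beta.sample().item() * (x1 - x0)
    cy = y0 + beta.sample().item() * (y1 - y0)
    left = min(max(cx - w / 2, 0.0), 1.0 - w)
    top = min(max(cy - h / 2, 0.0), 1.0 - h)
    return left, top, left + w, top + h
```

In a training pipeline, two such crops per image would replace the usual RandomResizedCrop boxes for the two views, with the semantic box itself refreshed periodically from the model's own feature maps in an unsupervised manner, as the abstract describes.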
Related papers
- Enhancing Contrastive Learning with Efficient Combinatorial Positive
Pairing [2.7961972519572442]
We propose a general multi-view strategy that can improve the learning speed and performance of any contrastive or non-contrastive method.
In the case of ImageNet-100, ECPP-boosted SimCLR outperforms supervised learning.
arXiv Detail & Related papers (2024-01-11T08:18:30Z) - Hallucination Improves the Performance of Unsupervised Visual
Representation Learning [9.504503675097137]
We propose the Hallucinator, which efficiently generates additional positive samples for further contrast.
The Hallucinator is differentiable and creates new data in the feature space.
Remarkably, we empirically show that the proposed Hallucinator generalizes well to various contrastive learning models.
arXiv Detail & Related papers (2023-07-22T21:15:56Z) - Asymmetric Patch Sampling for Contrastive Learning [17.922853312470398]
Asymmetric appearance between the two views of a positive pair effectively reduces the risk of representation degradation in contrastive learning.
We propose a novel asymmetric patch sampling strategy for contrastive learning that boosts appearance asymmetry for better representations.
arXiv Detail & Related papers (2023-06-05T13:10:48Z) - mc-BEiT: Multi-choice Discretization for Image BERT Pre-training [52.04866462439979]
Image BERT pre-training with masked image modeling (MIM) is a popular approach to self-supervised representation learning.
We introduce an improved BERT-style image pre-training method, namely mc-BEiT, which performs the MIM proxy task with eased and refined multi-choice training objectives.
arXiv Detail & Related papers (2022-03-29T09:08:18Z) - Weakly Supervised Contrastive Learning [68.47096022526927]
We introduce a weakly supervised contrastive learning framework (WCL).
WCL achieves 65% and 72% ImageNet top-1 accuracy using ResNet-50, which is even higher than SimCLRv2 with ResNet-101.
arXiv Detail & Related papers (2021-10-10T12:03:52Z) - Improving Contrastive Learning by Visualizing Feature Transformation [37.548120912055595]
In this paper, we attempt to devise a feature-level data manipulation, distinct from data augmentation, to enhance generic contrastive self-supervised learning.
We first design a visualization scheme for the pos/neg score distribution (the pos/neg score indicates the similarity of a positive/negative pair), which enables us to analyze, interpret, and understand the learning process.
Experimental results show that the proposed Feature Transformation improves accuracy by at least 6.0% on ImageNet-100 over the MoCo baseline and by about 2.0% on ImageNet-1K over the MoCoV2 baseline.
arXiv Detail & Related papers (2021-08-06T07:26:08Z) - With a Little Help from My Friends: Nearest-Neighbor Contrastive
Learning of Visual Representations [87.72779294717267]
Using the nearest neighbor as the positive in contrastive losses significantly improves performance on ImageNet classification (a minimal sketch of this idea appears after this list).
We demonstrate empirically that our method is less reliant on complex data augmentations.
arXiv Detail & Related papers (2021-04-29T17:56:08Z) - Seed the Views: Hierarchical Semantic Alignment for Contrastive
Representation Learning [116.91819311885166]
We propose a hierarchical semantic alignment strategy that expands the views generated from a single image to cross-sample and multi-level representations.
Our method, termed CsMl, integrates multi-level visual representations across samples in a robust way.
arXiv Detail & Related papers (2020-12-04T17:26:24Z) - Dense Contrastive Learning for Self-Supervised Visual Pre-Training [102.15325936477362]
We present dense contrastive learning, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images.
Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only 1% slower).
arXiv Detail & Related papers (2020-11-18T08:42:32Z) - Un-Mix: Rethinking Image Mixtures for Unsupervised Visual Representation
Learning [108.999497144296]
Recently, advanced unsupervised learning approaches have used a Siamese-like framework to compare two "views" of the same image to learn representations.
This work aims to bring the concept of distance in label space into unsupervised learning, making the model aware of the soft degree of similarity between positive and negative pairs.
Despite its conceptual simplicity, we show empirically that with this solution, unsupervised image mixtures (Un-Mix), we can learn subtler, more robust, and more generalized representations from the transformed inputs and the corresponding new label space.
arXiv Detail & Related papers (2020-03-11T17:59:04Z)
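The nearest-neighbor-as-positive idea referenced above (With a Little Help from My Friends) can be pictured with a short, hypothetical sketch rather than the authors' code: each embedding of one view is swapped for its closest match in a support queue before a standard InfoNCE loss is computed against the other view. Function and variable names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def nn_contrastive_loss(z1, z2, support, temperature=0.1):
    """Simplified nearest-neighbor contrastive loss.

    z1, z2  : (N, D) embeddings of two augmented views of the same images
    support : (Q, D) queue of embeddings from earlier batches (support set)
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    support = F.normalize(support, dim=1)

    # nearest neighbor of each z1 embedding in the support set (cosine similarity)
    idx = (z1 @ support.t()).argmax(dim=1)        # (N,)
    nn_emb = support[idx]                         # (N, D)

    # InfoNCE: the neighbor of sample i is the positive for z2[i];
    # every other element of z2 in the batch serves as a negative
    logits = nn_emb @ z2.t() / temperature        # (N, N)
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)
```

In the paper, the support set is maintained as a first-in-first-out queue of embeddings from previous batches, and the loss is typically symmetrized over the two views; both details are omitted here for brevity.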
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.