Self-supervised Pre-training with Hard Examples Improves Visual
Representations
- URL: http://arxiv.org/abs/2012.13493v2
- Date: Mon, 4 Jan 2021 01:21:04 GMT
- Title: Self-supervised Pre-training with Hard Examples Improves Visual
Representations
- Authors: Chunyuan Li, Xiujun Li, Lei Zhang, Baolin Peng, Mingyuan Zhou,
Jianfeng Gao
- Abstract summary: Self-supervised pre-training (SSP) employs random image transformations to generate training data for visual representation learning.
We first present a modeling framework that unifies existing SSP methods as learning to predict pseudo-labels.
Then, we propose new data augmentation methods for generating training examples whose pseudo-labels are harder to predict than those generated via random image transformations.
- Score: 110.23337264762512
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised pre-training (SSP) employs random image transformations to
generate training data for visual representation learning. In this paper, we
first present a modeling framework that unifies existing SSP methods as
learning to predict pseudo-labels. Then, we propose new data augmentation
methods for generating training examples whose pseudo-labels are harder to
predict than those generated via random image transformations. Specifically, we
use adversarial training and CutMix to create hard examples (HEXA) to be used
as augmented views for MoCo-v2 and DeepCluster-v2, leading to two variants
HEXA_{MoCo} and HEXA_{DCluster}, respectively. In our experiments, we pre-train
models on ImageNet and evaluate them on multiple public benchmarks. Our
evaluation shows that the two new algorithm variants outperform their original
counterparts, and achieve new state-of-the-art results on a wide range of tasks
limited task supervision is available for fine-tuning. These results verify
that hard examples are instrumental in improving the generalization of the
pre-trained models.
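As a concrete illustration of the hard-example idea, below is a minimal sketch of the CutMix half of HEXA's view generation: each augmented view gets a random box pasted in from another image in the batch, which makes its pseudo-label harder to predict. This is not the authors' released code; the PyTorch framing, the function names, and the within-batch mixing are assumptions for illustration, and the adversarial-training route to hard examples is omitted.

```python
import torch

def rand_bbox(h, w, lam, device):
    """Sample a random box covering roughly a (1 - lam) fraction of the image."""
    cut_ratio = (1.0 - lam) ** 0.5
    cut_h, cut_w = int(h * cut_ratio), int(w * cut_ratio)
    cy = torch.randint(h, (1,), device=device).item()
    cx = torch.randint(w, (1,), device=device).item()
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    return y1, y2, x1, x2

def cutmix_views(views: torch.Tensor, alpha: float = 1.0):
    """Paste a random box from a shuffled partner image into each view.

    views: (B, C, H, W) batch of already-augmented views.
    Returns the mixed batch, the partner permutation, and the effective
    mixing weight lam (the fraction of each original view that is kept).
    """
    b, _, h, w = views.shape
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(b, device=views.device)
    y1, y2, x1, x2 = rand_bbox(h, w, lam, views.device)
    mixed = views.clone()
    mixed[:, :, y1:y2, x1:x2] = views[perm, :, y1:y2, x1:x2]
    # Recompute lam from the box that was actually pasted (after clipping).
    lam = 1.0 - ((y2 - y1) * (x2 - x1)) / (h * w)
    return mixed, perm, lam
```

In a MoCo-v2-style loop, one plausible wiring feeds `mixed` to the query encoder as the hard view and uses `perm` and `lam` to mix the corresponding contrastive targets accordingly.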
Related papers
- Learning Augmentation Policies from A Model Zoo for Time Series Forecasting [58.66211334969299]
We introduce AutoTSAug, a learnable data augmentation method based on reinforcement learning.
By augmenting the marginal samples with a learnable policy, AutoTSAug substantially improves forecasting performance.
arXiv Detail & Related papers (2024-09-10T07:34:19Z)
- Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation [105.23631749213729]
We propose a novel method for unsupervised pre-training in low-data regimes.
Inspired by the recently successful prompting technique, we introduce a new method, Unsupervised Pre-training with Language-Vision Prompts.
We show that our method can converge faster and perform better than CNN-based models in low-data regimes.
arXiv Detail & Related papers (2024-05-22T06:48:43Z)
- Beyond Random Augmentations: Pretraining with Hard Views [40.88518237601708]
Hard View Pretraining (HVP) is a learning-free strategy that exposes the model to harder, more challenging samples during SSL pretraining.
HVP achieves linear evaluation accuracy improvements of 1% on average on ImageNet for both 100 and 300 epoch pretraining.
arXiv Detail & Related papers (2023-10-05T23:09:19Z)
- Consistency Regularization for Generalizable Source-free Domain Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods only assess their adapted models on the target training set, neglecting data from unseen but identically distributed test sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z)
- FewGAN: Generating from the Joint Distribution of a Few Images [95.6635227371479]
We introduce FewGAN, a generative model for generating novel, high-quality and diverse images.
FewGAN is a hierarchical patch-GAN that applies quantization at the first coarse scale, followed by a pyramid of residual fully convolutional GANs at finer scales.
In an extensive set of experiments, it is shown that FewGAN outperforms baselines both quantitatively and qualitatively.
arXiv Detail & Related papers (2022-07-18T07:11:28Z)
- Toward Learning Robust and Invariant Representations with Alignment Regularization and Data Augmentation [76.85274970052762]
This paper is motivated by the proliferation of alignment regularization options.
We evaluate the performances of several popular design choices along the dimensions of robustness and invariance.
We also formally analyze the behavior of alignment regularization to complement our empirical study under assumptions we consider realistic.
arXiv Detail & Related papers (2022-06-04T04:29:19Z)
- MixSiam: A Mixture-based Approach to Self-supervised Representation Learning [33.52892899982186]
Recently, contrastive learning has shown significant progress in learning visual representations from unlabeled data.
We propose MixSiam, a mixture-based approach built upon the traditional Siamese network.
arXiv Detail & Related papers (2021-11-04T08:12:47Z)
- Mean Embeddings with Test-Time Data Augmentation for Ensembling of Representations [8.336315962271396]
We look at the ensembling of representations and propose mean embeddings with test-time augmentation (MeTTA); a minimal sketch of the idea appears after this list.
MeTTA significantly boosts the quality of linear evaluation on ImageNet for both supervised and self-supervised models.
We believe that extending the success of ensembles to the inference of higher-quality representations is an important step that will open many new applications of ensembling.
arXiv Detail & Related papers (2021-06-15T10:49:46Z)
- Adaptive Consistency Regularization for Semi-Supervised Transfer Learning [31.66745229673066]
We consider semi-supervised learning and transfer learning jointly, leading to a more practical and competitive paradigm.
To better exploit the value of both pre-trained weights and unlabeled target examples, we introduce adaptive consistency regularization.
Our proposed adaptive consistency regularization outperforms state-of-the-art semi-supervised learning techniques such as Pseudo Label, Mean Teacher, and MixMatch.
arXiv Detail & Related papers (2021-03-03T05:46:39Z)
- ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data [9.3935916515127]
We introduce a new vision-language pre-trained model -- ImageBERT -- for image-text joint embedding.
Our model is a Transformer-based model, which takes different modalities as input and models the relationship between them.
arXiv Detail & Related papers (2020-01-22T11:35:58Z)
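The MeTTA idea referenced above is compact enough to sketch: embed several test-time-augmented copies of an image and average the features. Below is a minimal sketch, assuming a PyTorch encoder that maps an image batch to feature vectors; `augment` is a hypothetical callable, and the final re-normalization is an assumption rather than a detail taken from the paper.

```python
import torch

@torch.no_grad()
def mean_embedding(model, image, augment, k: int = 8):
    """MeTTA-style mean embedding: average the features of k
    test-time-augmented copies of a single image.

    model:   maps a (1, C, H, W) batch to a (1, D) feature tensor
    image:   a (C, H, W) tensor
    augment: hypothetical callable returning an augmented (C, H, W) tensor
    """
    feats = torch.stack(
        [model(augment(image).unsqueeze(0)).squeeze(0) for _ in range(k)]
    )
    emb = feats.mean(dim=0)
    return emb / emb.norm()  # re-normalization is an assumption here
```

The averaged embedding would then replace the single-view embedding in linear evaluation or retrieval.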
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.