Beyond Random Augmentations: Pretraining with Hard Views
- URL: http://arxiv.org/abs/2310.03940v5
- Date: Mon, 27 May 2024 21:19:55 GMT
- Title: Beyond Random Augmentations: Pretraining with Hard Views
- Authors: Fabio Ferreira, Ivo Rapant, Jörg K. H. Franke, Frank Hutter
- Abstract summary: Hard View Pretraining (HVP) is a learning-free strategy that exposes the model to harder, more challenging samples during SSL pretraining.
HVP achieves linear evaluation accuracy improvements of 1% on average on ImageNet for both 100 and 300 epoch pretraining.
- Score: 40.88518237601708
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many Self-Supervised Learning (SSL) methods aim for model invariance to different image augmentations known as views. To achieve this invariance, conventional approaches make use of random sampling operations within the image augmentation pipeline. We hypothesize that the efficacy of pretraining pipelines based on conventional random view sampling can be enhanced by explicitly selecting views that benefit the learning progress. A simple, yet effective approach is to select hard views that yield a higher loss. In this paper, we present Hard View Pretraining (HVP), a learning-free strategy that builds upon this hypothesis and extends random view generation. HVP exposes the model to harder, more challenging samples during SSL pretraining, which enhances downstream performance. It encompasses the following iterative steps: 1) randomly sample multiple views and forward each view through the pretrained model, 2) create pairs of two views and compute their loss, 3) adversarially select the pair yielding the highest loss depending on the current model state, and 4) run the backward pass with the selected pair. As a result, HVP achieves linear evaluation accuracy improvements of 1% on average on ImageNet for both 100 and 300 epoch pretraining and similar improvements on transfer tasks across DINO, SimSiam, iBOT, and SimCLR.
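The four steps listed in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `select_hard_pair`, `noisy_view`, and `cosine_loss` are hypothetical, views are plain lists of floats rather than images, the encoder is passed in as an arbitrary function, and the loss is a toy 1 - cosine similarity standing in for whichever SSL objective (DINO, SimSiam, iBOT, SimCLR) is actually used. Step 4, the backward pass, is left to the caller.

```python
import itertools
import math
import random
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def cosine_loss(a: Vector, b: Vector) -> float:
    """Toy pair loss (assumption): 1 - cosine similarity, higher = harder pair."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def noisy_view(image: Vector, rng: random.Random) -> Vector:
    """Stand-in for a random augmentation: add Gaussian noise to each entry."""
    return [x + rng.gauss(0.0, 0.5) for x in image]


def select_hard_pair(
    image: Vector,
    sample_view: Callable[[Vector, random.Random], Vector],
    encode: Callable[[Vector], Vector],
    pair_loss: Callable[[Vector, Vector], float],
    num_views: int = 4,
    rng: Optional[random.Random] = None,
) -> Tuple[Vector, Vector, float]:
    """Return the two views with the highest pairwise loss, plus that loss."""
    if rng is None:
        rng = random.Random()

    # Step 1: randomly sample several views and forward each through the model.
    views = [sample_view(image, rng) for _ in range(num_views)]
    embeddings = [encode(view) for view in views]

    # Step 2: form every pair of two views and compute its loss.
    losses = {
        (i, j): pair_loss(embeddings[i], embeddings[j])
        for i, j in itertools.combinations(range(num_views), 2)
    }

    # Step 3: select the pair with the highest loss under the current model.
    (i, j), hardest_loss = max(losses.items(), key=lambda item: item[1])

    # Step 4 (not shown): the caller runs the backward pass on this pair only.
    return views[i], views[j], hardest_loss


if __name__ == "__main__":
    rng = random.Random(0)
    identity_encoder = lambda v: v  # placeholder for the network being trained
    view_a, view_b, loss = select_hard_pair(
        [1.0, 2.0, 3.0], noisy_view, identity_encoder, cosine_loss, rng=rng
    )
    print(f"hardest pair loss: {loss:.4f}")
```

With `num_views` sampled views there are `num_views * (num_views - 1) / 2` candidate pairs, so the selection costs extra forward passes but still only one backward pass per image.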
Related papers
- EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones [80.662250618795]
This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
arXiv Detail & Related papers (2022-11-17T17:38:55Z)
- Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models [107.05966685291067]
We propose test-time prompt tuning (TPT) to learn adaptive prompts on the fly with a single test sample.
TPT improves the zero-shot top-1 accuracy of CLIP by 3.6% on average.
In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.
arXiv Detail & Related papers (2022-09-15T17:55:11Z)
- Virtual embeddings and self-consistency for self-supervised learning [43.086696088061416]
TriMix is a novel concept for self-supervised learning that generates virtual embeddings through linear data.
We validate TriMix on eight benchmark datasets, where it improves on the second-best models by 2.71% and 0.41% for the two data types.
arXiv Detail & Related papers (2022-06-13T10:20:28Z)
- SPeCiaL: Self-Supervised Pretraining for Continual Learning [49.34919926042038]
SPeCiaL is a method for unsupervised pretraining of representations tailored for continual learning.
We evaluate SPeCiaL in the Continual Few-Shot Learning setting, and show that it can match or outperform other supervised pretraining approaches.
arXiv Detail & Related papers (2021-06-16T18:15:15Z)
- Jigsaw Clustering for Unsupervised Visual Representation Learning [68.09280490213399]
We propose a new jigsaw clustering pretext task in this paper.
Our method makes use of information from both intra- and inter-images.
It is even comparable to contrastive learning methods when only half of the training batches are used.
arXiv Detail & Related papers (2021-04-01T08:09:26Z)
- Self-supervised Pre-training with Hard Examples Improves Visual Representations [110.23337264762512]
Self-supervised pre-training (SSP) employs random image transformations to generate training data for visual representation learning.
We first present a modeling framework that unifies existing SSP methods as learning to predict pseudo-labels.
Then, we propose new data augmentation methods of generating training examples whose pseudo-labels are harder to predict than those generated via random image transformations.
arXiv Detail & Related papers (2020-12-25T02:44:22Z)