Beyond Random Augmentations: Pretraining with Hard Views
- URL: http://arxiv.org/abs/2310.03940v5
- Date: Mon, 27 May 2024 21:19:55 GMT
- Title: Beyond Random Augmentations: Pretraining with Hard Views
- Authors: Fabio Ferreira, Ivo Rapant, Jörg K. H. Franke, Frank Hutter
- Abstract summary: Hard View Pretraining (HVP) is a learning-free strategy that exposes the model to harder, more challenging samples during SSL pretraining.
HVP achieves linear evaluation accuracy improvements of 1% on average on ImageNet for both 100- and 300-epoch pretraining.
- Score: 40.88518237601708
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many Self-Supervised Learning (SSL) methods aim for model invariance to different image augmentations known as views. To achieve this invariance, conventional approaches make use of random sampling operations within the image augmentation pipeline. We hypothesize that the efficacy of pretraining pipelines based on conventional random view sampling can be enhanced by explicitly selecting views that benefit the learning progress. A simple yet effective approach is to select hard views that yield a higher loss. In this paper, we present Hard View Pretraining (HVP), a learning-free strategy that builds upon this hypothesis and extends random view generation. HVP exposes the model to harder, more challenging samples during SSL pretraining, which enhances downstream performance. It encompasses the following iterative steps: 1) randomly sample multiple views and forward each view through the pretrained model, 2) create pairs of two views and compute their loss, 3) adversarially select the pair yielding the highest loss depending on the current model state, and 4) run the backward pass with the selected pair. As a result, HVP achieves linear evaluation accuracy improvements of 1% on average on ImageNet for both 100- and 300-epoch pretraining and similar improvements on transfer tasks across DINO, SimSiam, iBOT, and SimCLR.
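The four iterative steps map directly onto a short training loop. Below is a minimal PyTorch-style sketch of one HVP iteration; `model`, `augment`, `ssl_loss`, and `optimizer` are placeholders for the underlying SSL method's components (the paper applies HVP to DINO, SimSiam, iBOT, and SimCLR), not names from the authors' code.

```python
import torch

def hvp_step(model, image, augment, ssl_loss, optimizer, num_views=4):
    """One HVP iteration (sketch). Steps follow the abstract:
    1) sample several random views, 2) score all view pairs,
    3) keep the hardest (highest-loss) pair under the current
    model state, 4) backpropagate only through that pair."""
    # Step 1: sample candidate views; embeddings for pair scoring
    # are computed without gradients to keep selection cheap.
    views = [augment(image) for _ in range(num_views)]
    with torch.no_grad():
        embeds = [model(v) for v in views]

    # Steps 2-3: evaluate every pair and adversarially select the
    # one yielding the highest loss right now.
    hardest, hardest_loss = None, float("-inf")
    for i in range(num_views):
        for j in range(i + 1, num_views):
            pair_loss = ssl_loss(embeds[i], embeds[j]).item()
            if pair_loss > hardest_loss:
                hardest, hardest_loss = (i, j), pair_loss

    # Step 4: recompute the selected pair with gradients enabled
    # and run the usual SSL update on it.
    i, j = hardest
    loss = ssl_loss(model(views[i]), model(views[j]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The selection itself is learning-free, consistent with the abstract; the price is extra forward passes over the candidate views before each update.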
Related papers
- Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning [49.275450836604726]
We present a novel frequency-based Self-Supervised Learning (SSL) approach that significantly enhances pre-training efficacy.
We employ a two-branch framework empowered by knowledge distillation, enabling the model to take both the filtered and original images as input.
arXiv Detail & Related papers (2024-09-16T15:10:07Z)
- FSL-Rectifier: Rectify Outliers in Few-Shot Learning via Test-Time Augmentation [7.477118370563593]
Few-shot learning (FSL) commonly requires a model to identify images (queries) that belong to classes unseen during training.
We generate additional test-class samples by combining original samples with suitable train-class samples via a generative image combiner.
We obtain averaged features via an augmentor; the averaging yields more typical representations.
arXiv Detail & Related papers (2024-02-28T12:37:30Z)
- SelectAugment: Hierarchical Deterministic Sample Selection for Data Augmentation [72.58308581812149]
We propose an effective approach, dubbed SelectAugment, to select samples to be augmented in a deterministic and online manner.
Specifically, in each batch, we first determine the augmentation ratio, and then decide whether to augment each training sample under this ratio.
In this way, the negative effects of randomness in selecting samples to augment are alleviated and the effectiveness of data augmentation (DA) is improved (a toy sketch of this two-level decision appears after this list).
arXiv Detail & Related papers (2021-12-06T08:38:38Z)
- MixSiam: A Mixture-based Approach to Self-supervised Representation Learning [33.52892899982186]
Recently, contrastive learning has shown significant progress in learning visual representations from unlabeled data.
We propose MixSiam, a mixture-based approach built upon the traditional Siamese network.
arXiv Detail & Related papers (2021-11-04T08:12:47Z)
- SPeCiaL: Self-Supervised Pretraining for Continual Learning [49.34919926042038]
SPeCiaL is a method for unsupervised pretraining of representations tailored for continual learning.
We evaluate SPeCiaL in the Continual Few-Shot Learning setting, and show that it can match or outperform other supervised pretraining approaches.
arXiv Detail & Related papers (2021-06-16T18:15:15Z)
- Self-supervised Pre-training with Hard Examples Improves Visual Representations [110.23337264762512]
Self-supervised pre-training (SSP) employs random image transformations to generate training data for visual representation learning.
We first present a modeling framework that unifies existing SSP methods as learning to predict pseudo-labels.
Then, we propose new data augmentation methods that generate training examples whose pseudo-labels are harder to predict than those generated via random image transformations.
arXiv Detail & Related papers (2020-12-25T02:44:22Z)
- Unsupervised Learning of Visual Features by Contrasting Cluster Assignments [57.33699905852397]
We propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring the computation of pairwise comparisons.
Our method simultaneously clusters the data while enforcing consistency between cluster assignments.
Our method can be trained with large and small batches and can scale to unlimited amounts of data (a sketch of the swapped-assignment idea also follows this list).
arXiv Detail & Related papers (2020-06-17T14:00:42Z)
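As referenced in the SelectAugment entry above, the following is a toy sketch of its two-level decision: pick a batch-level augmentation ratio, then deterministically choose which samples to augment under that ratio. SelectAugment learns both decisions with hierarchical policies; here `per_sample_scores` (e.g., current per-sample losses) and a fixed `ratio` are hypothetical stand-ins for the learned components.

```python
import torch

def select_augment_batch(batch, per_sample_scores, ratio, augment_fn):
    """Deterministic, online sample selection for augmentation
    (sketch). `ratio` plays the role of the batch-level decision;
    the per-sample decision ranks samples by `per_sample_scores`
    instead of being learned as in the paper."""
    k = int(ratio * batch.size(0))   # how many samples to augment
    chosen = torch.topk(per_sample_scores, k).indices
    out = batch.clone()
    for idx in chosen:               # augment only the selected samples
        out[idx] = augment_fn(batch[idx])
    return out, chosen
```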
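Similarly, for the SwAV entry: a hedged sketch of how consistency between cluster assignments can replace pairwise comparisons between samples. The plain-softmax targets below are a simplification; the paper computes assignment codes with a Sinkhorn-Knopp equal-partition step, which is omitted here.

```python
import torch
import torch.nn.functional as F

def swapped_assignment_loss(z1, z2, prototypes, temperature=0.1):
    """SwAV-style swapped prediction (sketch). z1, z2: (N, D)
    embeddings of two views; prototypes: (K, D) learnable cluster
    centers. Each view must predict the other's soft cluster
    assignment, so no sample-to-sample comparisons are needed."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    p = F.normalize(prototypes, dim=1)
    s1, s2 = z1 @ p.T, z2 @ p.T              # similarity to prototypes
    with torch.no_grad():                    # targets carry no gradient
        q1 = F.softmax(s1 / temperature, dim=1)
        q2 = F.softmax(s2 / temperature, dim=1)
    logp1 = F.log_softmax(s1 / temperature, dim=1)
    logp2 = F.log_softmax(s2 / temperature, dim=1)
    # Swapped: predict view 2's code from view 1 and vice versa.
    return -0.5 * ((q2 * logp1).sum(1) + (q1 * logp2).sum(1)).mean()
```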