Beyond Random Augmentations: Pretraining with Hard Views
- URL: http://arxiv.org/abs/2310.03940v5
- Date: Mon, 27 May 2024 21:19:55 GMT
- Title: Beyond Random Augmentations: Pretraining with Hard Views
- Authors: Fabio Ferreira, Ivo Rapant, Jörg K. H. Franke, Frank Hutter
- Abstract summary: Hard View Pretraining (HVP) is a learning-free strategy that exposes the model to harder, more challenging samples during SSL pretraining.
HVP achieves linear evaluation accuracy improvements of 1% on average on ImageNet for both 100 and 300 epoch pretraining.
- Score: 40.88518237601708
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many Self-Supervised Learning (SSL) methods aim for model invariance to different image augmentations known as views. To achieve this invariance, conventional approaches make use of random sampling operations within the image augmentation pipeline. We hypothesize that the efficacy of pretraining pipelines based on conventional random view sampling can be enhanced by explicitly selecting views that benefit the learning progress. A simple, yet effective approach is to select hard views that yield a higher loss. In this paper, we present Hard View Pretraining (HVP), a learning-free strategy that builds upon this hypothesis and extends random view generation. HVP exposes the model to harder, more challenging samples during SSL pretraining, which enhances downstream performance. It encompasses the following iterative steps: 1) randomly sample multiple views and forward each view through the pretrained model, 2) create pairs of two views and compute their loss, 3) adversarially select the pair yielding the highest loss depending on the current model state, and 4) run the backward pass with the selected pair. As a result, HVP achieves linear evaluation accuracy improvements of 1% on average on ImageNet for both 100 and 300 epoch pretraining and similar improvements on transfer tasks across DINO, SimSiam, iBOT, and SimCLR.
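The four iterative steps listed in the abstract can be sketched as a small selection routine. This is a minimal, framework-agnostic sketch, not the authors' implementation: `select_hard_pair`, `augment`, and `embed` are hypothetical names, embeddings are plain Python lists, and negative cosine similarity stands in for whichever loss the SSL method (DINO, SimSiam, iBOT, SimCLR) actually uses. Because this placeholder loss is symmetric, only unordered pairs are scored.

```python
import itertools
import math
import random
from typing import Callable, Optional, Sequence, Tuple, TypeVar

View = TypeVar("View")


def neg_cosine(z1: Sequence[float], z2: Sequence[float]) -> float:
    """Negative cosine similarity: a placeholder (assumption) for the SSL method's real loss."""
    dot = sum(a * b for a, b in zip(z1, z2))
    norm1 = math.sqrt(sum(a * a for a in z1))
    norm2 = math.sqrt(sum(b * b for b in z2))
    return -dot / (norm1 * norm2 + 1e-12)


def select_hard_pair(
    image,
    augment: Callable[[object, random.Random], View],
    embed: Callable[[View], Sequence[float]],
    loss_fn: Callable[[Sequence[float], Sequence[float]], float] = neg_cosine,
    num_views: int = 4,
    rng: Optional[random.Random] = None,
) -> Tuple[Tuple[View, View], float]:
    """Hypothetical helper: return the pair of random views with the highest loss."""
    if num_views < 2:
        raise ValueError("need at least two views to form a pair")
    if rng is None:
        rng = random.Random()
    # Step 1: sample several random views and forward each through the current model.
    views = [augment(image, rng) for _ in range(num_views)]
    embeddings = [embed(v) for v in views]
    # Steps 2-3: score every pair of views and keep the one with the highest loss.
    best_i, best_j, best_loss = 0, 1, -math.inf
    for i, j in itertools.combinations(range(num_views), 2):
        loss = loss_fn(embeddings[i], embeddings[j])
        if loss > best_loss:
            best_i, best_j, best_loss = i, j, loss
    # Step 4 is left to the caller: run the usual training step on the returned pair.
    return (views[best_i], views[best_j]), best_loss
```

In a real training loop, the scoring forward passes in steps 1-3 would typically run without gradient tracking, so that only the selected pair goes through the backward pass; the extra cost is then mostly the additional forward passes over `num_views` views.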
 
      
Related papers
        - Should VLMs be Pre-trained with Image Data? [54.50406730361859]
 We find that pre-training with a mixture of image and text data allows models to perform better on vision-language tasks.
Averaged over 6 diverse tasks, we find that for a 1B model, introducing visual tokens 80% of the way through pre-training results in a 2% average improvement over introducing visual tokens to a fully pre-trained model.
 arXiv (2025-03-10T17:58:19Z)
- Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning [49.275450836604726]
 We present a novel frequency-based Self-Supervised Learning (SSL) approach that significantly improves the efficacy of SSL pre-training.
We employ a two-branch framework empowered by knowledge distillation, enabling the model to take both the filtered and original images as input.
 arXiv (2024-09-16T15:10:07Z)
- FSL-Rectifier: Rectify Outliers in Few-Shot Learning via Test-Time Augmentation [7.477118370563593]
 Few-shot learning (FSL) commonly requires a model to identify images (queries) that belong to classes unseen during training.
We generate additional test-class samples by combining original samples with suitable train-class samples via a generative image combiner.
We obtain averaged features via an augmentor, and this averaging yields more typical representations.
 arXiv (2024-02-28T12:37:30Z)
- EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones [80.662250618795]
 This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
 arXiv (2022-11-17T17:38:55Z)
- Virtual embeddings and self-consistency for self-supervised learning [43.086696088061416]
 TriMix is a novel concept for self-supervised learning that generates virtual embeddings through linear interpolation of the data.
We validate TriMix on eight benchmark datasets, with improvements of 2.71% and 0.41% over the second-best models for the two data types, respectively.
 arXiv (2022-06-13T10:20:28Z)
- SelectAugment: Hierarchical Deterministic Sample Selection for Data Augmentation [72.58308581812149]
 We propose an effective approach, dubbed SelectAugment, to select samples to be augmented in a deterministic and online manner.
Specifically, in each batch, we first determine the augmentation ratio, and then decide whether to augment each training sample under this ratio.
In this way, the negative effects of randomness in selecting which samples to augment can be effectively alleviated, and the effectiveness of data augmentation (DA) is improved.
 arXiv (2021-12-06T08:38:38Z)
- MixSiam: A Mixture-based Approach to Self-supervised Representation Learning [33.52892899982186]
 Recently, contrastive learning has shown significant progress in learning visual representations from unlabeled data.
We propose MixSiam, a mixture-based approach built upon the traditional siamese network.
 arXiv (2021-11-04T08:12:47Z)
- SPeCiaL: Self-Supervised Pretraining for Continual Learning [49.34919926042038]
 SPeCiaL is a method for unsupervised pretraining of representations tailored for continual learning.
We evaluate SPeCiaL in the Continual Few-Shot Learning setting, and show that it can match or outperform other supervised pretraining approaches.
 arXiv (2021-06-16T18:15:15Z)
- Jigsaw Clustering for Unsupervised Visual Representation Learning [68.09280490213399]
 We propose a new jigsaw clustering pretext task in this paper.
Our method makes use of both intra-image and inter-image information.
It is even comparable to contrastive learning methods while using only half as many training batches.
 arXiv (2021-04-01T08:09:26Z)
- Self-supervised Pre-training with Hard Examples Improves Visual Representations [110.23337264762512]
 Self-supervised pre-training (SSP) employs random image transformations to generate training data for visual representation learning.
We first present a modeling framework that unifies existing SSP methods as learning to predict pseudo-labels.
Then, we propose new data augmentation methods of generating training examples whose pseudo-labels are harder to predict than those generated via random image transformations.
 arXiv (2020-12-25T02:44:22Z)
- Unsupervised Learning of Visual Features by Contrasting Cluster Assignments [57.33699905852397]
 We propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring pairwise comparisons to be computed.
Our method simultaneously clusters the data while enforcing consistency between cluster assignments.
Our method can be trained with large and small batches and can scale to unlimited amounts of data.
 arXiv (2020-06-17T14:00:42Z)
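The SwAV entry in the list describes clustering the data while enforcing consistency between cluster assignments, without pairwise comparisons. In the SwAV paper, the assignments ("codes") are computed with a few Sinkhorn-Knopp iterations that spread samples evenly over prototypes, and each view is trained to predict the code of the other view. The sketch below illustrates that swapped prediction in pure Python; it is not the authors' code. It assumes the caller supplies a B x K matrix of scores (dot products between normalized features and prototypes), and the values `epsilon=0.05`, `iters=3`, and `temperature=0.1` are illustrative defaults.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinkhorn(scores: Matrix, epsilon: float = 0.05, iters: int = 3) -> Matrix:
    """Turn a B x K score matrix into soft codes spread evenly over the K prototypes."""
    # No max-subtraction here: assumes scores are cosine similarities in [-1, 1].
    q = [[math.exp(s / epsilon) for s in row] for row in scores]
    num_samples, num_protos = len(q), len(q[0])
    total = sum(sum(row) for row in q)
    q = [[v / total for v in row] for row in q]
    for _ in range(iters):
        # Column step: give every prototype the same total mass, 1/K.
        col = [sum(q[b][k] for b in range(num_samples)) for k in range(num_protos)]
        q = [[q[b][k] / (col[k] * num_protos) for k in range(num_protos)] for b in range(num_samples)]
        # Row step: give every sample the same total mass, 1/B.
        row_sums = [sum(row) for row in q]
        q = [[v / (r * num_samples) for v in row] for row, r in zip(q, row_sums)]
    # Scale rows back up so each sample's code is a distribution over prototypes.
    return [[v * num_samples for v in row] for row in q]


def log_softmax(row: List[float], temperature: float) -> List[float]:
    """Numerically stable log-softmax of row / temperature."""
    scaled = [s / temperature for s in row]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def swapped_prediction_loss(scores1: Matrix, scores2: Matrix, temperature: float = 0.1) -> float:
    """Predict view 2's code from view 1's scores and vice versa; codes act as fixed targets."""
    q1, q2 = sinkhorn(scores1), sinkhorn(scores2)
    total = 0.0
    for s1, s2, c1, c2 in zip(scores1, scores2, q1, q2):
        lp1, lp2 = log_softmax(s1, temperature), log_softmax(s2, temperature)
        total -= sum(c * l for c, l in zip(c2, lp1))
        total -= sum(c * l for c, l in zip(c1, lp2))
    return total / (2 * len(scores1))
```

Because each code is compared only with the prediction from the other view of the same sample, the cost grows with the number of prototypes rather than with the number of sample pairs in the batch.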