Revisiting Weakly Supervised Pre-Training of Visual Perception Models
- URL: http://arxiv.org/abs/2201.08371v1
- Date: Thu, 20 Jan 2022 18:55:06 GMT
- Title: Revisiting Weakly Supervised Pre-Training of Visual Perception Models
- Authors: Mannat Singh, Laura Gustafson, Aaron Adcock, Vinicius de Freitas Reis,
Bugra Gedik, Raj Prateek Kosaraju, Dhruv Mahajan, Ross Girshick, Piotr
Dollár, Laurens van der Maaten
- Abstract summary: Large-scale weakly supervised pre-training can outperform fully supervised approaches.
This paper revisits weakly-supervised pre-training of models using hashtag supervision.
Our results provide a compelling argument for the use of weakly supervised learning in the development of visual recognition systems.
- Score: 27.95816470075203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model pre-training is a cornerstone of modern visual recognition systems.
Although fully supervised pre-training on datasets like ImageNet is still the
de-facto standard, recent studies suggest that large-scale weakly supervised
pre-training can outperform fully supervised approaches. This paper revisits
weakly-supervised pre-training of models using hashtag supervision with modern
versions of residual networks and the largest-ever dataset of images and
corresponding hashtags. We study the performance of the resulting models in
various transfer-learning settings including zero-shot transfer. We also
compare our models with those obtained via large-scale self-supervised
learning. We find our weakly-supervised models to be very competitive across
all settings, and find they substantially outperform their self-supervised
counterparts. We also include an investigation into whether our models learned
potentially troubling associations or stereotypes. Overall, our results provide
a compelling argument for the use of weakly supervised learning in the
development of visual recognition systems. Our models, Supervised Weakly
through hashtAGs (SWAG), are available publicly.
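As a rough illustration of the recipe the abstract describes, the sketch below pre-trains a network with hashtags as weak multi-label targets and performs zero-shot transfer by pooling the scores of the hashtags mapped to each downstream class. The backbone (a stock ResNet-50 stand-in for the paper's much larger models), the soft-target construction, and the class-to-hashtag mapping are all assumptions for illustration, not the paper's exact method.

```python
# Minimal sketch (not the authors' code): hashtag-supervised pre-training and
# zero-shot transfer as described in the abstract. The backbone, the target
# construction, and the class-to-hashtag mapping are illustrative assumptions.
import torch
import torch.nn.functional as F
import torchvision

NUM_HASHTAGS = 10_000  # hypothetical hashtag vocabulary size

# Stand-in backbone; the paper trains far larger modern architectures.
backbone = torchvision.models.resnet50(weights=None)
backbone.fc = torch.nn.Linear(backbone.fc.in_features, NUM_HASHTAGS)

def weak_supervision_step(images, hashtag_ids):
    """images: (B, 3, H, W); hashtag_ids: list of B lists of hashtag indices."""
    logits = backbone(images)
    # Split target probability mass over an image's hashtags, then train with
    # cross-entropy (one common recipe for hashtag supervision).
    targets = torch.zeros_like(logits)
    for i, ids in enumerate(hashtag_ids):
        targets[i, ids] = 1.0 / len(ids)
    return torch.sum(-targets * F.log_softmax(logits, dim=1), dim=1).mean()

def zero_shot_scores(images, class_to_hashtags):
    """Score downstream classes by pooling the probabilities of their hashtags."""
    probs = F.softmax(backbone(images), dim=1)
    return torch.stack(
        [probs[:, ids].sum(dim=1) for ids in class_to_hashtags.values()], dim=1
    )
```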
Related papers
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Retrieval-Enhanced Contrastive Vision-Text Models [61.783728119255365]
We propose to equip vision-text models with the ability to refine their embedding with cross-modal retrieved information from a memory at inference time.
Remarkably, we show that this can be done with a lightweight, single-layer fusion transformer on top of a frozen CLIP.
Our experiments validate that our retrieval-enhanced contrastive (RECO) training improves CLIP performance substantially on several challenging fine-grained tasks.
arXiv Detail & Related papers (2023-06-12T15:52:02Z)
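A minimal sketch of the retrieval-enhanced idea summarized in the entry above, under assumptions: the memory is a flat tensor of precomputed embeddings, retrieval is plain cosine top-k, and the "single-layer fusion transformer" is approximated by one cross-attention block with a residual connection. This is not the RECO implementation; it only illustrates refining a frozen CLIP embedding with retrieved items at inference time.

```python
# Minimal sketch (assumptions, not the RECO implementation): refine a frozen
# CLIP embedding with items retrieved from a memory bank via one fusion layer.
import torch

class RetrievalFusion(torch.nn.Module):
    def __init__(self, dim: int = 512, k: int = 16):
        super().__init__()
        self.k = k
        # "Single-layer fusion transformer": one cross-attention block plus a small MLP.
        self.attn = torch.nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.mlp = torch.nn.Sequential(torch.nn.LayerNorm(dim), torch.nn.Linear(dim, dim))

    def forward(self, query_emb, memory_embs):
        # query_emb: (B, D) frozen CLIP embeddings; memory_embs: (N, D) memory bank.
        sims = torch.nn.functional.normalize(query_emb, dim=-1) @ \
               torch.nn.functional.normalize(memory_embs, dim=-1).T
        topk = sims.topk(self.k, dim=-1).indices       # (B, k) nearest memory items
        retrieved = memory_embs[topk]                   # (B, k, D)
        fused, _ = self.attn(query_emb.unsqueeze(1), retrieved, retrieved)
        return query_emb + self.mlp(fused.squeeze(1))   # refined embedding
```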
- Self-Supervised Models are Continual Learners [79.70541692930108]
We show that self-supervised loss functions can be seamlessly converted into distillation mechanisms for Continual Learning.
We devise a framework for Continual self-supervised visual representation Learning that significantly improves the quality of the learned representations.
arXiv Detail & Related papers (2021-12-08T10:39:13Z)
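The sketch below shows one way to read "converting a self-supervised loss into a distillation mechanism": reuse a SimSiam/BYOL-style negative-cosine objective to make the current encoder, through a small predictor, match a frozen snapshot of itself from the previous task. The shared predictor, the choice of loss, and the equal weighting are assumptions; the paper's framework differs in its details.

```python
# Minimal sketch (assumption-laden): turn a self-supervised objective into a
# distillation term for continual learning against a frozen past snapshot.
import torch
import torch.nn.functional as F

def negative_cosine(p, z):
    # SimSiam/BYOL-style self-supervised loss, reused here as a distillation loss.
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

def continual_ssl_step(encoder, predictor, frozen_past_encoder, views):
    v1, v2 = views                                   # two augmented views of the batch
    z1, z2 = encoder(v1), encoder(v2)
    # Standard SSL term between the two views of the current encoder ...
    ssl_loss = negative_cosine(predictor(z1), z2) + negative_cosine(predictor(z2), z1)
    # ... plus the same loss reused as distillation against the frozen past model.
    with torch.no_grad():
        past1, past2 = frozen_past_encoder(v1), frozen_past_encoder(v2)
    distill_loss = negative_cosine(predictor(z1), past1) + negative_cosine(predictor(z2), past2)
    return ssl_loss + distill_loss
```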
- Mean Embeddings with Test-Time Data Augmentation for Ensembling of Representations [8.336315962271396]
We look at the ensembling of representations and propose mean embeddings with test-time augmentation (MeTTA).
MeTTA significantly boosts the quality of linear evaluation on ImageNet for both supervised and self-supervised models.
We believe that extending the success of ensembling to the inference of higher-quality representations is an important step that will open up many new applications of ensembling.
arXiv Detail & Related papers (2021-06-15T10:49:46Z)
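A minimal sketch of the MeTTA idea summarized in the entry above: embed several augmented views of each image and average the embeddings before linear evaluation. The augmentation policy and the number of views are illustrative assumptions.

```python
# Minimal sketch of mean embeddings with test-time augmentation (MeTTA-style);
# the augmentation policy and view count are assumptions for illustration.
import torch
import torchvision.transforms as T

augment = T.Compose([
    T.RandomResizedCrop(224, scale=(0.6, 1.0)),
    T.RandomHorizontalFlip(),
])

@torch.no_grad()
def mean_embedding(encoder, image, num_views: int = 8):
    """Average the encoder's embeddings over several augmented views of one image."""
    views = torch.stack([augment(image) for _ in range(num_views)])  # (V, 3, 224, 224)
    return encoder(views).mean(dim=0)

# Linear evaluation then trains and evaluates a linear head on these mean
# embeddings instead of single-view embeddings.
```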
- Learning by Distillation: A Self-Supervised Learning Framework for Optical Flow Estimation [71.76008290101214]
DistillFlow is a knowledge distillation approach to learning optical flow.
It achieves state-of-the-art unsupervised learning performance on both KITTI and Sintel datasets.
Our models ranked 1st among all monocular methods on the KITTI 2015 benchmark, and outperform all published methods on the Sintel Final benchmark.
arXiv Detail & Related papers (2021-06-08T09:13:34Z)
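As a generic illustration of distillation for optical flow (not DistillFlow's actual procedure), the loss below supervises a student's flow with a teacher's predictions only at pixels a confidence mask marks as reliable; producing that mask, for example via forward-backward consistency or occlusion reasoning, is where method-specific choices live and it is simply assumed given here.

```python
# Minimal sketch of teacher-to-student distillation for optical flow. How the
# confidence mask is computed is method-specific and assumed given.
import torch

def flow_distillation_loss(student_flow, teacher_flow, confidence_mask):
    """student_flow, teacher_flow: (B, 2, H, W); confidence_mask: (B, 1, H, W) in {0, 1}."""
    # Penalize the student only where the teacher's prediction is deemed reliable.
    err = (student_flow - teacher_flow.detach()).abs().sum(dim=1, keepdim=True)
    return (confidence_mask * err).sum() / confidence_mask.sum().clamp(min=1.0)
```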
- The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models [115.49214555402567]
Pre-trained weights often boost a wide range of downstream tasks including classification, detection, and segmentation.
Recent studies suggest that pre-training benefits from gigantic model capacity.
In this paper, we examine supervised and self-supervised pre-trained models through the lens of the lottery ticket hypothesis (LTH).
arXiv Detail & Related papers (2020-12-12T21:53:55Z)
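The lottery-ticket analysis referenced above rests on finding sparse "matching subnetworks" by magnitude pruning. The sketch below builds a global magnitude mask over a pre-trained model's weight matrices and reapplies it during training; the sparsity level and the global-threshold choice are assumptions for illustration.

```python
# Minimal sketch of lottery-ticket-style magnitude pruning of a pre-trained model.
# Sparsity level and global thresholding are illustrative assumptions.
import torch

def magnitude_masks(model: torch.nn.Module, sparsity: float = 0.8):
    """Return a {param_name: 0/1 mask} keeping the largest-magnitude weights."""
    weights = [p.detach().abs().flatten()
               for n, p in model.named_parameters() if p.dim() > 1]
    threshold = torch.cat(weights).quantile(sparsity)   # global magnitude threshold
    return {n: (p.detach().abs() > threshold).float()
            for n, p in model.named_parameters() if p.dim() > 1}

def apply_masks(model: torch.nn.Module, masks: dict):
    # Zero out pruned weights; reapply after each optimizer step to keep the ticket sparse.
    with torch.no_grad():
        for n, p in model.named_parameters():
            if n in masks:
                p.mul_(masks[n])
```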
- How Well Do Self-Supervised Models Transfer? [92.16372657233394]
We evaluate the transfer performance of 13 top self-supervised models on 40 downstream tasks.
We find ImageNet Top-1 accuracy to be highly correlated with transfer to many-shot recognition.
No single self-supervised method dominates overall, suggesting that universal pre-training is still unsolved.
arXiv Detail & Related papers (2020-11-26T16:38:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.