Big Self-Supervised Models are Strong Semi-Supervised Learners
- URL: http://arxiv.org/abs/2006.10029v2
- Date: Mon, 26 Oct 2020 03:09:28 GMT
- Title: Big Self-Supervised Models are Strong Semi-Supervised Learners
- Authors: Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey
Hinton
- Abstract summary: We show that unsupervised pretraining followed by supervised fine-tuning is surprisingly effective for semi-supervised learning on ImageNet.
A key ingredient of our approach is the use of big (deep and wide) networks during pretraining and fine-tuning.
We find that the fewer the labels, the more this approach (task-agnostic use of unlabeled data) benefits from a bigger network.
- Score: 116.00752519907725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One paradigm for learning from few labeled examples while making best use of
a large amount of unlabeled data is unsupervised pretraining followed by
supervised fine-tuning. Although this paradigm uses unlabeled data in a
task-agnostic way, in contrast to common approaches to semi-supervised learning
for computer vision, we show that it is surprisingly effective for
semi-supervised learning on ImageNet. A key ingredient of our approach is the
use of big (deep and wide) networks during pretraining and fine-tuning. We find
that, the fewer the labels, the more this approach (task-agnostic use of
unlabeled data) benefits from a bigger network. After fine-tuning, the big
network can be further improved and distilled into a much smaller one with
little loss in classification accuracy by using the unlabeled examples for a
second time, but in a task-specific way. The proposed semi-supervised learning
algorithm can be summarized in three steps: unsupervised pretraining of a big
ResNet model using SimCLRv2, supervised fine-tuning on a few labeled examples,
and distillation with unlabeled examples for refining and transferring the
task-specific knowledge. This procedure achieves 73.9% ImageNet top-1 accuracy
with just 1% of the labels ($\le$13 labeled images per class) using ResNet-50,
a $10\times$ improvement in label efficiency over the previous
state-of-the-art. With 10% of labels, ResNet-50 trained with our method
achieves 77.5% top-1 accuracy, outperforming standard supervised training with
all of the labels.
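The three-step recipe above (SimCLRv2 pretraining of a big ResNet, supervised fine-tuning on the few labeled examples, distillation on unlabeled data) maps onto a short training loop. Below is a minimal PyTorch-style sketch of steps 2 and 3 only, assuming a pretrained big encoder is available; the stock ResNet-50/ResNet-18 models, temperature, optimizers, and dummy batches are illustrative placeholders, not the authors' released code.

```python
# Hypothetical sketch of steps 2 and 3 of the paper's recipe: supervised
# fine-tuning of a big pretrained network, then distillation into a smaller
# student using *unlabeled* images only. Models, loaders, and hyperparameters
# are stand-ins for illustration.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18, resnet50

NUM_CLASSES = 1000
T = 1.0  # distillation temperature (assumed; the paper tunes this)

# Step 1 (not shown): the teacher encoder would come from SimCLRv2
# self-supervised pretraining. A plain ResNet-50 stands in for it here.
teacher = resnet50(num_classes=NUM_CLASSES)
student = resnet18(num_classes=NUM_CLASSES)

def finetune_step(model, optimizer, images, labels):
    """Step 2: supervised fine-tuning on the few labeled examples."""
    optimizer.zero_grad()
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

def distill_step(student, teacher, optimizer, unlabeled_images):
    """Step 3: the student matches the fine-tuned teacher's temperature-scaled
    soft predictions on unlabeled images."""
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(unlabeled_images) / T, dim=1)
    student_log_probs = F.log_softmax(student(unlabeled_images) / T, dim=1)
    optimizer.zero_grad()
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * T ** 2
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Dummy batches stand in for the 1%/10% labeled split and the unlabeled pool.
    labeled_x = torch.randn(8, 3, 224, 224)
    labeled_y = torch.randint(0, NUM_CLASSES, (8,))
    unlabeled_x = torch.randn(8, 3, 224, 224)

    opt_t = torch.optim.SGD(teacher.parameters(), lr=0.01, momentum=0.9)
    print("fine-tune loss:", finetune_step(teacher, opt_t, labeled_x, labeled_y))

    teacher.eval()  # freeze teacher statistics before distillation
    opt_s = torch.optim.SGD(student.parameters(), lr=0.01, momentum=0.9)
    print("distill loss:", distill_step(student, teacher, opt_s, unlabeled_x))
```

The KL form of the soft-target loss used here differs from the cross-entropy against teacher probabilities described in the abstract only by a constant (the teacher's entropy), so the gradients with respect to the student are the same.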
Related papers
- One-bit Supervision for Image Classification: Problem, Solution, and Beyond [114.95815360508395]
  This paper presents one-bit supervision, a novel setting for learning with fewer labels, for image classification.
  We propose a multi-stage training paradigm and incorporate negative label suppression into an off-the-shelf semi-supervised learning algorithm.
  On multiple benchmarks, the learning efficiency of the proposed approach surpasses that of full-bit, semi-supervised supervision.
  arXiv Detail & Related papers (2023-11-26T07:39:00Z)
- Weighted Distillation with Unlabeled Examples [15.825078347452024]
  Distillation with unlabeled examples is a popular and powerful method for training deep neural networks when labeled data is limited.
  This paper proposes a principled approach to this problem based on a "debiasing" reweighting of the student's loss function tailored to the distillation training paradigm.
  arXiv Detail & Related papers (2022-10-13T04:08:56Z)
- Masked Unsupervised Self-training for Zero-shot Image Classification [98.23094305347709]
  Masked Unsupervised Self-Training (MUST) is a new approach that leverages two different and complementary sources of supervision: pseudo-labels and raw images.
  MUST improves upon CLIP by a large margin and narrows the performance gap between unsupervised and supervised classification.
  arXiv Detail & Related papers (2022-06-07T02:03:06Z)
- Weakly-Supervised Semantic Segmentation by Learning Label Uncertainty [8.074019565026544]
  We present a new loss function to train a segmentation network with only a small subset of pixel-perfect labels.
  Our loss trains the network to learn label uncertainty within the bounding box, which can be leveraged to perform online bootstrapping.
  We trained each task on a dataset comprising only 18% pixel-perfect and 82% bounding-box labels.
  arXiv Detail & Related papers (2021-10-12T12:19:22Z)
- Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets [90.61266099147053]
  We investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images.
  We propose modifications and best practices aimed at minimizing human labeling effort.
  Simulated experiments on a 125k-image subset of ImageNet100 show that it can be annotated to 80% top-1 accuracy with 0.35 annotations per image on average.
  arXiv Detail & Related papers (2021-04-26T16:29:32Z)
- Don't Wait, Just Weight: Improving Unsupervised Representations by Learning Goal-Driven Instance Weights [92.16372657233394]
  Self-supervised learning techniques can boost performance by learning useful representations from unlabeled data.
  We show that by learning Bayesian instance weights for the unlabeled data, we can improve the downstream classification accuracy.
  Our method, BetaDataWeighter, is evaluated using the popular self-supervised rotation prediction task on STL-10 and Visual Decathlon.
  arXiv Detail & Related papers (2020-06-22T15:59:32Z)
- Improving Semantic Segmentation via Self-Training [75.07114899941095]
  We show that we can obtain state-of-the-art results using a semi-supervised approach, specifically a self-training paradigm: we first train a teacher model on labeled data and then generate pseudo labels on a large set of unlabeled data (see the sketch after this list).
  Our robust training framework can digest human-annotated and pseudo labels jointly and achieves top performance on the Cityscapes, CamVid, and KITTI datasets.
  arXiv Detail & Related papers (2020-04-30T17:09:17Z)
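Several of the related papers above, the self-training and distillation-with-unlabeled-examples entries in particular, share one teacher-student pattern: a teacher trained on the labeled set pseudo-labels the unlabeled pool, and a student is trained on both. The sketch below illustrates that generic pattern only; the confidence threshold, the pseudo-label loss weight, and the toy linear models are assumptions made for illustration, not any single paper's implementation.

```python
# Generic teacher-student self-training sketch: generate confident pseudo-labels
# on unlabeled data, then train the student on labeled and pseudo-labeled
# batches jointly. All names and values here are illustrative placeholders.
import torch
import torch.nn.functional as F

CONFIDENCE_THRESHOLD = 0.9  # assumed; papers differ in how they filter pseudo-labels

@torch.no_grad()
def generate_pseudo_labels(teacher, unlabeled_images):
    """Keep only the predictions the teacher is confident about."""
    probs = F.softmax(teacher(unlabeled_images), dim=1)
    confidence, labels = probs.max(dim=1)
    mask = confidence >= CONFIDENCE_THRESHOLD
    return unlabeled_images[mask], labels[mask]

def self_training_step(student, optimizer, labeled_batch, pseudo_batch, pseudo_weight=1.0):
    """Digest human-annotated and pseudo labels jointly; pseudo_weight
    down-weights the noisier pseudo-labeled term."""
    (x_l, y_l), (x_p, y_p) = labeled_batch, pseudo_batch
    optimizer.zero_grad()
    loss = F.cross_entropy(student(x_l), y_l)
    if x_p.numel() > 0:  # the confidence filter may have discarded everything
        loss = loss + pseudo_weight * F.cross_entropy(student(x_p), y_p)
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Toy linear classifiers over flattened 32x32 RGB images stand in for real networks.
    teacher = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
    student = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
    labeled = (torch.randn(16, 3, 32, 32), torch.randint(0, 10, (16,)))
    pseudo = generate_pseudo_labels(teacher, torch.randn(64, 3, 32, 32))
    opt = torch.optim.SGD(student.parameters(), lr=0.1)
    print("joint loss:", self_training_step(student, opt, labeled, pseudo))
```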
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.