Self-supervised Knowledge Distillation for Few-shot Learning
- URL: http://arxiv.org/abs/2006.09785v2
- Date: Tue, 4 Aug 2020 05:22:39 GMT
- Title: Self-supervised Knowledge Distillation for Few-shot Learning
- Authors: Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan,
Mubarak Shah
- Abstract summary: Few-shot learning is a promising learning paradigm due to its ability to learn out-of-order distributions quickly with only a few samples.
We propose a simple approach to improve the representation capacity of deep neural networks for few-shot learning tasks.
Our experiments show that, even in the first stage, self-supervision can outperform current state-of-the-art methods.
- Score: 123.10294801296926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The real world contains an overwhelmingly large number of object classes,
and learning all of them at once is infeasible. Few-shot learning is a promising
learning paradigm due to its ability to learn out-of-order distributions
quickly with only a few samples. Recent works [7, 41] show that simply learning
a good feature embedding can outperform more sophisticated meta-learning and
metric learning algorithms for few-shot learning. In this paper, we propose a
simple approach to improve the representation capacity of deep neural networks
for few-shot learning tasks. We follow a two-stage learning process: First, we
train a neural network to maximize the entropy of the feature embedding, thus
creating an optimal output manifold using a self-supervised auxiliary loss. In
the second stage, we minimize the entropy of the feature embedding by bringing
self-supervised twins together, while constraining the manifold with
student-teacher distillation. Our experiments show that, even in the first
stage, self-supervision can outperform current state-of-the-art methods, with
further gains achieved by our second-stage distillation process. Our code is
available at: https://github.com/brjathu/SKD.
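The two-stage process described in the abstract can be summarized in a short training sketch. The PyTorch fragment below is a minimal illustration, not the official implementation from the linked repository: it assumes rotation prediction as the self-supervised auxiliary task, and the module names (Net, rot_head), the tiny backbone, and the loss weights alpha/beta and temperature tau are illustrative choices.

```python
# Minimal sketch of the two-stage process described in the abstract, in PyTorch.
# Rotation prediction is assumed as the self-supervised auxiliary task; names,
# loss weights, and the tiny backbone are illustrative, not from the SKD repo.
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_rotations(x):
    """Return 4 rotated copies of a batch (0/90/180/270 deg) and rotation labels."""
    rots = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4).repeat_interleave(x.size(0))
    return torch.cat(rots, dim=0), labels


class Net(nn.Module):
    def __init__(self, feat_dim=64, num_classes=64):
        super().__init__()
        self.backbone = nn.Sequential(               # stand-in for a ResNet backbone
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.classifier = nn.Linear(feat_dim, num_classes)
        self.rot_head = nn.Linear(feat_dim, 4)       # self-supervised auxiliary head

    def forward(self, x):
        z = self.backbone(x)
        return z, self.classifier(z), self.rot_head(z)


def train_generation_zero(model, loader, opt, alpha=1.0):
    """Stage 1: supervised cross-entropy plus a rotation self-supervision loss
    that spreads the embedding over the output manifold."""
    model.train()
    for x, y in loader:
        xr, r = make_rotations(x)
        _, logits, rot_logits = model(xr)
        loss = F.cross_entropy(logits, y.repeat(4)) + alpha * F.cross_entropy(rot_logits, r)
        opt.zero_grad()
        loss.backward()
        opt.step()


def train_generation_one(student, teacher, loader, opt, beta=0.1, tau=4.0):
    """Stage 2: distill from the frozen stage-1 teacher while pulling the
    embeddings of an image and its self-supervised twins (rotated copies) together."""
    teacher.eval()
    student.train()
    for x, _ in loader:
        xr, _ = make_rotations(x)
        with torch.no_grad():
            _, t_logits, _ = teacher(xr)
        z, s_logits, _ = student(xr)
        kd = F.kl_div(F.log_softmax(s_logits / tau, dim=1),
                      F.softmax(t_logits / tau, dim=1),
                      reduction="batchmean") * tau * tau
        # bring the 4 rotated twins of each image close in embedding space
        z = F.normalize(z, dim=1).view(4, x.size(0), -1)
        twins = ((z - z.mean(dim=0, keepdim=True)) ** 2).sum(dim=2).mean()
        loss = kd + beta * twins
        opt.zero_grad()
        loss.backward()
        opt.step()
```

In practice the stage-2 student would typically start from a copy of the stage-1 model (e.g. via copy.deepcopy) before distillation begins; the loss weights and temperature above are placeholders rather than values taken from the paper.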
Related papers
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used to mitigate the greedy data needs of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z) - PIVOT: Prompting for Video Continual Learning [50.80141083993668]
We introduce PIVOT, a novel method that leverages extensive knowledge in pre-trained models from the image domain.
Our experiments show that PIVOT improves state-of-the-art methods by a significant 27% on the 20-task ActivityNet setup.
arXiv Detail & Related papers (2022-12-09T13:22:27Z) - EfficientTrain: Exploring Generalized Curriculum Learning for Training
Visual Backbones [80.662250618795]
This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
arXiv Detail & Related papers (2022-11-17T17:38:55Z) - SSMTL++: Revisiting Self-Supervised Multi-Task Learning for Video
Anomaly Detection [108.57862846523858]
We revisit the self-supervised multi-task learning framework, proposing several updates to the original method.
We modernize the 3D convolutional backbone by introducing multi-head self-attention modules (see the illustrative sketch after this list).
In our attempt to further improve the model, we study additional self-supervised learning tasks, such as predicting segmentation maps.
arXiv Detail & Related papers (2022-07-16T19:25:41Z) - Learning Deep Representation with Energy-Based Self-Expressiveness for
Subspace Clustering [24.311754971064303]
We propose a new deep subspace clustering framework, motivated by the energy-based models.
Given the powerful representation ability of recently popular self-supervised learning methods, we leverage self-supervised representation learning to learn the dictionary.
arXiv Detail & Related papers (2021-10-28T11:51:08Z) - Distill on the Go: Online knowledge distillation in self-supervised
learning [1.1470070927586016]
Recent works have shown that wider and deeper models benefit more from self-supervised learning than smaller models.
We propose Distill-on-the-Go (DoGo), a self-supervised learning paradigm using single-stage online knowledge distillation.
Our results show significant performance gains in the presence of noisy and limited labels.
arXiv Detail & Related papers (2021-04-20T09:59:23Z) - Self-Supervised Training Enhances Online Continual Learning [37.91734641808391]
In continual learning, a system must incrementally learn from a non-stationary data stream without catastrophic forgetting.
Self-supervised pre-training could yield features that generalize better than those from supervised learning.
Our best system achieves a 14.95% relative increase in top-1 accuracy on class incremental ImageNet over the prior state of the art for online continual learning.
arXiv Detail & Related papers (2021-03-25T17:45:27Z) - Building One-Shot Semi-supervised (BOSS) Learning up to Fully Supervised
Performance [0.0]
We show the potential for building one-shot semi-supervised (BOSS) learning on CIFAR-10 and SVHN.
Our method combines class prototype refining, class balancing, and self-training.
Rigorous empirical evaluations provide evidence that labeling large datasets is not necessary for training deep neural networks.
arXiv Detail & Related papers (2020-06-16T17:56:00Z) - Rethinking Few-Shot Image Classification: a Good Embedding Is All You
Need? [72.00712736992618]
We show that a simple baseline: learning a supervised or self-supervised representation on the meta-training set, outperforms state-of-the-art few-shot learning methods.
An additional boost can be achieved through the use of self-distillation.
We believe that our findings motivate a rethinking of few-shot image classification benchmarks and the associated role of meta-learning algorithms.
arXiv Detail & Related papers (2020-03-25T17:58:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.