DSPNet: Towards Slimmable Pretrained Networks based on Discriminative
Self-supervised Learning
- URL: http://arxiv.org/abs/2207.06075v1
- Date: Wed, 13 Jul 2022 09:32:54 GMT
- Title: DSPNet: Towards Slimmable Pretrained Networks based on Discriminative
Self-supervised Learning
- Authors: Shaoru Wang, Zeming Li, Jin Gao, Liang Li, Weiming Hu
- Abstract summary: We propose Discriminative-SSL-based Slimmable Pretrained Networks (DSPNet).
DSPNet can be trained once and then slimmed to multiple sub-networks of various sizes.
DSPNet achieves comparable or improved performance on ImageNet relative to networks pretrained individually.
- Score: 43.45674911425684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning (SSL) has achieved promising downstream performance.
However, when facing various resource budgets in real-world applications,
pretraining multiple networks of various sizes one by one incurs a huge
computational burden. In this paper, we propose Discriminative-SSL-based
Slimmable Pretrained Networks (DSPNet), which can be trained once and then
slimmed to multiple sub-networks of various sizes, each of which faithfully
learns good representations and can serve as a good initialization for
downstream tasks with various resource budgets. Specifically, we extend the
idea of slimmable networks to a discriminative SSL paradigm by integrating SSL
and knowledge distillation gracefully. Under the linear evaluation and
semi-supervised evaluation protocols, DSPNet achieves comparable or improved
performance on ImageNet relative to networks pretrained individually, while
greatly reducing the training cost. The pretrained models also generalize well
on downstream detection and segmentation tasks. Code will be made public.
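To make the approach concrete, the sketch below illustrates in PyTorch how a width-slimmable encoder might be pretrained once with a discriminative SSL objective at full width plus self-distillation from the full network to its slimmed sub-networks. This is a minimal, hypothetical illustration: the SlimmableEncoder class, the width ratios, and the loss terms are assumptions made for exposition, not the authors' released implementation.

```python
# Hypothetical sketch of slimmable SSL pretraining with self-distillation.
# The architecture, widths, and losses are illustrative assumptions only.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class SlimmableEncoder(nn.Module):
    """Toy encoder whose hidden width can be slimmed at run time; all widths share weights."""

    def __init__(self, in_dim=2048, hidden=1024, out_dim=256):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, out_dim)

    def forward(self, x, width=1.0):
        h = self.fc1(x)
        k = max(1, int(h.shape[1] * width))  # keep only the first k hidden channels
        h = F.relu(h[:, :k])
        # reuse the matching slice of fc2's weight so every width shares parameters
        return F.linear(h, self.fc2.weight[:, :k], self.fc2.bias)


def dspnet_style_step(student, teacher, x1, x2, widths=(1.0, 0.75, 0.5), tau=0.2):
    """One step: SSL loss at full width + distillation from full width to slimmed widths."""
    with torch.no_grad():  # momentum teacher provides the target embedding
        t = F.normalize(teacher(x2, width=1.0), dim=1)
    full = F.normalize(student(x1, width=1.0), dim=1)
    loss = -(full * t).sum(dim=1).div(tau).mean()  # contrastive-style alignment of the two views
    for w in widths[1:]:  # each sub-network mimics the full network's output
        sub = F.normalize(student(x1, width=w), dim=1)
        loss = loss + F.mse_loss(sub, full.detach())
    return loss


student = SlimmableEncoder()
teacher = copy.deepcopy(student).requires_grad_(False)  # would be updated by EMA in practice
x1, x2 = torch.randn(8, 2048), torch.randn(8, 2048)     # stand-ins for two augmented views
dspnet_style_step(student, teacher, x1, x2).backward()
```

After one such pretraining run, each width can be sliced out and fine-tuned on its own under the resource budget it fits.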
Related papers
- One is More: Diverse Perspectives within a Single Network for Efficient
DRL [43.249133438809125]
We introduce OMNet, a novel learning paradigm that utilizes multiple subnetworks within a single network to efficiently offer diverse outputs.
OMNet can be easily applied to various deep reinforcement learning algorithms with minimal additional overhead.
arXiv Detail & Related papers (2023-10-21T13:37:13Z)
- Effective Self-supervised Pre-training on Low-compute Networks without
Distillation [6.530011859253459]
Reported performance of self-supervised learning has trailed behind standard supervised pre-training by a large margin.
Most prior works attribute this poor performance to the capacity bottleneck of the low-compute networks.
We take a closer look at the detrimental factors causing these practical limitations, and ask whether they are intrinsic to the self-supervised low-compute setting.
arXiv Detail & Related papers (2022-10-06T10:38:07Z)
- Match to Win: Analysing Sequences Lengths for Efficient Self-supervised
Learning in Speech and Audio [19.865050806327147]
Self-supervised learning has proven vital in speech and audio-related applications.
This paper provides the first empirical study of SSL pre-training for different specified sequence lengths.
We find that training on short sequences can dramatically reduce resource costs while retaining a satisfactory performance for all tasks.
arXiv Detail & Related papers (2022-09-30T16:35:42Z)
- Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning has made significant progress in pre-training large models, but it struggles with small models.
We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
arXiv Detail & Related papers (2022-09-30T15:15:05Z)
- On the Soft-Subnetwork for Few-shot Class Incremental Learning [67.0373924836107]
We propose a few-shot class incremental learning (FSCIL) method referred to as Soft-SubNetworks (SoftNet).
Our objective is to learn a sequence of sessions incrementally, where each session includes only a few training instances per class, while preserving the knowledge learned in previous sessions.
We provide comprehensive empirical validations demonstrating that our SoftNet effectively tackles the few-shot incremental learning problem by surpassing the performance of state-of-the-art baselines over benchmark datasets.
arXiv Detail & Related papers (2022-09-15T04:54:02Z)
- Task-Customized Self-Supervised Pre-training with Scalable Dynamic
Routing [76.78772372631623]
A common practice for self-supervised pre-training is to use as much data as possible.
For a specific downstream task, however, involving irrelevant data in pre-training may degrade the downstream performance.
It is burdensome and infeasible to use different downstream-task-customized datasets in pre-training for different tasks.
arXiv Detail & Related papers (2022-05-26T10:49:43Z)
- DATA: Domain-Aware and Task-Aware Pre-training [94.62676913928831]
We present DATA, a simple yet effective NAS approach specialized for self-supervised learning (SSL).
Our method achieves promising results across a wide range of computation costs on downstream tasks, including image classification, object detection and semantic segmentation.
arXiv Detail & Related papers (2022-03-17T02:38:49Z)
- Self-Ensembling GAN for Cross-Domain Semantic Segmentation [107.27377745720243]
This paper proposes a self-ensembling generative adversarial network (SE-GAN) exploiting cross-domain data for semantic segmentation.
In SE-GAN, a teacher network and a student network constitute a self-ensembling model for generating semantic segmentation maps, which, together with a discriminator, forms a GAN.
Despite its simplicity, we find SE-GAN can significantly boost the performance of adversarial training and enhance the stability of the model.
arXiv Detail & Related papers (2021-12-15T09:50:25Z)
- Self-Supervised Learning for Binary Networks by Joint Classifier
Training [11.612308609123566]
We propose a self-supervised learning method for binary networks.
For better training of the binary network, we propose a feature similarity loss, a dynamic balancing scheme of loss terms, and modified multi-stage training.
Our empirical validations show that BSSL outperforms self-supervised learning baselines for binary networks in various downstream tasks and outperforms supervised pretraining in certain tasks.
arXiv Detail & Related papers (2021-10-17T15:38:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.