DisCo: Remedy Self-supervised Learning on Lightweight Models with
Distilled Contrastive Learning
- URL: http://arxiv.org/abs/2104.09124v1
- Date: Mon, 19 Apr 2021 08:22:52 GMT
- Title: DisCo: Remedy Self-supervised Learning on Lightweight Models with
Distilled Contrastive Learning
- Authors: Yuting Gao, Jia-Xin Zhuang, Ke Li, Hao Cheng, Xiaowei Guo, Feiyue
Huang, Rongrong Ji, Xing Sun
- Abstract summary: Self-supervised representation learning (SSL) has received widespread attention from the community.
Recent research argues that its performance suffers a cliff fall when the model size decreases.
We propose a simple yet effective Distilled Contrastive Learning (DisCo) to ease the issue by a large margin.
- Score: 94.89221799550593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While self-supervised representation learning (SSL) has received widespread
attention from the community, recent research argues that its performance suffers a
cliff fall when the model size decreases. Current methods mainly rely on contrastive
learning to train the network, and in this work we propose a simple yet effective
Distilled Contrastive Learning (DisCo) to ease the issue by a large margin.
Specifically, we find that the final embedding obtained by mainstream SSL methods
contains the most fruitful information, and propose to distill this final embedding to
maximally transmit the teacher's knowledge to a lightweight model by constraining the
last embedding of the student to be consistent with that of the teacher. In addition,
we observe in experiments a phenomenon we term the Distilling BottleNeck, and propose
to enlarge the embedding dimension to alleviate it. Our method does not introduce any
extra parameters to lightweight models during deployment. Experimental results
demonstrate that our method achieves state-of-the-art results on all lightweight
models. In particular, when ResNet-101/ResNet-50 is used as the teacher for
EfficientNet-B0, the linear evaluation result of EfficientNet-B0 on ImageNet is very
close to that of ResNet-101/ResNet-50, even though EfficientNet-B0 has only
9.4%/16.3% of their parameters.
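
To make the mechanism in the abstract concrete, the sketch below illustrates training a lightweight student with a contrastive objective while constraining its final embedding to be consistent with a frozen teacher's, as DisCo describes. The MoCo-style InfoNCE term, the cosine-based consistency term, and the weighting factor alpha are illustrative assumptions, not the paper's exact loss; at deployment only the student backbone would be kept, so no extra parameters are added.

```python
import torch
import torch.nn.functional as F


def disco_style_loss(student_q, student_k, queue, teacher_emb,
                     temperature=0.2, alpha=1.0):
    """Contrastive + embedding-consistency loss for a lightweight student.

    student_q, student_k: L2-normalized student embeddings of two augmented
    views, shape (N, D). queue: L2-normalized negative embeddings, shape (K, D).
    teacher_emb: frozen teacher embedding of the same images, shape (N, D).
    Loss form and weighting are assumptions for illustration.
    """
    # MoCo-style InfoNCE term on the student (positive = other view,
    # negatives = queue entries).
    l_pos = (student_q * student_k).sum(dim=1, keepdim=True)   # (N, 1)
    l_neg = student_q @ queue.t()                               # (N, K)
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    contrastive = F.cross_entropy(logits, labels)

    # Consistency term: pull the student's final embedding toward the
    # teacher's (negative cosine similarity; the exact form is an assumption).
    consistency = 1.0 - F.cosine_similarity(
        student_q, teacher_emb.detach(), dim=1).mean()

    return contrastive + alpha * consistency
```

The Distilling BottleNeck remedy mentioned in the abstract would correspond to using a wider student embedding dimension during training; given the no-extra-parameters claim, that widening is presumably training-only, and it is not shown in this sketch.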
Related papers
- Retro: Reusing teacher projection head for efficient embedding distillation on Lightweight Models via Self-supervised Learning [0.0]
We propose Retro, which reuses the teacher's projection head for students.
Our experimental results demonstrate significant improvements over the state-of-the-art on all lightweight models.
arXiv Detail & Related papers (2024-05-24T07:53:09Z)
- A Light-weight Deep Learning Model for Remote Sensing Image Classification [70.66164876551674]
We present a high-performance and light-weight deep learning model for Remote Sensing Image Classification (RSIC).
Extensive experiments on the NWPU-RESISC45 benchmark show that our proposed teacher-student models outperform the state-of-the-art systems.
arXiv Detail & Related papers (2023-02-25T09:02:01Z)
- Establishing a stronger baseline for lightweight contrastive models [10.63129923292905]
Recent research has reported a performance degradation in self-supervised contrastive learning for specially designed efficient networks.
A common practice is to introduce a pretrained contrastive teacher model and train the lightweight networks with distillation signals generated by the teacher.
In this work, we aim to establish a stronger baseline for lightweight contrastive models without using a pretrained teacher model.
arXiv Detail & Related papers (2022-12-14T11:20:24Z)
- Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning has made significant progress in pre-training large models, but struggles with small models.
We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
arXiv Detail & Related papers (2022-09-30T15:15:05Z)
- Controlled Sparsity via Constrained Optimization or: How I Learned to Stop Tuning Penalties and Love Constraints [81.46143788046892]
We focus on the task of controlling the level of sparsity when performing sparse learning.
Existing methods based on sparsity-inducing penalties involve expensive trial-and-error tuning of the penalty factor.
We propose a constrained formulation where sparsification is guided by the training objective and the desired sparsity target in an end-to-end fashion.
arXiv Detail & Related papers (2022-08-08T21:24:20Z)
- LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification [36.651329027209634]
LilNetX is an end-to-end trainable technique for neural networks.
It enables learning models with a specified accuracy-rate-computation trade-off.
arXiv Detail & Related papers (2022-04-06T17:59:10Z)
- Network Augmentation for Tiny Deep Learning [73.57192520534585]
We introduce Network Augmentation (NetAug), a new training method for improving the performance of tiny neural networks.
We demonstrate the effectiveness of NetAug on image classification and object detection.
arXiv Detail & Related papers (2021-10-17T18:48:41Z)
- Learnable Expansion-and-Compression Network for Few-shot Class-Incremental Learning [87.94561000910707]
We propose a learnable expansion-and-compression network (LEC-Net) to solve catastrophic forgetting and model over-fitting problems.
LEC-Net enlarges the representation capacity of features, alleviating feature drift of the old network from the perspective of model regularization.
Experiments on the CUB/CIFAR-100 datasets show that LEC-Net improves the baseline by 57% while outperforming the state-of-the-art by 56%.
arXiv Detail & Related papers (2021-04-06T04:34:21Z)
- Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones [40.33419553042038]
We propose to improve existing baseline networks via knowledge distillation from off-the-shelf, powerful pre-trained models.
Our solution performs distillation by only driving the prediction of the student model to be consistent with that of the teacher model.
We empirically find that such a simple distillation setting is extremely effective; for example, the top-1 accuracy on the ImageNet-1k validation set of MobileNetV3-large and ResNet50-D can be significantly improved (a minimal sketch of this prediction-matching idea follows this entry).
arXiv Detail & Related papers (2021-03-10T09:32:44Z)
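
As a companion to the "Beyond Self-Supervision" entry above, the following minimal sketch shows the prediction-matching distillation it describes: the student is trained solely to make its class predictions consistent with a frozen teacher's. The KL-divergence form and the temperature are assumptions for illustration, not that paper's exact recipe.

```python
import torch
import torch.nn.functional as F


def prediction_matching_loss(student_logits, teacher_logits, temperature=1.0):
    """KL divergence between softened teacher and student class distributions.

    student_logits, teacher_logits: raw class scores of shape (N, C).
    Temperature softening and the t**2 scaling are common conventions and
    assumptions here, not confirmed details of the summarized paper.
    """
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=1)
    teacher_probs = F.softmax(teacher_logits.detach() / t, dim=1)
    # batchmean averages the per-sample KL over the batch.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t ** 2)
```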