CompRess: Self-Supervised Learning by Compressing Representations
- URL: http://arxiv.org/abs/2010.14713v1
- Date: Wed, 28 Oct 2020 02:49:18 GMT
- Title: CompRess: Self-Supervised Learning by Compressing Representations
- Authors: Soroush Abbasi Koohpayegani, Ajinkya Tejankar, and Hamed Pirsiavash
- Abstract summary: We develop a model compression method to compress an already learned, deep self-supervised model (teacher) to a smaller one (student).
We train the student model so that it mimics the relative similarity between the data points in the teacher's embedding space.
This is the first time a self-supervised AlexNet has outperformed a supervised one on ImageNet classification.
- Score: 14.739041141948032
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning aims to learn good representations with unlabeled
data. Recent works have shown that larger models benefit more from
self-supervised learning than smaller models. As a result, the gap between
supervised and self-supervised learning has been greatly reduced for larger
models. In this work, instead of designing a new pseudo task for
self-supervised learning, we develop a model compression method to compress an
already learned, deep self-supervised model (teacher) to a smaller one
(student). We train the student model so that it mimics the relative similarity
between the data points in the teacher's embedding space. For AlexNet, our
method outperforms all previous methods including the fully supervised model on
ImageNet linear evaluation (59.0% compared to 56.5%) and on nearest neighbor
evaluation (50.7% compared to 41.4%). To the best of our knowledge, this is the
first time a self-supervised AlexNet has outperformed a supervised one on
ImageNet classification. Our code is available here:
https://github.com/UMBCvision/CompRess
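To make the objective concrete, here is a minimal sketch of a similarity-distillation loss in the spirit of the method described above, written as PyTorch-style Python. It assumes a frozen teacher encoder, a trainable student encoder, and a bank of anchor embeddings per network; the names (compress_loss, anchors_s, anchors_t) and the temperature values are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn.functional as F

def compress_loss(student_emb, teacher_emb, anchors_s, anchors_t,
                  tau_s=0.1, tau_t=0.04):
    """Similarity-distillation sketch: the student is trained so that its
    similarity distribution over a set of anchor points matches the
    teacher's distribution over the same anchors. Names, temperatures,
    and the anchor-bank setup are illustrative assumptions."""
    # L2-normalize so dot products are cosine similarities.
    student_emb = F.normalize(student_emb, dim=1)
    teacher_emb = F.normalize(teacher_emb, dim=1)
    anchors_s = F.normalize(anchors_s, dim=1)
    anchors_t = F.normalize(anchors_t, dim=1)

    # Similarity of each query to every anchor, in each embedding space.
    sim_s = student_emb @ anchors_s.t() / tau_s   # (batch, num_anchors)
    sim_t = teacher_emb @ anchors_t.t() / tau_t

    # The teacher's similarities define the target distribution; the
    # student's distribution is pulled toward it with a KL divergence.
    p_t = F.softmax(sim_t, dim=1).detach()
    log_p_s = F.log_softmax(sim_s, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean")
```

In use, teacher_emb would come from the frozen self-supervised teacher and student_emb from the student being trained, while the anchor banks could be memory banks of embeddings of other images maintained during training, so that each query's similarity distribution is computed over the same set of points in both embedding spaces.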
Related papers
- Establishing a stronger baseline for lightweight contrastive models [10.63129923292905]
Recent research has reported a performance degradation in self-supervised contrastive learning for specially designed efficient networks.
A common practice is to introduce a pretrained contrastive teacher model and train the lightweight networks with distillation signals generated by the teacher.
In this work, we aim to establish a stronger baseline for lightweight contrastive models without using a pretrained teacher model.
arXiv Detail & Related papers (2022-12-14T11:20:24Z) - Distilling Knowledge from Self-Supervised Teacher by Embedding Graph Alignment [52.704331909850026]
We formulate a new knowledge distillation framework to transfer the knowledge from self-supervised pre-trained models to any other student network.
Inspired by instance discrimination in self-supervised learning, we model instance-instance relations as a graph in the feature embedding space.
Our distillation scheme can be flexibly applied to transfer the self-supervised knowledge to enhance representation learning on various student networks; a rough, generic sketch of such graph alignment appears after this related-papers list.
arXiv Detail & Related papers (2022-11-23T19:27:48Z) - Part-Based Models Improve Adversarial Robustness [57.699029966800644]
We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks.
Our model combines a part segmentation model with a tiny classifier and is trained end-to-end to simultaneously segment objects into parts and classify them.
Our experiments indicate that these models also reduce texture bias and yield better robustness against common corruptions and spurious correlations.
arXiv Detail & Related papers (2022-09-15T15:41:47Z) - Network Augmentation for Tiny Deep Learning [73.57192520534585]
We introduce Network Augmentation (NetAug), a new training method for improving the performance of tiny neural networks.
We demonstrate the effectiveness of NetAug on image classification and object detection.
arXiv Detail & Related papers (2021-10-17T18:48:41Z) - Unsupervised Representation Learning for 3D Point Cloud Data [66.92077180228634]
We propose a simple yet effective approach for unsupervised point cloud learning.
In particular, we identify a transformation that generates an effective contrastive view of an original point cloud.
We conduct experiments on three downstream tasks which are 3D object classification, shape part segmentation and scene segmentation.
arXiv Detail & Related papers (2021-10-13T10:52:45Z) - Bag of Instances Aggregation Boosts Self-supervised Learning [122.61914701794296]
We propose a simple but effective distillation strategy for unsupervised learning.
Our method, termed BINGO, transfers the relationship learned by the teacher to the student.
BINGO achieves new state-of-the-art performance on small-scale models.
arXiv Detail & Related papers (2021-07-04T17:33:59Z) - Distill on the Go: Online knowledge distillation in self-supervised learning [1.1470070927586016]
Recent works have shown that wider and deeper models benefit more from self-supervised learning than smaller models.
We propose Distill-on-the-Go (DoGo), a self-supervised learning paradigm using single-stage online knowledge distillation.
Our results show significant performance gain in the presence of noisy and limited labels.
arXiv Detail & Related papers (2021-04-20T09:59:23Z) - DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning [94.89221799550593]
Self-supervised representation learning (SSL) has received widespread attention from the community.
Recent research argues that its performance drops sharply when the model size decreases.
We propose a simple yet effective Distilled Contrastive Learning (DisCo) to ease the issue by a large margin.
arXiv Detail & Related papers (2021-04-19T08:22:52Z) - SEED: Self-supervised Distillation For Visual Representation [34.63488756535054]
We propose a new learning paradigm, named SElf-SupErvised Distillation (SEED), which leverages a larger network (as Teacher) to transfer its representational knowledge into a smaller architecture (as Student) in a self-supervised fashion.
We show that SEED dramatically boosts the performance of small networks on downstream tasks.
arXiv Detail & Related papers (2021-01-12T20:04:50Z)
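For the "Distilling Knowledge from Self-Supervised Teacher by Embedding Graph Alignment" entry above, the sketch below illustrates one generic way such an alignment could be set up: build a within-batch cosine-similarity graph in the teacher's and the student's embedding spaces and penalize their disagreement. This is an assumption-laden illustration in the same Python style as the sketch above, not the paper's actual formulation; all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def graph_alignment_loss(student_emb, teacher_emb):
    """Hypothetical illustration: take the "graph" to be the matrix of
    pairwise cosine similarities within a batch, and align the student's
    graph to the teacher's with a mean-squared error."""
    s = F.normalize(student_emb, dim=1)
    t = F.normalize(teacher_emb, dim=1)

    # Pairwise cosine-similarity graphs, shape (batch, batch).
    graph_s = s @ s.t()
    graph_t = (t @ t.t()).detach()  # teacher is frozen, so no gradient

    return F.mse_loss(graph_s, graph_t)
```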
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.