Establishing a stronger baseline for lightweight contrastive models
- URL: http://arxiv.org/abs/2212.07158v2
- Date: Mon, 17 Jul 2023 15:45:19 GMT
- Title: Establishing a stronger baseline for lightweight contrastive models
- Authors: Wenye Lin, Yifeng Ding, Zhixiong Cao, Hai-tao Zheng
- Abstract summary: Recent research has reported a performance degradation in self-supervised contrastive learning for specially designed efficient networks.
A common practice is to introduce a pretrained contrastive teacher model and train the lightweight networks with distillation signals generated by the teacher.
In this work, we aim to establish a stronger baseline for lightweight contrastive models without using a pretrained teacher model.
- Score: 10.63129923292905
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Recent research has reported a performance degradation in self-supervised
contrastive learning for specially designed efficient networks, such as
MobileNet and EfficientNet. A common practice to address this problem is to
introduce a pretrained contrastive teacher model and train the lightweight
networks with distillation signals generated by the teacher. However, it is
time and resource consuming to pretrain a teacher model when it is not
available. In this work, we aim to establish a stronger baseline for
lightweight contrastive models without using a pretrained teacher model.
Specifically, we show that the optimal recipe for efficient models is different
from that of larger models, and using the same training settings as ResNet50,
as previous research does, is inappropriate. Additionally, we observe a common
issue in contrastive learning where either the positive or negative views can
be noisy, and propose a smoothed version of InfoNCE loss to alleviate this
problem. As a result, we successfully improve the linear evaluation results
from 36.3% to 62.3% for MobileNet-V3-Large and from 42.2% to 65.8% for
EfficientNet-B0 on ImageNet, closing the accuracy gap to ResNet50 with
5× fewer parameters. We hope our research will facilitate the usage of
lightweight contrastive models.
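The abstract only states that a "smoothed" version of the InfoNCE loss is used to tolerate noisy positive or negative views, without giving the formulation here. The sketch below is one plausible reading of that description under an explicit assumption: standard in-batch InfoNCE over normalized similarities, with the one-hot target replaced by a label-smoothed distribution. The function name `smoothed_infonce` and the `epsilon` hyperparameter are illustrative choices, not taken from the paper.

```python
# Minimal PyTorch sketch of a label-smoothed InfoNCE loss. This is an assumption
# about what "smoothed InfoNCE" could look like, not the paper's exact objective.
import torch
import torch.nn.functional as F

def smoothed_infonce(z1, z2, temperature=0.2, epsilon=0.1):
    """z1, z2: (N, D) embeddings of two augmented views of the same N images."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature            # (N, N) cosine similarities
    n = logits.size(0)
    # Smoothed targets: (1 - epsilon) on the matching view, the remaining
    # epsilon spread uniformly over the other (negative) views.
    targets = torch.full_like(logits, epsilon / (n - 1))
    targets.fill_diagonal_(1.0 - epsilon)
    log_probs = F.log_softmax(logits, dim=1)
    return -(targets * log_probs).sum(dim=1).mean()
```

Setting `epsilon=0` recovers the usual InfoNCE objective, so the smoothing strength acts as a tunable knob rather than a change to the training pipeline.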
Related papers
- Retro: Reusing teacher projection head for efficient embedding distillation on Lightweight Models via Self-supervised Learning [0.0]
We propose Retro, which reuses the teacher's projection head for students.
Our experimental results demonstrate significant improvements over the state-of-the-art on all lightweight models.
arXiv Detail & Related papers (2024-05-24T07:53:09Z)
- A Light-weight Deep Learning Model for Remote Sensing Image Classification [70.66164876551674]
We present a high-performance and light-weight deep learning model for Remote Sensing Image Classification (RSIC).
By conducting extensive experiments on the NWPU-RESISC45 benchmark, we show that our proposed teacher-student models outperform the state-of-the-art systems.
arXiv Detail & Related papers (2023-02-25T09:02:01Z)
- LegoNet: A Fast and Exact Unlearning Architecture [59.49058450583149]
Machine unlearning aims to erase the impact of specific training samples from a trained model upon deletion requests.
We present a novel network, namely LegoNet, which adopts the framework of "fixed encoder + multiple adapters".
We show that LegoNet accomplishes fast and exact unlearning while maintaining acceptable performance, outperforming unlearning baselines overall.
arXiv Detail & Related papers (2022-10-28T09:53:05Z)
- Part-Based Models Improve Adversarial Robustness [57.699029966800644]
We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks.
Our model combines a part segmentation model with a tiny classifier and is trained end-to-end to segment objects into parts and classify them simultaneously.
Our experiments indicate that these models also reduce texture bias and yield better robustness against common corruptions and spurious correlations.
arXiv Detail & Related papers (2022-09-15T15:41:47Z)
- Network Augmentation for Tiny Deep Learning [73.57192520534585]
We introduce Network Augmentation (NetAug), a new training method for improving the performance of tiny neural networks.
We demonstrate the effectiveness of NetAug on image classification and object detection.
arXiv Detail & Related papers (2021-10-17T18:48:41Z)
- DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning [94.89221799550593]
Self-supervised representation learning (SSL) has received widespread attention from the community.
Recent research argues that its performance suffers a sharp drop when the model size decreases.
We propose a simple yet effective Distilled Contrastive Learning (DisCo) to ease the issue by a large margin.
arXiv Detail & Related papers (2021-04-19T08:22:52Z)
- Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones [40.33419553042038]
We propose to improve existing baseline networks via knowledge distillation from off-the-shelf pre-trained big powerful models.
Our solution performs distillation by only driving the predictions of the student model to be consistent with those of the teacher model (a generic sketch of this output-matching objective appears after this list).
We empirically find that such simple distillation settings are extremely effective; for example, the top-1 accuracy on the ImageNet-1k validation set of MobileNetV3-large and ResNet50-D can be significantly improved.
arXiv Detail & Related papers (2021-03-10T09:32:44Z)
- SEED: Self-supervised Distillation For Visual Representation [34.63488756535054]
We propose a new learning paradigm, named SElf-SupErvised Distillation (SEED), which leverages a larger network (as the Teacher) and transfers its representational knowledge into a smaller architecture (as the Student) in a self-supervised fashion.
We show that SEED dramatically boosts the performance of small networks on downstream tasks.
arXiv Detail & Related papers (2021-01-12T20:04:50Z)
- Learning to Reweight with Deep Interactions [104.68509759134878]
We propose an improved data reweighting algorithm, in which the student model provides its internal states to the teacher model.
Experiments on image classification with clean/noisy labels and on neural machine translation empirically demonstrate that our algorithm yields significant improvements over previous methods.
arXiv Detail & Related papers (2020-07-09T09:06:31Z)
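Several of the related papers above (DisCo, SEED, and the "Beyond Self-Supervision" distillation baseline) share the teacher-student setup that the main paper tries to avoid: a frozen, stronger teacher guides the lightweight student by having the student match the teacher's outputs. The sketch below is a generic illustration of that output-matching idea; the temperature `T` and the KL-based form are the standard knowledge-distillation recipe, used here only as an illustration and not as any single paper's exact loss.

```python
# Generic teacher-student output matching (standard knowledge distillation),
# shown only to illustrate the idea common to the distillation-based papers above.
import torch
import torch.nn.functional as F

def prediction_matching_loss(student_logits, teacher_logits, T=4.0):
    """Push the student's softened output distribution toward the frozen teacher's."""
    student_logp = F.log_softmax(student_logits / T, dim=1)
    teacher_prob = F.softmax(teacher_logits / T, dim=1)
    # KL(teacher || student); the T*T factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_logp, teacher_prob, reduction="batchmean") * (T * T)
```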
This list is automatically generated from the titles and abstracts of the papers in this site.