Effective Self-supervised Pre-training on Low-compute Networks without Distillation
- URL: http://arxiv.org/abs/2210.02808v2
- Date: Mon, 2 Oct 2023 20:29:21 GMT
- Title: Effective Self-supervised Pre-training on Low-compute Networks without Distillation
- Authors: Fuwen Tan, Fatemeh Saleh, Brais Martinez
- Abstract summary: Reported performance of self-supervised learning has trailed behind standard supervised pre-training by a large margin.
Most prior works attribute this poor performance to the capacity bottleneck of the low-compute networks.
We take a closer look at the detrimental factors causing the practical limitations, and whether they are intrinsic to the self-supervised low-compute setting.
- Score: 6.530011859253459
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite the impressive progress of self-supervised learning (SSL), its
applicability to low-compute networks has received limited attention. Reported
performance has trailed behind standard supervised pre-training by a large
margin, barring self-supervised learning from making an impact on models that
are deployed on device. Most prior works attribute this poor performance to the
capacity bottleneck of the low-compute networks and opt to bypass the problem
through the use of knowledge distillation (KD). In this work, we revisit SSL
for efficient neural networks, taking a closer look at the detrimental
factors causing the practical limitations and asking whether they are intrinsic
to the self-supervised low-compute setting. We find that, contrary to accepted
knowledge, there is no intrinsic architectural bottleneck; instead, we diagnose
that the performance bottleneck stems from the trade-off between model
complexity and regularization strength. In particular, we start by empirically observing that the
use of local views can have a dramatic impact on the effectiveness of the SSL
methods. This hints at view sampling being one of the performance bottlenecks
for SSL on low-capacity networks. We hypothesize that the view sampling
strategy for large neural networks, which requires matching views in very
diverse spatial scales and contexts, is too demanding for low-capacity
architectures. We systematize the design of the view sampling mechanism,
leading to a new training methodology that consistently improves the
performance across different SSL methods (e.g. MoCo-v2, SwAV, DINO), different
low-size networks (e.g. MobileNetV2, ResNet18, ResNet34, ViT-Ti), and different
tasks (linear probe, object detection, instance segmentation and
semi-supervised learning). Our best models establish a new state-of-the-art for
SSL methods on low-compute networks despite not using a KD loss term.
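The local-versus-global view sampling discussed above is typically implemented as a multi-crop augmentation pipeline, as popularized by SwAV and DINO. The sketch below is a minimal illustration of such a sampler, assuming a PyTorch/torchvision setup; the crop sizes, scale ranges, and augmentations are illustrative placeholders, not the configuration proposed in the paper.

```python
# Minimal multi-crop view sampler sketch (assumed PyTorch/torchvision setup).
# All hyperparameters below are illustrative, not the paper's settings.
import torchvision.transforms as T

def build_multicrop_transform(global_scale=(0.4, 1.0),
                              local_scale=(0.05, 0.4),
                              n_local_views=6):
    """Map one PIL image to a list of augmented views (2 global + N local)."""
    # Photometric augmentations shared by every view.
    common = T.Compose([
        T.RandomHorizontalFlip(),
        T.ColorJitter(0.4, 0.4, 0.4, 0.1),
        T.RandomGrayscale(p=0.2),
        T.ToTensor(),
    ])
    # Two large "global" crops at full training resolution.
    global_crop = T.Compose([T.RandomResizedCrop(224, scale=global_scale), common])
    # Several small "local" crops at reduced resolution.
    local_crop = T.Compose([T.RandomResizedCrop(96, scale=local_scale), common])

    def sample_views(image):
        views = [global_crop(image) for _ in range(2)]
        views += [local_crop(image) for _ in range(n_local_views)]
        return views

    return sample_views

# Usage: sample_views = build_multicrop_transform(); views = sample_views(pil_image)
```

Adjusting how aggressively local crops are sampled (e.g. narrowing `local_scale` or changing `n_local_views`) is the kind of view-sampling design choice the abstract suggests matters most for low-capacity backbones.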
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Plasticity-Optimized Complementary Networks for Unsupervised Continual Learning [22.067640536948545]
Continuous unsupervised representation learning (CURL) research has greatly benefited from improvements in self-supervised learning (SSL) techniques.
Existing CURL methods using SSL can learn high-quality representations without any labels, but with a notable performance drop when learning on a many-tasks data stream.
We propose to train an expert network that is relieved of the duty of keeping the previous knowledge and can focus on performing optimally on the new tasks.
arXiv Detail & Related papers (2023-09-12T09:31:34Z)
- Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z)
- Solving Large-scale Spatial Problems with Convolutional Neural Networks [88.31876586547848]
We employ transfer learning to improve training efficiency for large-scale spatial problems.
We propose that a convolutional neural network (CNN) can be trained on small windows of signals, but evaluated on arbitrarily large signals with little to no performance degradation.
arXiv Detail & Related papers (2023-06-14T01:24:42Z)
- LowDINO -- A Low Parameter Self Supervised Learning Model [0.0]
This research explores the possibility of designing a neural network architecture that allows small networks to adopt the properties of huge networks.
Previous studies have shown that using convolutional neural networks (ConvNets) can provide inherent inductive bias.
To reduce the number of parameters, attention mechanisms are utilized through the usage of MobileViT blocks.
arXiv Detail & Related papers (2023-05-28T18:34:59Z)
- Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and gated sub-network from scratch in an SSL setting.
The co-evolution during pre-training of both dense and gated encoder offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z)
- Siamese Prototypical Contrastive Learning [24.794022951873156]
Contrastive Self-supervised Learning (CSL) is a practical solution that learns meaningful visual representations from massive data in an unsupervised manner.
In this paper, we tackle this problem by introducing a simple but effective contrastive learning framework.
The key insight is to employ siamese-style metric loss to match intra-prototype features, while increasing the distance between inter-prototype features.
arXiv Detail & Related papers (2022-08-18T13:25:30Z)
- lpSpikeCon: Enabling Low-Precision Spiking Neural Network Processing for Efficient Unsupervised Continual Learning on Autonomous Agents [14.916996986290902]
We propose lpSpikeCon, a novel methodology to enable low-precision SNN processing for efficient unsupervised continual learning.
lpSpikeCon reduces the weight memory of the SNN model by 8x (i.e., by judiciously employing 4-bit weights) while performing online training with unsupervised continual learning.
arXiv Detail & Related papers (2022-05-24T18:08:16Z)
- DATA: Domain-Aware and Task-Aware Pre-training [94.62676913928831]
We present DATA, a simple yet effective NAS approach specialized for self-supervised learning (SSL).
Our method achieves promising results across a wide range of computation costs on downstream tasks, including image classification, object detection and semantic segmentation.
arXiv Detail & Related papers (2022-03-17T02:38:49Z)
- Local Critic Training for Model-Parallel Learning of Deep Neural Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that trained networks by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z)