Unifying Synergies between Self-supervised Learning and Dynamic
Computation
- URL: http://arxiv.org/abs/2301.09164v3
- Date: Sat, 9 Sep 2023 20:43:13 GMT
- Title: Unifying Synergies between Self-supervised Learning and Dynamic
Computation
- Authors: Tarun Krishna, Ayush K Rai, Alexandru Drimbarean, Eric Arazo, Paul
Albert, Alan F Smeaton, Kevin McGuinness, Noel E O'Connor
- Abstract summary: We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and gated sub-network from scratch in an SSL setting.
The co-evolution during pre-training of both the dense and gated encoders offers a good accuracy-efficiency trade-off.
- Score: 53.66628188936682
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computationally expensive training strategies make self-supervised learning
(SSL) impractical for resource constrained industrial settings. Techniques like
knowledge distillation (KD), dynamic computation (DC), and pruning are often
used to obtain a lightweight model, which usually involves multiple epochs of
fine-tuning (or distilling steps) of a large pre-trained model, making it more
computationally challenging. In this work we present a novel perspective on the
interplay between SSL and DC paradigms. In particular, we show that it is
feasible to simultaneously learn a dense and gated sub-network from scratch in
an SSL setting without any additional fine-tuning or pruning steps. The
co-evolution during pre-training of both the dense and gated encoders offers a good
accuracy-efficiency trade-off and therefore yields a generic and multi-purpose
architecture for application specific industrial settings. Extensive
experiments on several image classification benchmarks including CIFAR-10/100,
STL-10 and ImageNet-100, demonstrate that the proposed training strategy
provides a dense and corresponding gated sub-network that achieves on-par
performance compared with the vanilla self-supervised setting, but at a
significant reduction in computation in terms of FLOPs, under a range of target
budgets (t_d).
Related papers
- Federated Split Learning with Model Pruning and Gradient Quantization in Wireless Networks [7.439160287320074]
Federated split learning (FedSL) implements collaborative training across the edge devices and the server through model splitting.
We propose a lightweight FedSL scheme that further alleviates the training burden on resource-constrained edge devices.
We conduct theoretical analysis to quantify the convergence performance of the proposed scheme.
arXiv Detail & Related papers (2024-12-09T11:43:03Z)
- Quantized and Interpretable Learning Scheme for Deep Neural Networks in Classification Task [0.0]
We introduce an approach that combines saliency-guided training with quantization techniques to create an interpretable and resource-efficient model.
Our results demonstrate that the combined use of saliency-guided training and PACT-based quantization not only maintains classification performance but also produces models that are significantly more efficient and interpretable.
arXiv Detail & Related papers (2024-12-05T06:34:06Z)
- Active Data Curation Effectively Distills Large-Scale Multimodal Models [66.23057263509027]
Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones.
In this work we explore an alternative, yet simple approach -- active data curation as effective distillation for contrastive multimodal pretraining.
Our simple online batch selection method, ACID, outperforms strong KD baselines across various model-, data- and compute-configurations.
arXiv Detail & Related papers (2024-11-27T18:50:15Z)
- Contrastive-Adversarial and Diffusion: Exploring pre-training and fine-tuning strategies for sulcal identification [3.0398616939692777]
Techniques like adversarial learning, contrastive learning, diffusion denoising learning, and ordinary reconstruction learning have become standard.
The study aims to elucidate the advantages of pre-training techniques and fine-tuning strategies to enhance the learning process of neural networks.
arXiv Detail & Related papers (2024-05-29T15:44:51Z)
- LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks.
By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections.
Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets.
arXiv Detail & Related papers (2024-05-23T11:10:32Z)
- Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z)
- When Computing Power Network Meets Distributed Machine Learning: An Efficient Federated Split Learning Framework [6.871107511111629]
CPN-FedSL is a Federated Split Learning (FedSL) framework over Computing Power Network (CPN).
We build a dedicated model to capture the basic settings and learning characteristics (e.g., latency, flow, convergence).
arXiv Detail & Related papers (2023-05-22T12:36:52Z)
- Training Spiking Neural Networks with Local Tandem Learning [96.32026780517097]
Spiking neural networks (SNNs) are shown to be more biologically plausible and energy efficient than their predecessors.
In this paper, we put forward a generalized learning rule, termed Local Tandem Learning (LTL).
We demonstrate rapid network convergence within five training epochs on the CIFAR-10 dataset while having low computational complexity.
arXiv Detail & Related papers (2022-10-10T10:05:00Z)
- Effective Self-supervised Pre-training on Low-compute Networks without Distillation [6.530011859253459]
Reported performance of self-supervised learning has trailed behind standard supervised pre-training by a large margin.
Most prior works attribute this poor performance to the capacity bottleneck of the low-compute networks.
We take a closer look at which detrimental factors cause the practical limitations, and whether they are intrinsic to the self-supervised low-compute setting.
arXiv Detail & Related papers (2022-10-06T10:38:07Z)
- DANCE: DAta-Network Co-optimization for Efficient Segmentation Model Training and Inference [85.02494022662505]
DANCE is an automated simultaneous data-network co-optimization for efficient segmentation model training and inference.
It integrates automated data slimming which adaptively downsamples/drops input images and controls their corresponding contribution to the training loss guided by the images' spatial complexity.
Experiments and ablation studies demonstrate that DANCE can achieve "all-win" towards efficient segmentation.
arXiv Detail & Related papers (2021-07-16T04:58:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.