Related papers: Improving the Accuracy of Early Exits in Multi-Exit Architectures via Curriculum Learning

Improving the Accuracy of Early Exits in Multi-Exit Architectures via Curriculum Learning

URL: http://arxiv.org/abs/2104.10461v2
Date: Thu, 22 Apr 2021 07:45:31 GMT
Title: Improving the Accuracy of Early Exits in Multi-Exit Architectures via Curriculum Learning
Authors: Arian Bakhtiarnia, Qi Zhang and Alexandros Iosifidis
Abstract summary: Multi-exit architectures allow deep neural networks to terminate their execution early in order to adhere to tight deadlines at the cost of accuracy. We introduce a novel method called Multi-Exit Curriculum Learning that utilizes curriculum learning. Our method consistently improves the accuracy of early exits compared to the standard training approach.
Score: 88.17413955380262
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deploying deep learning services for time-sensitive and resource-constrained settings such as IoT using edge computing systems is a challenging task that requires dynamic adjustment of inference time. Multi-exit architectures allow deep neural networks to terminate their execution early in order to adhere to tight deadlines at the cost of accuracy. To mitigate this cost, in this paper we introduce a novel method called Multi-Exit Curriculum Learning that utilizes curriculum learning, a training strategy for neural networks that imitates human learning by sorting the training samples based on their difficulty and gradually introducing them to the network. Experiments on CIFAR-10 and CIFAR-100 datasets and various configurations of multi-exit architectures show that our method consistently improves the accuracy of early exits compared to the standard training approach.

Related papers

Multiscale Stochastic Gradient Descent: Efficiently Training Convolutional Neural Networks [6.805997961535213]
Multiscale Gradient Descent (Multiscale-SGD) is a novel optimization approach that exploits coarse-to-fine training strategies to estimate the gradient at a fraction of the cost. We introduce a new class of learnable, scale-independent Mesh-Free Convolutions (MFCs) that ensure consistent gradient behavior across resolutions. Our results establish a new paradigm for the efficient training of deep networks, enabling practical scalability in high-resolution and multiscale learning tasks.
arXiv Detail & Related papers (2025-01-22T09:13:47Z)
Continual Learning with Pretrained Backbones by Tuning in the Input Space [44.97953547553997]
The intrinsic difficulty in adapting deep learning models to non-stationary environments limits the applicability of neural networks to real-world tasks. We propose a novel strategy to make the fine-tuning procedure more effective, by avoiding to update the pre-trained part of the network and learning not only the usual classification head, but also a set of newly-introduced learnable parameters.
arXiv Detail & Related papers (2023-06-05T15:11:59Z)
Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms. We show that it is feasible to simultaneously learn a dense and gated sub-network from scratch in a SSL setting. The co-evolution during pre-training of both dense and gated encoder offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z)
Learning Rate Curriculum [75.98230528486401]
We propose a novel curriculum learning approach termed Learning Rate Curriculum (LeRaC) LeRaC uses a different learning rate for each layer of a neural network to create a data-agnostic curriculum during the initial training epochs. We compare our approach with Curriculum by Smoothing (CBS), a state-of-the-art data-agnostic curriculum learning approach.
arXiv Detail & Related papers (2022-05-18T18:57:36Z)
Consistency Training of Multi-exit Architectures for Sensor Data [0.07614628596146598]
We present a novel and architecture-agnostic approach for robust training of multi-exit architectures termed consistent exit training. We leverage weak supervision to align model output with consistency training and jointly optimize dual-losses in a multi-task learning fashion over the exits in a network.
arXiv Detail & Related papers (2021-09-27T17:11:25Z)
CosSGD: Nonlinear Quantization for Communication-efficient Federated Learning [62.65937719264881]
Federated learning facilitates learning across clients without transferring local data on these clients to a central server. We propose a nonlinear quantization for compressed gradient descent, which can be easily utilized in federated learning. Our system significantly reduces the communication cost by up to three orders of magnitude, while maintaining convergence and accuracy of the training process.
arXiv Detail & Related papers (2020-12-15T12:20:28Z)
Neuromodulated Neural Architectures with Local Error Signals for Memory-Constrained Online Continual Learning [4.2903672492917755]
We develop a biologically-inspired light weight neural network architecture that incorporates local learning and neuromodulation. We demonstrate the efficacy of our approach on both single task and continual learning setting.
arXiv Detail & Related papers (2020-07-16T07:41:23Z)
Multi-fidelity Neural Architecture Search with Knowledge Distillation [69.09782590880367]
We propose a bayesian multi-fidelity method for neural architecture search: MF-KD. Knowledge distillation adds to a loss function a term forcing a network to mimic some teacher network. We show that training for a few epochs with such a modified loss function leads to a better selection of neural architectures than training for a few epochs with a logistic loss.
arXiv Detail & Related papers (2020-06-15T12:32:38Z)
Subset Sampling For Progressive Neural Network Learning [106.12874293597754]
Progressive Neural Network Learning is a class of algorithms that incrementally construct the network's topology and optimize its parameters based on the training data. We propose to speed up this process by exploiting subsets of training data at each incremental training step. Experimental results in object, scene and face recognition problems demonstrate that the proposed approach speeds up the optimization procedure considerably.
arXiv Detail & Related papers (2020-02-17T18:57:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.