Accelerated Inference and Reduced Forgetting: The Dual Benefits of
Early-Exit Networks in Continual Learning
- URL: http://arxiv.org/abs/2403.07404v1
- Date: Tue, 12 Mar 2024 08:33:26 GMT
- Title: Accelerated Inference and Reduced Forgetting: The Dual Benefits of
Early-Exit Networks in Continual Learning
- Authors: Filip Szatkowski, Fei Yang, Bartłomiej Twardowski, Tomasz
Trzciński, Joost van de Weijer
- Abstract summary: Early-exit networks allow for swift predictions by making decisions early in the network, thereby conserving time and resources.
This study explores the continual learning of early-exit networks.
We propose Task-wise Logits Correction (TLC), a simple method that equalizes this bias and improves the network performance.
- Score: 29.37826822806214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Driven by the demand for energy-efficient employment of deep neural networks,
early-exit methods have experienced a notable increase in research attention.
These strategies allow for swift predictions by making decisions early in the
network, thereby conserving computation time and resources. However, so far the
early-exit networks have only been developed for stationary data distributions,
which restricts their application in real-world scenarios with continuous
non-stationary data. This study explores the continual learning of early-exit
networks. We adapt existing continual learning methods to fit with
early-exit architectures and investigate their behavior in the continual
setting. We notice that early network layers exhibit reduced forgetting and can
outperform standard networks even when using significantly fewer resources.
Furthermore, we analyze the impact of task-recency bias on early-exit inference
and propose Task-wise Logits Correction (TLC), a simple method that equalizes
this bias and improves the network performance for every given compute budget
in the class-incremental setting. We assess the accuracy and computational cost
of various continual learning techniques enhanced with early-exits and TLC
across standard class-incremental learning benchmarks such as 10 split CIFAR100
and ImageNetSubset and show that TLC can achieve the accuracy of the standard
methods using less than 70\% of their computations. Moreover, at full
computational budget, our method outperforms the accuracy of the standard
counterparts by up to 15 percentage points. Our research underscores the
inherent synergy between early-exit networks and continual learning,
emphasizing their practical utility in resource-constrained environments.
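The abstract describes two mechanisms: confidence-based early exiting and Task-wise Logits Correction (TLC). A minimal sketch of both is given below, assuming a standard softmax-confidence exit rule and per-task additive logit offsets; the function names, the threshold rule, and the offset scheme are illustrative assumptions, not the paper's exact formulation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_predict(logits_per_exit, threshold=0.9):
    """Return (predicted class, exit index): stop at the first exit whose
    top softmax probability clears the threshold, saving the compute of
    the remaining layers; otherwise fall back to the final exit."""
    for i, logits in enumerate(logits_per_exit):
        probs = softmax(logits)
        if max(probs) >= threshold:
            return probs.index(max(probs)), i
    probs = softmax(logits_per_exit[-1])
    return probs.index(max(probs)), len(logits_per_exit) - 1

def tlc_correct(logits, task_offsets, classes_per_task):
    """Hypothetical task-wise correction: subtract a per-task offset so
    that logits of recently learned tasks (inflated by task-recency bias
    in class-incremental learning) are equalized across tasks."""
    corrected = list(logits)
    for task, offset in enumerate(task_offsets):
        for c in range(task * classes_per_task, (task + 1) * classes_per_task):
            corrected[c] -= offset
    return corrected
```

For example, with two exits and logits `[[5.0, 0.0], [0.0, 10.0]]`, the first exit is already confident and inference stops there; with flattened class logits `[1.0, 0.5, 2.0, 1.8]` over two tasks of two classes each, subtracting a larger offset from the second (more recent) task can move the prediction back to a class from the first task.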
Related papers
- Early Detection of Network Service Degradation: An Intra-Flow Approach [0.0]
This research presents a novel method for predicting service degradation (SD) in computer networks by leveraging early flow features.
Our approach focuses on the observable (O) segments of network flows, particularly analyzing Packet Inter-Arrival Time (PIAT).
We identify an optimal O/NO split threshold of 10 observed delay samples, balancing prediction accuracy and resource utilization.
arXiv Detail & Related papers (2024-07-09T08:05:14Z) - Rethinking Resource Management in Edge Learning: A Joint Pre-training and Fine-tuning Design Paradigm [87.47506806135746]
In some applications, edge learning is shifting its focus from conventional learning from scratch to a new two-stage paradigm of pre-training and fine-tuning.
This paper considers the problem of joint communication and computation resource management in a two-stage edge learning system.
It is shown that the proposed joint resource management over the pre-training and fine-tuning stages well balances the system performance trade-off.
arXiv Detail & Related papers (2024-04-01T00:21:11Z) - Improving the Accuracy of Early Exits in Multi-Exit Architectures via
Curriculum Learning [88.17413955380262]
Multi-exit architectures allow deep neural networks to terminate their execution early in order to adhere to tight deadlines at the cost of accuracy.
We introduce a novel method called Multi-Exit Curriculum Learning that utilizes curriculum learning.
Our method consistently improves the accuracy of early exits compared to the standard training approach.
arXiv Detail & Related papers (2021-04-21T11:12:35Z) - Reinforcement Learning for Datacenter Congestion Control [50.225885814524304]
Successful congestion control algorithms can dramatically improve latency and overall network throughput.
To date, however, no learning-based algorithms have shown practical potential in this domain.
We devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks.
We show that this scheme outperforms alternative popular RL approaches, and generalizes to scenarios that were not seen during training.
arXiv Detail & Related papers (2021-02-18T13:49:28Z) - S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural
Networks via Guided Distribution Calibration [74.5509794733707]
We present a novel guided learning paradigm that distills knowledge from real-valued networks into binary networks via the final prediction distribution.
Our proposed method can boost the simple contrastive learning baseline by an absolute gain of 5.515% on BNNs.
Our method achieves substantial improvement over the simple contrastive learning baseline, and is even comparable to many mainstream supervised BNN methods.
arXiv Detail & Related papers (2021-02-17T18:59:28Z) - Dense for the Price of Sparse: Improved Performance of Sparsely
Initialized Networks via a Subspace Offset [0.0]
We introduce a new 'DCT plus Sparse' layer architecture, which maintains information propagation and trainability even with as little as 0.01% trainable kernel parameters remaining.
Switching from standard sparse layers to DCT plus Sparse layers does not increase the storage footprint of a network and incurs only a small additional computational overhead.
arXiv Detail & Related papers (2021-02-12T00:05:02Z) - NetCut: Real-Time DNN Inference Using Layer Removal [8.762815575594395]
TRimmed Networks (TRNs) are based on removing problem-specific features of a pretrained network used in transfer learning.
NetCut, a methodology based on an empirical or an analytical latency estimator, only proposes and retrains TRNs that can meet the application's deadline.
arXiv Detail & Related papers (2021-01-13T22:02:43Z) - Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive
Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z) - Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G
Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC.
To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.