Accelerated Inference and Reduced Forgetting: The Dual Benefits of
Early-Exit Networks in Continual Learning
- URL: http://arxiv.org/abs/2403.07404v1
- Date: Tue, 12 Mar 2024 08:33:26 GMT
- Title: Accelerated Inference and Reduced Forgetting: The Dual Benefits of
Early-Exit Networks in Continual Learning
- Authors: Filip Szatkowski, Fei Yang, Bartłomiej Twardowski, Tomasz
Trzciński, Joost van de Weijer
- Abstract summary: Early-exit networks allow for swift predictions by making decisions early in the network, thereby conserving time and resources.
This study explores the continual learning of early-exit networks.
We propose Task-wise Logits Correction (TLC), a simple method that equalizes this bias and improves the network performance.
- Score: 29.37826822806214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Driven by the demand for energy-efficient employment of deep neural networks,
early-exit methods have experienced a notable increase in research attention.
These strategies allow for swift predictions by making decisions early in the
network, thereby conserving computation time and resources. However, so far the
early-exit networks have only been developed for stationary data distributions,
which restricts their application in real-world scenarios with continuous
non-stationary data. This study explores the continual learning of early-exit
networks. We adapt existing continual learning methods to fit with
early-exit architectures and investigate their behavior in the continual
setting. We notice that early network layers exhibit reduced forgetting and can
outperform standard networks even when using significantly fewer resources.
Furthermore, we analyze the impact of task-recency bias on early-exit inference
and propose Task-wise Logits Correction (TLC), a simple method that equalizes
this bias and improves the network performance for every given compute budget
in the class-incremental setting. We assess the accuracy and computational cost
of various continual learning techniques enhanced with early-exits and TLC
across standard class-incremental learning benchmarks such as 10 split CIFAR100
and ImageNetSubset and show that TLC can achieve the accuracy of the standard
methods using less than 70\% of their computations. Moreover, at full
computational budget, our method outperforms the accuracy of the standard
counterparts by up to 15 percentage points. Our research underscores the
inherent synergy between early-exit networks and continual learning,
emphasizing their practical utility in resource-constrained environments.
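The abstract describes two mechanisms: confidence-based early exiting and Task-wise Logits Correction (TLC). A minimal sketch of both is given below, assuming a standard softmax-confidence exit rule and per-task additive logit offsets; the function names, the threshold rule, and the offset scheme are illustrative assumptions, not the paper's exact formulation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_predict(logits_per_exit, threshold=0.9):
    """Return (predicted class, exit index): stop at the first exit whose
    top softmax probability clears the threshold, saving the compute of
    the remaining layers; otherwise fall back to the final exit."""
    for i, logits in enumerate(logits_per_exit):
        probs = softmax(logits)
        if max(probs) >= threshold:
            return probs.index(max(probs)), i
    probs = softmax(logits_per_exit[-1])
    return probs.index(max(probs)), len(logits_per_exit) - 1

def tlc_correct(logits, task_offsets, classes_per_task):
    """Hypothetical task-wise correction: subtract a per-task offset so
    that logits of recently learned tasks (inflated by task-recency bias
    in class-incremental learning) are equalized across tasks."""
    corrected = list(logits)
    for task, offset in enumerate(task_offsets):
        for c in range(task * classes_per_task, (task + 1) * classes_per_task):
            corrected[c] -= offset
    return corrected
```

For example, with two exits and logits `[[5.0, 0.0], [0.0, 10.0]]`, the first exit is already confident and inference stops there; with flattened class logits `[1.0, 0.5, 2.0, 1.8]` over two tasks of two classes each, subtracting a larger offset from the second (more recent) task can move the prediction back to a class from the first task.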
Related papers
- Early Detection of Network Service Degradation: An Intra-Flow Approach [0.0]
This research presents a novel method for predicting service degradation (SD) in computer networks by leveraging early flow features.
Our approach focuses on the observable (O) segments of network flows, particularly analyzing Packet Inter-Arrival Time (PIAT).
We identify an optimal O/NO split threshold of 10 observed delay samples, balancing prediction accuracy and resource utilization.
arXiv Detail & Related papers (2024-07-09T08:05:14Z) - Rethinking Resource Management in Edge Learning: A Joint Pre-training and Fine-tuning Design Paradigm [87.47506806135746]
In some applications, edge learning is shifting its focus from conventional learning from scratch to a new two-stage paradigm of pre-training and fine-tuning.
This paper considers the problem of joint communication and computation resource management in a two-stage edge learning system.
It is shown that the proposed joint resource management over the pre-training and fine-tuning stages well balances the system performance trade-off.
arXiv Detail & Related papers (2024-04-01T00:21:11Z) - Improving the Accuracy of Early Exits in Multi-Exit Architectures via
Curriculum Learning [88.17413955380262]
Multi-exit architectures allow deep neural networks to terminate their execution early in order to adhere to tight deadlines at the cost of accuracy.
We introduce a novel method called Multi-Exit Curriculum Learning that utilizes curriculum learning.
Our method consistently improves the accuracy of early exits compared to the standard training approach.
arXiv Detail & Related papers (2021-04-21T11:12:35Z) - Reinforcement Learning for Datacenter Congestion Control [50.225885814524304]
Successful congestion control algorithms can dramatically improve latency and overall network throughput.
To date, however, no learning-based algorithms have shown practical potential in this domain.
We devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks.
We show that this scheme outperforms alternative popular RL approaches, and generalizes to scenarios that were not seen during training.
arXiv Detail & Related papers (2021-02-18T13:49:28Z) - S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural
Networks via Guided Distribution Calibration [74.5509794733707]
We present a novel guided learning paradigm that distills knowledge from real-valued networks into binary networks via the final prediction distribution.
Our proposed method can boost the simple contrastive learning baseline by an absolute gain of 5.515% on BNNs.
Our method achieves substantial improvement over the simple contrastive learning baseline, and is even comparable to many mainstream supervised BNN methods.
arXiv Detail & Related papers (2021-02-17T18:59:28Z) - Dense for the Price of Sparse: Improved Performance of Sparsely
Initialized Networks via a Subspace Offset [0.0]
We introduce a new 'DCT plus Sparse' layer architecture, which maintains information propagation and trainability even with as little as 0.01% trainable kernel parameters remaining.
Switching from standard sparse layers to DCT plus Sparse layers does not increase the storage footprint of a network and incurs only a small additional computational overhead.
arXiv Detail & Related papers (2021-02-12T00:05:02Z) - NetCut: Real-Time DNN Inference Using Layer Removal [8.762815575594395]
TRimmed Networks (TRNs) are based on removing problem-specific features of a pretrained network used in transfer learning.
NetCut, a methodology based on an empirical or an analytical latency estimator, only proposes and retrains TRNs that can meet the application's deadline.
arXiv Detail & Related papers (2021-01-13T22:02:43Z) - Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive
Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z) - Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G
Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC.
To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.