Related papers: SpaceNet: Make Free Space For Continual Learning

SpaceNet: Make Free Space For Continual Learning

URL: http://arxiv.org/abs/2007.07617v3
Date: Wed, 14 Apr 2021 08:39:33 GMT
Title: SpaceNet: Make Free Space For Continual Learning
Authors: Ghada Sokar, Decebal Constantin Mocanu, Mykola Pechenizkiy
Abstract summary: We propose a novel architectural-based method referred as SpaceNet for class incremental learning scenario. SpaceNet trains sparse deep neural networks from scratch in an adaptive way that compresses the sparse connections of each task in a compact number of neurons. Experimental results show the robustness of our proposed method against catastrophic forgetting old tasks and the efficiency of SpaceNet in utilizing the available capacity of the model.
Score: 15.914199054779438
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The continual learning (CL) paradigm aims to enable neural networks to learn tasks continually in a sequential fashion. The fundamental challenge in this learning paradigm is catastrophic forgetting previously learned tasks when the model is optimized for a new task, especially when their data is not accessible. Current architectural-based methods aim at alleviating the catastrophic forgetting problem but at the expense of expanding the capacity of the model. Regularization-based methods maintain a fixed model capacity; however, previous studies showed the huge performance degradation of these methods when the task identity is not available during inference (e.g. class incremental learning scenario). In this work, we propose a novel architectural-based method referred as SpaceNet for class incremental learning scenario where we utilize the available fixed capacity of the model intelligently. SpaceNet trains sparse deep neural networks from scratch in an adaptive way that compresses the sparse connections of each task in a compact number of neurons. The adaptive training of the sparse connections results in sparse representations that reduce the interference between the tasks. Experimental results show the robustness of our proposed method against catastrophic forgetting old tasks and the efficiency of SpaceNet in utilizing the available capacity of the model, leaving space for more tasks to be learned. In particular, when SpaceNet is tested on the well-known benchmarks for CL: split MNIST, split Fashion-MNIST, and CIFAR-10/100, it outperforms regularization-based methods by a big performance gap. Moreover, it achieves better performance than architectural-based methods without model expansion and achieved comparable results with rehearsal-based methods, while offering a huge memory reduction.

Related papers

LESA: Learnable LLM Layer Scaling-Up [57.0510934286449]
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive. Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones. We propose textbfLESA, a novel learnable method for depth scaling-up.
arXiv Detail & Related papers (2025-02-19T14:58:48Z)
Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning. Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
Dynamic Sparse Learning: A Novel Paradigm for Efficient Recommendation [20.851925464903804]
This paper introduces a novel learning paradigm, Dynamic Sparse Learning, tailored for recommendation models. DSL innovatively trains a lightweight sparse model from scratch, periodically evaluating and dynamically adjusting each weight's significance. Our experimental results underline DSL's effectiveness, significantly reducing training and inference costs while delivering comparable recommendation performance.
arXiv Detail & Related papers (2024-02-05T10:16:20Z)
Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion [86.6191592951269]
Merging models fine-tuned from common extensively pretrained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multitask model that performs well across diverse tasks. We propose the CONtinuous relaxation dis (Concrete) subspace learning method to identify a common lowdimensional subspace and utilize its shared information track interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z)
Lightweight Diffusion Models with Distillation-Based Block Neural Architecture Search [55.41583104734349]
We propose to automatically remove structural redundancy in diffusion models with our proposed Diffusion Distillation-based Block-wise Neural Architecture Search (NAS) Given a larger pretrained teacher, we leverage DiffNAS to search for the smallest architecture which can achieve on-par or even better performance than the teacher. Different from previous block-wise NAS methods, DiffNAS contains a block-wise local search strategy and a retraining strategy with a joint dynamic loss.
arXiv Detail & Related papers (2023-11-08T12:56:59Z)
Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information. We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting. Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
Complementary Learning Subnetworks for Parameter-Efficient Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks. Our method achieves competitive results against state-of-the-art methods, especially in accuracy gain, memory cost, training efficiency, and task-order.
arXiv Detail & Related papers (2023-06-21T01:43:25Z)
Dense Network Expansion for Class Incremental Learning [61.00081795200547]
State-of-the-art approaches use a dynamic architecture based on network expansion (NE), in which a task expert is added per task. A new NE method, dense network expansion (DNE), is proposed to achieve a better trade-off between accuracy and model complexity. It outperforms the previous SOTA methods by a margin of 4% in terms of accuracy, with similar or even smaller model scale.
arXiv Detail & Related papers (2023-03-22T16:42:26Z)
Neural Weight Search for Scalable Task Incremental Learning [6.413209417643468]
Task incremental learning aims to enable a system to maintain its performance on previously learned tasks while learning new tasks, solving the problem of catastrophic forgetting. One promising approach is to build an individual network or sub-network for future tasks. This leads to an ever-growing memory due to saving extra weights for new tasks and how to address this issue has remained an open problem in task incremental learning.
arXiv Detail & Related papers (2022-11-24T23:30:23Z)
Task-Adaptive Neural Network Retrieval with Meta-Contrastive Learning [34.27089256930098]
We propose a novel neural network retrieval method, which retrieves the most optimal pre-trained network for a given task. We train this framework by meta-learning a cross-modal latent space with contrastive loss, to maximize the similarity between a dataset and a network. We validate the efficacy of our method on ten real-world datasets, against existing NAS baselines.
arXiv Detail & Related papers (2021-03-02T06:30:51Z)
Incremental Embedding Learning via Zero-Shot Translation [65.94349068508863]
Current state-of-the-art incremental learning methods tackle catastrophic forgetting problem in traditional classification networks. We propose a novel class-incremental method for embedding network, named as zero-shot translation class-incremental method (ZSTCI) In addition, ZSTCI can easily be combined with existing regularization-based incremental learning methods to further improve performance of embedding networks.
arXiv Detail & Related papers (2020-12-31T08:21:37Z)
Neuromodulated Neural Architectures with Local Error Signals for Memory-Constrained Online Continual Learning [4.2903672492917755]
We develop a biologically-inspired light weight neural network architecture that incorporates local learning and neuromodulation. We demonstrate the efficacy of our approach on both single task and continual learning setting.
arXiv Detail & Related papers (2020-07-16T07:41:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.