Overcoming Growth-Induced Forgetting in Task-Agnostic Continual Learning
- URL: http://arxiv.org/abs/2408.10566v4
- Date: Fri, 27 Sep 2024 06:32:01 GMT
- Title: Overcoming Growth-Induced Forgetting in Task-Agnostic Continual Learning
- Authors: Yuqing Zhao, Divya Saxena, Jiannong Cao, Xiaoyun Liu, Changlin Song
- Abstract summary: In continual learning (CL), model growth enhances adaptability to new data and improves knowledge retention across more tasks.
However, improper model growth can lead to severe degradation of previously learned knowledge, especially in task-agnostic CL, where the entire grown model is used for inference.
This paper presents a novel SparseGrow approach to overcome the issue of growth-induced forgetting (GIFt) while enhancing adaptability over new data.
- Score: 9.91929539637026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In continual learning (CL), model growth enhances adaptability to new data and improves knowledge retention for more tasks. However, improper model growth can lead to severe degradation of previously learned knowledge, an issue we name growth-induced forgetting (GIFt), especially in task-agnostic CL, which uses the entire grown model for inference. Existing works, despite adopting model growth and random initialization for better adaptability, often fail to recognize the presence of GIFt caused by improper model growth. This oversight limits comprehensive control of forgetting and hinders full utilization of model growth. We are the first in CL to identify this issue and conduct an in-depth study of the root cause of GIFt, in which layer expansion stands out among model growth strategies because it widens layers without affecting model functionality. Yet direct adoption of layer expansion presents challenges: it lacks data-driven control and initialization of expanded parameters to balance adaptability and knowledge retention. This paper presents a novel SparseGrow approach to overcome the issue of GIFt while enhancing adaptability over new data. SparseGrow employs data-driven sparse layer expansion to control efficient parameter usage during growth, reducing GIFt caused by excessive growth and functionality changes. It also combines sparse growth with on-data initialization at the late stage of training to create partially zero-valued expansions that fit the learned distribution, enhancing retention and adaptability. To further minimize forgetting, freezing is applied by calculating the sparse mask, allowing data-driven preservation of important parameters. Through experiments across datasets with various settings, cases, and task numbers, we demonstrate the necessity of layer expansion and showcase the effectiveness of SparseGrow in overcoming GIFt, highlighting its adaptability and knowledge retention for incremental tasks.
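The mechanics described in the abstract (zero-initialized layer expansion plus a data-driven freezing mask) can be pictured with the minimal PyTorch-style sketch below. This is not the authors' implementation: the function names, the magnitude-based importance criterion, and the keep_ratio parameter are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

def expand_linear(layer: nn.Linear, extra_out: int) -> nn.Linear:
    """Widen a linear layer by `extra_out` output units.

    New rows are zero-initialized, so the expanded layer reproduces the
    original mapping on its first `layer.out_features` outputs (the
    "partially 0-valued expansion" idea from the abstract).
    """
    new = nn.Linear(layer.in_features, layer.out_features + extra_out)
    with torch.no_grad():
        new.weight.zero_()
        new.bias.zero_()
        new.weight[: layer.out_features] = layer.weight
        new.bias[: layer.out_features] = layer.bias
    return new

def trainable_mask(weight: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Data-driven sparse mask: 1 = trainable, 0 = frozen.

    Importance is approximated here by weight magnitude (an assumption,
    not necessarily the paper's criterion); the largest-magnitude
    weights are frozen to preserve learned knowledge.
    """
    k = max(1, int(keep_ratio * weight.numel()))
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() <= threshold).float()

# Usage sketch: grow the layer, then zero the gradients of frozen weights each step.
layer = expand_linear(nn.Linear(128, 64), extra_out=16)
mask = trainable_mask(layer.weight.data, keep_ratio=0.5)
loss = layer(torch.randn(8, 128)).pow(2).mean()
loss.backward()
layer.weight.grad.mul_(mask)  # frozen (important) weights receive zero gradient
```

Because the new rows start at zero, the grown layer computes exactly the same outputs as before on its original units, so the act of growing does not itself perturb previously learned behavior; subsequent training then updates only the unmasked parameters.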
Related papers
- SF(DA)$^2$: Source-free Domain Adaptation Through the Lens of Data Augmentation [35.071201249725426]
We propose Source-free Domain Adaptation Through the Lens of Data Augmentation (SF(DA)$2$), a novel approach that leverages the benefits of data augmentation without suffering from these challenges.
Our method shows superior adaptation performance in SFDA scenarios, including 2D image and 3D point cloud datasets and a highly imbalanced dataset.
arXiv Detail & Related papers (2024-03-16T07:05:47Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Efficient Expansion and Gradient Based Task Inference for Replay Free Incremental Learning [5.760774528950479]
Recent expansion-based models show promising results for task incremental learning (TIL).
For class incremental learning (CIL), prediction of the task id is a crucial challenge.
We propose a robust task prediction method that leverages entropy-weighted data augmentations and the model's gradient using pseudo labels.
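The task-id prediction step can be illustrated with a generic entropy-based baseline, a common starting point in expansion-based CIL. It is not this paper's exact method, which additionally weights augmented views and uses gradients with pseudo labels; the task_heads argument and the scoring rule are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def infer_task_id(features: torch.Tensor, task_heads: list) -> int:
    """Pick the task-specific head whose predictions are most confident
    (lowest mean softmax entropy) on the given batch of features."""
    entropies = []
    for head in task_heads:
        probs = F.softmax(head(features), dim=-1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()
        entropies.append(entropy.item())
    return int(torch.tensor(entropies).argmin())
```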
arXiv Detail & Related papers (2023-12-02T17:28:52Z)
- Data-Centric Long-Tailed Image Recognition [49.90107582624604]
Long-tail models exhibit a strong demand for high-quality data.
Data-centric approaches aim to enhance both the quantity and quality of data to improve model performance.
There is currently a lack of research into the underlying mechanisms explaining the effectiveness of information augmentation.
arXiv Detail & Related papers (2023-11-03T06:34:37Z)
- To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
We also examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z)
- Masked Structural Growth for 2x Faster Language Model Pre-training [18.276784451675603]
We focus on speeding up pre-training by progressively growing from a small Transformer structure to a large one.
In terms of growth schedule, the impact of each individual growth dimension on a schedule's efficiency is under-explored in existing work.
We propose Masked Structural Growth (MSG), including (i) growth schedules involving all possible dimensions and (ii) strictly function-preserving growth operators.
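The "strictly function-preserving growth operators" idea can be sketched with a Net2WiderNet-style widening of two stacked linear maps. This is an illustrative stand-in, not MSG's actual masked operators; biases and nonlinearities are omitted for brevity, and the function and parameter names are assumptions.

```python
import torch

def widen_pair(w_in: torch.Tensor, w_out: torch.Tensor, new_width: int):
    """Function-preserving width growth for two stacked linear maps.

    w_in:  (old_width, fan_in)  weights producing the hidden units
    w_out: (fan_out, old_width) weights consuming the hidden units
    Existing units are duplicated, and each duplicated unit's outgoing
    weights are divided by its replication count, so the composed
    mapping is unchanged at the moment of growth.
    """
    old_width = w_in.shape[0]
    extra = torch.randint(0, old_width, (new_width - old_width,))
    mapping = torch.cat([torch.arange(old_width), extra])          # which old unit each new unit copies
    counts = torch.bincount(mapping, minlength=old_width).float()  # replication factor per old unit
    w_in_new = w_in[mapping]                                       # duplicate incoming weights
    w_out_new = w_out[:, mapping] / counts[mapping]                # split outgoing weights
    return w_in_new, w_out_new
```

Because the composed mapping is unchanged at the growth step, the training loss continues smoothly after widening, which is the property a progressive growth schedule relies on.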
arXiv Detail & Related papers (2023-05-04T14:28:39Z)
- A Guide for Practical Use of ADMG Causal Data Augmentation [0.0]
Causal data augmentation strategies have been proposed as a solution to these challenges.
This paper experimentally analyzes the ADMG causal augmentation method under different settings.
arXiv Detail & Related papers (2023-04-03T09:31:13Z)
- AdaXpert: Adapting Neural Architecture for Growing Data [63.30393509048505]
In real-world applications, data often come in a growing manner, where the data volume and the number of classes may increase dynamically.
Given the increasing data volume or the number of classes, one has to instantaneously adjust the neural model capacity to obtain promising performance.
Existing methods either ignore the growing nature of data or seek to independently search an optimal architecture for a given dataset.
arXiv Detail & Related papers (2021-07-01T07:22:05Z)
- Understanding Overparameterization in Generative Adversarial Networks [56.57403335510056]
Generative Adversarial Networks (GANs) are trained by solving non-concave min-max optimization problems.
Existing theory has highlighted the importance of gradient descent (GD) reaching globally optimal solutions.
We show that in an overparameterized GAN with a 1-layer neural network generator and a linear discriminator, gradient descent-ascent (GDA) converges to a global saddle point of the underlying non-concave min-max problem.
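For reference, the standard GAN training objective is the min-max problem below (non-convex in the generator and non-concave in the discriminator for general parameterizations). It is shown purely for illustration; the specific loss analyzed in the paper's 1-layer-generator, linear-discriminator setting may differ.

```latex
\[
  \min_{G}\,\max_{D}\;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}}\!\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
\]
```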
arXiv Detail & Related papers (2021-04-12T16:23:37Z)
- Learnable Expansion-and-Compression Network for Few-shot Class-Incremental Learning [87.94561000910707]
We propose a learnable expansion-and-compression network (LEC-Net) to solve catastrophic forgetting and model over-fitting problems.
LEC-Net enlarges the representation capacity of features, alleviating feature drift of the old network from the perspective of model regularization.
Experiments on the CUB/CIFAR-100 datasets show that LEC-Net improves the baseline by 57% while outperforming the state-of-the-art by 56%.
arXiv Detail & Related papers (2021-04-06T04:34:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.