Revised Regularization for Efficient Continual Learning through Correlation-Based Parameter Update in Bayesian Neural Networks
- URL: http://arxiv.org/abs/2411.14202v1
- Date: Thu, 21 Nov 2024 15:11:02 GMT
- Title: Revised Regularization for Efficient Continual Learning through Correlation-Based Parameter Update in Bayesian Neural Networks
- Authors: Sanchar Palit, Biplab Banerjee, Subhasis Chaudhuri
- Abstract summary: In continual learning scenarios, storing network parameters at each step to retain knowledge poses challenges.
Current methods using Variational Inference with KL divergence risk catastrophic forgetting during uncertain node updates.
We propose a parameter distribution learning method that significantly reduces the storage requirements.
- Score: 20.00857639162206
- Abstract: We propose a Bayesian neural network-based continual learning algorithm using Variational Inference, aiming to overcome several drawbacks of existing methods. Specifically, in continual learning scenarios, storing network parameters at each step to retain knowledge poses challenges. This is compounded by the crucial need to mitigate catastrophic forgetting, particularly given the limited access to past datasets, which complicates maintaining correspondence between network parameters and datasets across all sessions. Current methods using Variational Inference with KL divergence risk catastrophic forgetting during uncertain node updates and coupled disruptions in certain nodes. To address these challenges, we propose the following strategies. To reduce the storage of the dense layer parameters, we propose a parameter distribution learning method that significantly reduces the storage requirements. In the continual learning framework employing variational inference, our study introduces a regularization term that specifically targets the dynamics and population of the mean and variance of the parameters. This term aims to retain the benefits of KL divergence while addressing related challenges. To ensure proper correspondence between network parameters and the data, our method introduces an importance-weighted Evidence Lower Bound term to capture data and parameter correlations. This enables storage of common and distinctive parameter hyperspace bases. The proposed method partitions the parameter space into common and distinctive subspaces, with conditions for effective backward and forward knowledge transfer, elucidating the network-parameter dataset correspondence. The experimental results demonstrate the effectiveness of our method across diverse datasets and various combinations of sequential datasets, yielding superior performance compared to existing approaches.
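To make the moving parts of the abstract concrete, here is a minimal, hypothetical sketch (not the paper's actual formulation) of a mean-field variational layer trained with an importance-weighted data-fit term plus a penalty on how far the posterior mean and variance drift from the previous session's values. The names `VariationalLinear`, `continual_loss`, `importance_w`, and `lam` are illustrative placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalLinear(nn.Module):
    """Mean-field Gaussian layer: each weight has a learnable mean and log-variance."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(d_out, d_in))
        self.log_var = nn.Parameter(torch.full((d_out, d_in), -6.0))

    def forward(self, x):
        # Reparameterised weight sample for one stochastic forward pass.
        w = self.mu + torch.exp(0.5 * self.log_var) * torch.randn_like(self.mu)
        return F.linear(x, w)

def continual_loss(layer, logits, targets, prev_mu, prev_log_var, importance_w, lam=1.0):
    """Illustrative objective: importance-weighted data fit plus a drift penalty
    on the posterior mean and variance relative to the previous session."""
    nll = (importance_w * F.cross_entropy(logits, targets, reduction="none")).mean()
    mean_drift = ((layer.mu - prev_mu) ** 2 / prev_log_var.exp()).mean()
    var_drift = ((layer.log_var.exp() - prev_log_var.exp()) ** 2).mean()
    return nll + lam * (mean_drift + var_drift)
```

Here the drift penalty stands in for the revised regularization term described above (penalizing updates to nodes in proportion to how confident the previous posterior was), while `importance_w` stands in for the importance-weighted ELBO term; the exact terms are defined in the paper.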
Related papers
- Dynamic Continual Learning: Harnessing Parameter Uncertainty for Improved Network Adaptation [0.0]
We propose using parameter-based uncertainty to determine which parameters are relevant to a network's learned function.
We show improved Continual Learning performance for Average Test Accuracy and Backward Transfer metrics.
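One common way to operationalise parameter-based uncertainty (a hedged sketch, not necessarily this paper's rule) is to shrink the update applied to parameters whose posterior variance is small, so that parameters the network relies on change little across tasks; `uncertainty_scaled_step` and its arguments are hypothetical names.

```python
import torch

def uncertainty_scaled_step(mu, log_var, grad, lr=1e-3):
    """Hypothetical update: low-variance (i.e. important) parameters are
    updated less; high-variance parameters absorb most of the new task."""
    scale = log_var.exp()
    scale = scale / scale.max()        # normalise variances to [0, 1]
    return mu - lr * scale * grad
```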
arXiv Detail & Related papers (2025-01-18T19:58:53Z) - Function Space Diversity for Uncertainty Prediction via Repulsive Last-Layer Ensembles [11.551956337460982]
We discuss function space inference via particle optimization and present practical modifications that improve uncertainty estimation.
We demonstrate that the choice of input samples at which particle predictions are enforced to be diverse is critical to model performance.
While enforcing diversity on the training data itself can lead to underfitting, using label-destroying data augmentations or unlabeled out-of-distribution data can improve prediction diversity and uncertainty estimates.
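A minimal sketch of the underlying idea, assuming an RBF-kernel repulsion between the last-layer ensemble members' predictions (the paper's exact particle-optimisation objective may differ). Per the summary above, the penalty is intended to be evaluated on label-destroying augmentations or unlabeled out-of-distribution inputs rather than on the training data.

```python
import torch

def repulsion_penalty(member_logits, bandwidth=1.0):
    """member_logits: (n_members, n_inputs, n_classes) predictions of the
    last-layer ensemble at the same batch of inputs. Minimising the mean
    off-diagonal kernel similarity pushes member predictions apart."""
    flat = member_logits.flatten(start_dim=1)          # one prediction vector per member
    d2 = torch.cdist(flat, flat) ** 2                  # pairwise squared distances
    k = torch.exp(-d2 / (2 * bandwidth ** 2))
    n = flat.shape[0]
    return (k.sum() - n) / (n * (n - 1))               # ignore the diagonal (self-similarity)
```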
arXiv Detail & Related papers (2024-12-20T10:24:08Z) - Replacement Learning: Training Vision Tasks with Fewer Learnable Parameters [4.2114456503277315]
Replacement Learning replaces all parameters of frozen layers with only two learnable parameters.
We conduct experiments across four benchmark datasets, including CIFAR-10, STL-10, SVHN, and ImageNet.
Our approach reduces the number of parameters, training time, and memory consumption while surpassing the performance of end-to-end training.
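The summary does not spell out the mechanism, so the following is only one plausible reading: the frozen layer's weight tensor is synthesised from its two neighbours' weights using a pair of learnable scalars, which is all that needs to be stored and trained for that layer. Class and argument names are invented for illustration.

```python
import torch
import torch.nn as nn

class TwoScalarReplacedLayer(nn.Module):
    """Hypothetical replacement of a frozen layer: only `alpha` and `beta`
    are learnable; the effective weights are a blend of the neighbours'.
    Assumes the neighbouring layers share the frozen layer's weight shape."""
    def __init__(self, w_prev, w_next):
        super().__init__()
        self.register_buffer("w_prev", w_prev)      # weights of the preceding layer
        self.register_buffer("w_next", w_next)      # weights of the following layer
        self.alpha = nn.Parameter(torch.tensor(0.5))
        self.beta = nn.Parameter(torch.tensor(0.5))

    def effective_weight(self):
        return self.alpha * self.w_prev + self.beta * self.w_next
```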
arXiv Detail & Related papers (2024-10-02T05:03:54Z) - SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, using a minimal number of late pre-trained layers alleviates the peak memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z) - Continual Learning via Sequential Function-Space Variational Inference [65.96686740015902]
We propose an objective derived by formulating continual learning as sequential function-space variational inference.
Compared to objectives that directly regularize neural network predictions, the proposed objective allows for more flexible variational distributions.
We demonstrate that, across a range of task sequences, neural networks trained via sequential function-space variational inference achieve better predictive accuracy than networks trained with related methods.
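A simplified picture of a function-space regulariser, assuming the variational posterior over function values at a set of context inputs is Gaussian (the paper's objective is more general): the new posterior is pulled towards the previous task's posterior in output space rather than in weight space. The helper below is illustrative, not the authors' code.

```python
import torch

def function_space_penalty(mu_t, var_t, mu_prev, var_prev):
    """KL( N(mu_t, var_t) || N(mu_prev, var_prev) ) per context point, summed.
    mu_*/var_* are means/variances of function values at shared context inputs."""
    kl = 0.5 * (torch.log(var_prev / var_t)
                + (var_t + (mu_t - mu_prev) ** 2) / var_prev - 1.0)
    return kl.sum()

# Illustrative use: total loss = current-task NLL + lambda * function_space_penalty(...)
```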
arXiv Detail & Related papers (2023-12-28T18:44:32Z) - Sparse Function-space Representation of Neural Networks [23.4128813752424]
Deep neural networks (NNs) are known to lack uncertainty estimates and struggle to incorporate new data.
We present a method that mitigates these issues by converting NNs from weight space to function space, via a dual parameterization.
arXiv Detail & Related papers (2023-09-05T12:56:35Z) - Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
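At a high level (a deliberately simplified stand-in for the paper's generative model), one can flatten each saved checkpoint into a vector and train a conditional generator that maps noise plus a target-loss "prompt" to a parameter vector; the toy MLP below is not the model used in the paper.

```python
import torch
import torch.nn as nn

class CheckpointGenerator(nn.Module):
    """Toy conditional generator over flattened parameter vectors."""
    def __init__(self, param_dim, noise_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + 1, 512), nn.ReLU(),
            nn.Linear(512, param_dim),
        )

    def forward(self, z, target_loss):
        # `target_loss` plays the role of the loss prompt conditioning the sample.
        cond = torch.cat([z, target_loss.unsqueeze(-1)], dim=-1)
        return self.net(cond)
```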
arXiv Detail & Related papers (2022-09-26T17:59:58Z) - Learning Conditional Invariance through Cycle Consistency [60.85059977904014]
We propose a novel approach to identify meaningful and independent factors of variation in a dataset.
Our method involves two separate latent subspaces for the target property and the remaining input information.
We demonstrate on synthetic and molecular data that our approach identifies more meaningful factors which lead to sparser and more interpretable models.
arXiv Detail & Related papers (2021-11-25T17:33:12Z) - Intrusion Detection using Spatial-Temporal features based on Riemannian Manifold [1.14219428942199]
Network traffic data is a combination of packets of different data bytes exchanged under different network protocols.
These traffic packets have complex time-varying non-linear relationships.
Existing state-of-the-art methods address this challenge by fusing features into multiple subsets based on correlations.
This often incurs high computational cost and requires manual support, which limits their use for real-time processing of network traffic.
arXiv Detail & Related papers (2021-10-31T23:50:59Z) - Efficient Continual Adaptation for Generative Adversarial Networks [97.20244383723853]
We present a continual learning approach for generative adversarial networks (GANs).
Our approach is based on learning a set of global and task-specific parameters.
We show that the feature-map transformation based approach outperforms state-of-the-art continual GAN methods.
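A hedged sketch of the global/task-specific split described above, assuming the task-specific part is a per-channel scale-and-shift of feature maps (FiLM-style); layer and parameter names are illustrative.

```python
import torch
import torch.nn as nn

class TaskConditionedConv(nn.Module):
    """Shared (global) convolution plus a tiny per-task feature-map transform."""
    def __init__(self, c_in, c_out, n_tasks):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)   # global parameters
        self.gamma = nn.Parameter(torch.ones(n_tasks, c_out))          # task-specific scale
        self.beta = nn.Parameter(torch.zeros(n_tasks, c_out))          # task-specific shift

    def forward(self, x, task_id):
        h = self.conv(x)
        g = self.gamma[task_id].view(1, -1, 1, 1)
        b = self.beta[task_id].view(1, -1, 1, 1)
        return g * h + b
```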
arXiv Detail & Related papers (2021-03-06T05:09:37Z) - Solving Sparse Linear Inverse Problems in Communication Systems: A Deep Learning Approach With Adaptive Depth [51.40441097625201]
We propose an end-to-end trainable deep learning architecture for sparse signal recovery problems.
The proposed method learns how many layers to execute to emit an output, and the network depth is dynamically adjusted for each task in the inference phase.
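To illustrate the dynamic-depth idea in the summary (a generic sketch, not the paper's architecture, which targets sparse signal recovery), each layer can emit a halting score, and inference stops unrolling once the score clears a threshold.

```python
import torch
import torch.nn as nn

class AdaptiveDepthNet(nn.Module):
    """Generic adaptive-depth inference: stop stacking layers once a learned
    halting score is confident enough for the current input."""
    def __init__(self, dim, max_layers=20, threshold=0.9):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(max_layers))
        self.halt = nn.Linear(dim, 1)
        self.threshold = threshold

    def forward(self, x):
        for layer in self.layers:
            x = torch.relu(layer(x))
            if torch.sigmoid(self.halt(x)).mean() > self.threshold:
                break                      # depth is decided per input at inference
        return x
```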
arXiv Detail & Related papers (2020-10-29T06:32:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.