Overcoming Recency Bias of Normalization Statistics in Continual
Learning: Balance and Adaptation
- URL: http://arxiv.org/abs/2310.08855v1
- Date: Fri, 13 Oct 2023 04:50:40 GMT
- Title: Overcoming Recency Bias of Normalization Statistics in Continual
Learning: Balance and Adaptation
- Authors: Yilin Lyu, Liyuan Wang, Xingxing Zhang, Zicheng Sun, Hang Su, Jun Zhu,
Liping Jing
- Abstract summary: Continual learning involves learning a sequence of tasks and balancing their knowledge appropriately.
We propose Adaptive Balance of BN (AdaB$^2$N), which appropriately incorporates a Bayesian-based strategy to adapt task-wise contributions.
Our approach achieves significant performance gains across a wide range of benchmarks.
- Score: 67.77048565738728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning entails learning a sequence of tasks and balancing their
knowledge appropriately. With limited access to old training samples, much of
the current work in deep neural networks has focused on overcoming catastrophic
forgetting of old tasks in gradient-based optimization. However, the
normalization layers provide an exception, as they are updated interdependently
by the gradient and statistics of currently observed training samples, which
require specialized strategies to mitigate recency bias. In this work, we focus
on the most popular Batch Normalization (BN) and provide an in-depth
theoretical analysis of its sub-optimality in continual learning. Our analysis
demonstrates the dilemma between balance and adaptation of BN statistics for
incremental tasks, which potentially affects training stability and
generalization. To address these particular challenges, we propose Adaptive
Balance of BN (AdaB$^2$N), which appropriately incorporates a Bayesian-based
strategy to adapt task-wise contributions and a modified momentum to balance BN
statistics, corresponding to the training and testing stages. By implementing
BN in a continual learning fashion, our approach achieves significant
performance gains across a wide range of benchmarks, particularly for the
challenging yet realistic online scenarios (e.g., up to 7.68%, 6.86% and 4.26%
on Split CIFAR-10, Split CIFAR-100 and Split Mini-ImageNet, respectively). Our
code is available at https://github.com/lvyilin/AdaB2N.
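The abstract describes AdaB$^2$N only at a high level (a Bayesian-based task-wise weighting plus a modified momentum), so the snippet below is a minimal sketch rather than the authors' implementation: it illustrates only the "balance" side of the dilemma by replacing BN's fixed momentum with a cumulative schedule, so the running statistics weight all batches (and hence all tasks seen so far) equally instead of favouring the most recent task. The class name `BalancedBN2d` and the 1/t schedule are illustrative assumptions; the Bayesian task-wise adaptation is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BalancedBN2d(nn.BatchNorm2d):
    """Minimal sketch of recency-bias-free BN statistics (not AdaB^2N itself).

    Standard BN tracks running statistics with a fixed momentum, so the
    estimates used at test time are biased towards the most recently seen
    task. Here the momentum decays as 1 / num_batches_tracked, i.e. the
    running statistics become a plain average over all observed batches,
    weighting old and new tasks equally.
    """

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            self.num_batches_tracked.add_(1)
            # Cumulative average: every batch, old task or new, gets equal weight.
            momentum = 1.0 / float(self.num_batches_tracked)
        else:
            momentum = 0.0  # statistics are frozen during evaluation

        return F.batch_norm(
            x,
            self.running_mean,
            self.running_var,
            self.weight,
            self.bias,
            training=self.training,
            momentum=momentum,
            eps=self.eps,
        )
```

PyTorch's built-in BatchNorm2d already exposes this cumulative mode via momentum=None; spelling the schedule out explicitly just makes it easy to swap in a task-aware weighting of the kind the paper proposes.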
Related papers
- Simplifying Neural Network Training Under Class Imbalance [77.39968702907817]
Real-world datasets are often highly class-imbalanced, which can adversely impact the performance of deep learning models.
The majority of research on training neural networks under class imbalance has focused on specialized loss functions, sampling techniques, or two-stage training procedures.
We demonstrate that simply tuning existing components of standard deep learning pipelines, such as the batch size, data augmentation, and label smoothing, can achieve state-of-the-art performance without any such specialized class imbalance methods.
arXiv Detail & Related papers (2023-12-05T05:52:44Z)
- Unified Batch Normalization: Identifying and Alleviating the Feature Condensation in Batch Normalization and a Unified Framework [55.22949690864962]
Batch Normalization (BN) has become an essential technique in contemporary neural network design.
We propose a two-stage unified framework called Unified Batch Normalization (UBN).
UBN significantly enhances performance across different visual backbones and different vision tasks.
arXiv Detail & Related papers (2023-11-27T16:41:31Z)
- An Adaptive Batch Normalization in Deep Learning [0.0]
Batch Normalization (BN) is a way to accelerate and stabilize training in deep convolutional neural networks.
We propose a threshold-based adaptive BN approach that separates the data that requires the BN and data that does not require it.
arXiv Detail & Related papers (2022-11-03T12:12:56Z)
- Test-time Batch Normalization [61.292862024903584]
Deep neural networks often suffer from data distribution shift between training and testing.
We revisit the batch normalization (BN) in the training process and reveal two key insights benefiting test-time optimization.
We propose a novel test-time BN layer design, GpreBN, which is optimized during testing by minimizing Entropy loss.
arXiv Detail & Related papers (2022-05-20T14:33:39Z)
- Continual Normalization: Rethinking Batch Normalization for Online Continual Learning [21.607816915609128]
We study the cross-task normalization effect of Batch Normalization (BN) in online continual learning.
BN normalizes the testing data using moments biased towards the current task, resulting in higher catastrophic forgetting.
We propose Continual Normalization (CN) to facilitate training similar to BN while mitigating its negative effect.
arXiv Detail & Related papers (2022-03-30T07:23:24Z)
- Diagnosing Batch Normalization in Class Incremental Learning [39.70552266952221]
Batch normalization (BN) standardizes intermediate feature maps and has been widely validated to improve training stability and convergence.
We propose BN Tricks to address the issue by training a better feature extractor while eliminating classification bias.
We show that BN Tricks can bring significant performance gains to all adopted baselines.
arXiv Detail & Related papers (2022-02-16T12:38:43Z)
- Rebalancing Batch Normalization for Exemplar-based Class-Incremental Learning [23.621259845287824]
Batch Normalization (BN) has been extensively studied for neural nets in various computer vision tasks.
We develop a new update patch for BN, particularly tailored to exemplar-based class-incremental learning (CIL).
arXiv Detail & Related papers (2022-01-29T11:03:03Z)
- Test-time Batch Statistics Calibration for Covariate Shift [66.7044675981449]
We propose to adapt the deep models to the novel environment during inference.
We present a general formulation, $\alpha$-BN, to calibrate the batch statistics (a minimal sketch of this kind of statistics interpolation is given after this list).
We also present a novel loss function to form a unified test-time adaptation framework, Core.
arXiv Detail & Related papers (2021-10-06T08:45:03Z)
- Double Forward Propagation for Memorized Batch Normalization [68.34268180871416]
Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs).
We propose a memorized batch normalization (MBN) which considers multiple recent batches to obtain more accurate and robust statistics (see the sketch at the end of this list).
Compared to related methods, the proposed MBN exhibits consistent behaviors in both training and inference.
arXiv Detail & Related papers (2020-10-10T08:48:41Z)
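The $\alpha$-BN entry above only states that batch statistics are calibrated at test time; a common instantiation of that idea is to interpolate between the stored training statistics and the statistics of the current test batch. The sketch below follows that assumption; the function name, the default alpha, and the exact blending rule are illustrative, not taken from the cited paper.

```python
import torch


def calibrated_batch_norm(x, running_mean, running_var, weight, bias,
                          alpha=0.1, eps=1e-5):
    """Hypothetical test-time calibration of BN statistics (alpha-BN style).

    Blends stored training statistics with the statistics of the current
    test batch: stat = alpha * batch_stat + (1 - alpha) * running_stat.
    """
    # Per-channel statistics of the current test batch (N, C, H, W layout).
    batch_mean = x.mean(dim=(0, 2, 3))
    batch_var = x.var(dim=(0, 2, 3), unbiased=False)

    # Calibrated statistics: mostly training stats, partly test-batch stats.
    mean = alpha * batch_mean + (1.0 - alpha) * running_mean
    var = alpha * batch_var + (1.0 - alpha) * running_var

    x_hat = (x - mean[None, :, None, None]) / torch.sqrt(
        var[None, :, None, None] + eps)
    return x_hat * weight[None, :, None, None] + bias[None, :, None, None]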
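Similarly, the Memorized Batch Normalization entry describes aggregating statistics over multiple recent batches. A minimal sketch of that bookkeeping, assuming a simple window of the last k batches and plain averaging (both assumptions, not the authors' method), could look like this:

```python
from collections import deque

import torch


class MemorizedBatchStats:
    """Hypothetical helper in the spirit of Memorized BN (MBN).

    Instead of an exponential moving average over single batches, it keeps
    the per-channel statistics of the last k batches and averages them,
    which is the core idea described in the MBN entry above.
    """

    def __init__(self, k: int = 8):
        self.means = deque(maxlen=k)
        self.vars = deque(maxlen=k)

    def update(self, x: torch.Tensor) -> None:
        # x has layout (N, C, H, W); record this batch's channel statistics.
        self.means.append(x.mean(dim=(0, 2, 3)).detach())
        self.vars.append(x.var(dim=(0, 2, 3), unbiased=False).detach())

    def statistics(self):
        # Average over the memorized batches for more stable estimates.
        mean = torch.stack(list(self.means)).mean(dim=0)
        var = torch.stack(list(self.vars)).mean(dim=0)
        return mean, var
```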