Optimal Protocols for Continual Learning via Statistical Physics and
Control Theory
- URL: http://arxiv.org/abs/2409.18061v1
- Date: Thu, 26 Sep 2024 17:01:41 GMT
- Title: Optimal Protocols for Continual Learning via Statistical Physics and
Control Theory
- Authors: Francesco Mori, Stefano Sarao Mannelli, Francesca Mignacco
- Abstract summary: Artificial neural networks often struggle with catastrophic forgetting when learning multiple tasks sequentially.
Recent theoretical work has addressed this issue by analysing learning curves in synthetic frameworks under predefined training protocols.
However, these protocols relied on heuristics and lacked a solid theoretical foundation assessing their optimality.
We fill this gap by combining exact equations for training dynamics, derived using statistical physics techniques, with optimal control methods.
Our theoretical analysis offers non-trivial yet interpretable strategies for mitigating catastrophic forgetting.
- Score: 7.519872646378836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial neural networks often struggle with catastrophic forgetting when
learning multiple tasks sequentially, as training on new tasks degrades the
performance on previously learned ones. Recent theoretical work has addressed
this issue by analysing learning curves in synthetic frameworks under
predefined training protocols. However, these protocols relied on heuristics
and lacked a solid theoretical foundation assessing their optimality. In this
paper, we fill this gap by combining exact equations for training dynamics,
derived using statistical physics techniques, with optimal control methods. We
apply this approach to teacher-student models for continual learning and
multi-task problems, obtaining a theory for task-selection protocols maximising
performance while minimising forgetting. Our theoretical analysis offers
non-trivial yet interpretable strategies for mitigating catastrophic
forgetting, shedding light on how optimal learning protocols can modulate
established effects, such as the influence of task similarity on forgetting.
Finally, we validate our theoretical findings on real-world data.
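
As a loose intuition pump only: in this setting, a task-selection protocol is just the choice of which teacher supplies each training sample. The numpy toy below is not the paper's setup or its optimal-control solution; the teachers, learning rate, step counts, and the 0.3 replay fraction are all illustrative assumptions. It contrasts a purely sequential schedule with a hand-tuned interleaved one in a linear teacher-student model:

```python
# Minimal toy sketch (not the paper's method): a linear student learning from
# two correlated linear teachers, comparing a purely sequential task schedule
# against a simple interleaved "task-selection protocol".
import numpy as np

rng = np.random.default_rng(0)
d, lr, steps = 50, 0.05, 4000

# Two teachers with tunable similarity (cosine overlap ~ 0.4).
w1 = rng.standard_normal(d)
w2 = 0.4 * w1 + np.sqrt(1 - 0.4 ** 2) * rng.standard_normal(d)
w1, w2 = w1 / np.linalg.norm(w1), w2 / np.linalg.norm(w2)

def gen_error(w_student, w_teacher):
    # Squared parameter distance; proportional to the population error
    # for isotropic Gaussian inputs.
    return 0.5 * np.sum((w_student - w_teacher) ** 2)

def train(schedule):
    # schedule[t] in {0, 1} selects which teacher provides the sample at step t.
    w = np.zeros(d)
    teachers = [w1, w2]
    for t in range(steps):
        x = rng.standard_normal(d) / np.sqrt(d)
        y = teachers[schedule[t]] @ x
        w -= lr * (w @ x - y) * x          # online SGD on the squared loss
    return gen_error(w, w1), gen_error(w, w2)

# Protocol A: fully sequential (task 1, then task 2) -> forgetting of task 1.
seq = np.array([0] * (steps // 2) + [1] * (steps // 2))
# Protocol B: second phase replays task 1 with probability 0.3 (hand-picked,
# standing in for the control-theoretic optimum derived in the paper).
mix = seq.copy()
mix[steps // 2:] = (rng.random(steps // 2) < 0.7).astype(int)

for name, sched in [("sequential", seq), ("interleaved", mix)]:
    e1, e2 = train(sched)
    print(f"{name:11s}  error task1={e1:.3f}  error task2={e2:.3f}")
```

On a typical run, the sequential schedule ends with a large task-1 error (forgetting), while the interleaved protocol trades a little task-2 error for much better retention; the paper replaces the hand-tuned replay fraction with a schedule derived from exact dynamics plus optimal control, and analyzes how task similarity modulates forgetting.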
Related papers
- A Kernel Perspective on Distillation-based Collaborative Learning [8.971234046933349]
We propose a nonparametric collaborative learning algorithm that does not directly share local data or models in statistically heterogeneous environments.
Inspired by our theoretical results, we also propose a practical distillation-based collaborative learning algorithm built on neural network architectures.
arXiv Detail & Related papers (2024-10-23T06:40:13Z)
- Rethinking Meta-Learning from a Learning Lens [15.36934812747678]
We focus on the more fundamental "learning to learn" strategy of meta-learning to explore what causes errors and how to eliminate them without changing the environment.
We propose using task relations to calibrate the optimization process of meta-learning, introducing a plug-and-play method called Task Relation Learner (TRLearner) to achieve this goal.
arXiv Detail & Related papers (2024-09-13T02:00:16Z)
- An Expert's Guide to Training Physics-informed Neural Networks [5.198985210238479]
Physics-informed neural networks (PINNs) have been popularized as a deep learning framework that can seamlessly synthesize observational data and partial differential equation (PDE) constraints; this composite loss is sketched below.
We present a series of best practices that can significantly improve the training efficiency and overall accuracy of PINNs.
arXiv Detail & Related papers (2023-08-16T16:19:25Z)
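
To unpack the data-plus-PDE idea named in the PINN entry above, here is a deliberately tiny sketch. It is not from that paper: the "network" is a one-parameter exponential ansatz with an analytic derivative, and the grid search stands in for gradient-based training; real PINNs use an MLP with automatic differentiation.

```python
# Toy illustration (assumptions throughout): the defining PINN idea is a
# composite loss = data misfit + PDE residual. Here the model is u(x) = exp(a*x)
# for the ODE u'(x) + u(x) = 0 with u(0) = 1.
import numpy as np

x_data = np.array([0.0, 0.5])            # sparse observations
u_data = np.exp(-x_data)                 # measurements of the true solution
x_coll = np.linspace(0.0, 2.0, 32)       # collocation points for the PDE term

def pinn_loss(a, w_pde=1.0):
    u = lambda x: np.exp(a * x)
    du = lambda x: a * np.exp(a * x)     # analytic du/dx of the ansatz
    data_term = np.mean((u(x_data) - u_data) ** 2)
    pde_term = np.mean((du(x_coll) + u(x_coll)) ** 2)  # residual of u' + u = 0
    return data_term + w_pde * pde_term

# Crude "training": scan the single parameter; real PINNs use gradient descent.
grid = np.linspace(-2.0, 0.0, 2001)
best = grid[np.argmin([pinn_loss(a) for a in grid])]
print(f"recovered decay rate a ~ {best:.3f} (exact: -1)")
```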
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
Independent Subnetwork Training (IST) is a recently proposed and highly effective technique for distributed training of large models.
We take a closer theoretical look at IST and identify fundamental differences between it and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pairwise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework that acquires exploratory trajectories enabling accurate learning of the hidden reward function; a toy preference-learning sketch follows below.
arXiv Detail & Related papers (2023-05-29T15:00:09Z)
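
The pairwise-feedback mechanism in the PbRL entry above reduces, in its simplest form, to fitting a Bradley-Terry model on trajectory pairs. The sketch below is a hedged illustration under strong assumptions (a linear reward over trajectory features, synthetic preferences, plain gradient ascent); the paper's actual contribution, a provable exploration scheme for collecting such pairs, is not shown.

```python
# Hedged sketch: reward learning from pairwise trajectory preferences
# via a Bradley-Terry model. All names and data are illustrative.
import numpy as np

rng = np.random.default_rng(1)
d, n_pairs = 8, 500
theta_true = rng.standard_normal(d)

# Each trajectory is summarized by a feature vector (e.g. summed state
# features); a pair (phi_a, phi_b) is labeled 1 if trajectory a is preferred.
phi_a = rng.standard_normal((n_pairs, d))
phi_b = rng.standard_normal((n_pairs, d))
p_prefer = 1 / (1 + np.exp(-(phi_a - phi_b) @ theta_true))  # Bradley-Terry
labels = (rng.random(n_pairs) < p_prefer).astype(float)

theta, lr = np.zeros(d), 0.5
for _ in range(300):
    logits = (phi_a - phi_b) @ theta
    probs = 1 / (1 + np.exp(-logits))
    theta += lr * (phi_a - phi_b).T @ (labels - probs) / n_pairs  # logistic MLE

cos = theta @ theta_true / (np.linalg.norm(theta) * np.linalg.norm(theta_true))
print(f"cosine(theta_hat, theta_true) = {cos:.3f}")
```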
- Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks.
We present a generalization bound for meta-learning, which was first derived by Rothfuss et al.
We provide a theoretical analysis and an empirical case study of the conditions under which, and the extent to which, these guarantees for meta-learning improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z)
- Towards Scaling Difference Target Propagation by Learning Backprop Targets [64.90165892557776]
Difference Target Propagation (DTP) is a biologically plausible learning algorithm closely related to Gauss-Newton (GN) optimization.
We propose a novel feedback weight training scheme that ensures both that DTP approximates backpropagation (BP) and that layer-wise feedback weight training can be restored.
We report the best performance ever achieved by DTP on CIFAR-10 and ImageNet.
arXiv Detail & Related papers (2022-01-31T18:20:43Z)
- Deep Active Learning by Leveraging Training Dynamics [57.95155565319465]
We propose a theory-driven deep active learning method (dynamicAL) which selects samples to maximize training dynamics.
We show that dynamicAL not only outperforms other baselines consistently but also scales well on large deep learning models.
arXiv Detail & Related papers (2021-10-16T16:51:05Z)
- Nonparametric Estimation of Heterogeneous Treatment Effects: From Theory to Learning Algorithms [91.3755431537592]
We analyze four broad meta-learning strategies that rely on plug-in estimation and pseudo-outcome regression (two of these are sketched below).
We highlight how this theoretical reasoning can be used to guide principled algorithm design and translate our analyses into practice.
arXiv Detail & Related papers (2021-01-26T17:11:40Z)
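
For the treatment-effects entry above, the following numpy sketch illustrates two of the named strategy families: a plug-in "T-learner" and a doubly robust pseudo-outcome (DR-learner) regression for the conditional average treatment effect tau(x) = E[Y(1) - Y(0) | x]. The linear data model and the known propensity e = 0.5 are simplifying assumptions for illustration, not the paper's general setting.

```python
# Illustrative sketch: plug-in vs pseudo-outcome CATE estimation.
import numpy as np

rng = np.random.default_rng(2)
n, d, e = 4000, 5, 0.5
X = rng.standard_normal((n, d))
T = (rng.random(n) < e).astype(float)        # randomized treatment
tau = 1.0 + X[:, 0]                          # true heterogeneous effect
Y = X @ np.ones(d) + T * tau + rng.standard_normal(n)

def fit_ols(A, y):
    A1 = np.column_stack([np.ones(len(A)), A])     # add intercept
    return np.linalg.lstsq(A1, y, rcond=None)[0]

def predict(beta, A):
    return np.column_stack([np.ones(len(A)), A]) @ beta

# Plug-in (T-learner): fit outcome regressions per arm, take the difference.
b1 = fit_ols(X[T == 1], Y[T == 1])
b0 = fit_ols(X[T == 0], Y[T == 0])
mu1, mu0 = predict(b1, X), predict(b0, X)
tau_plugin = mu1 - mu0

# Pseudo-outcome (DR-learner): build a doubly robust transformed outcome
# and regress it on X to estimate tau directly.
psi = mu1 - mu0 + T * (Y - mu1) / e - (1 - T) * (Y - mu0) / (1 - e)
tau_dr = predict(fit_ols(X, psi), X)

for name, est in [("plug-in (T-learner)", tau_plugin), ("DR pseudo-outcome", tau_dr)]:
    rmse = np.sqrt(np.mean((est - tau) ** 2))
    print(f"{name:20s} RMSE vs true tau: {rmse:.3f}")
```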
- Convergence of End-to-End Training in Deep Unsupervised Contrastive Learning [3.8073142980733]
Unsupervised contrastive learning has proven to be a powerful method for learning representations from unlabeled data.
This study provides theoretical insights into the practical success of these unsupervised methods; a minimal contrastive-loss sketch follows below.
arXiv Detail & Related papers (2020-02-17T14:35:21Z)
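
To make the contrastive-learning entry above concrete, here is a hedged sketch of an InfoNCE-style loss, the standard objective in this family. The embeddings are random stand-ins; in practice they come from an encoder trained end-to-end, which is exactly the regime that paper analyzes.

```python
# Assumption-laden sketch of an InfoNCE-style contrastive loss.
import numpy as np

rng = np.random.default_rng(3)
batch, dim, temp = 8, 16, 0.1

def normalize(z):
    return z / np.linalg.norm(z, axis=1, keepdims=True)

# Two augmented "views" of the same batch; view b is a noisy copy of view a.
z_a = normalize(rng.standard_normal((batch, dim)))
z_b = normalize(z_a + 0.1 * rng.standard_normal((batch, dim)))

def info_nce(za, zb, t):
    # Similarity of every view-a embedding to every view-b embedding;
    # the diagonal holds the positive pairs, off-diagonals are negatives.
    logits = za @ zb.T / t
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                  # cross-entropy on diag

print(f"InfoNCE loss on aligned views:  {info_nce(z_a, z_b, temp):.3f}")
print(f"InfoNCE loss on shuffled views: {info_nce(z_a, z_b[::-1], temp):.3f}")
```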
This list is automatically generated from the titles and abstracts of the papers on this site.