Distal Interference: Exploring the Limits of Model-Based Continual
Learning
- URL: http://arxiv.org/abs/2402.08255v1
- Date: Tue, 13 Feb 2024 07:07:37 GMT
- Title: Distal Interference: Exploring the Limits of Model-Based Continual
Learning
- Authors: Heinrich van Deventer, Anna Sergeevna Bosman
- Abstract summary: Continual learning is hindered by catastrophic interference or forgetting.
Despite their practical success, artificial neural networks (ANNs) are prone to catastrophic interference.
It is conjectured that continual learning with complexity models requires augmentation of the training data or algorithm.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual learning is the sequential learning of different tasks by a machine
learning model. Continual learning is known to be hindered by catastrophic
interference or forgetting, i.e. rapid unlearning of earlier learned tasks when
new tasks are learned. Despite their practical success, artificial neural
networks (ANNs) are prone to catastrophic interference. This study analyses how
gradient descent and overlapping representations between distant input points
lead to distal interference and catastrophic interference. Distal interference
refers to the phenomenon where training a model on a subset of the domain leads
to non-local changes on other subsets of the domain. This study shows that
uniformly trainable models without distal interference must be exponentially
large. A novel antisymmetric bounded exponential layer B-spline ANN
architecture named ABEL-Spline is proposed that can approximate any continuous
function, is uniformly trainable, has polynomial computational complexity, and
provides some guarantees for distal interference. Experiments are presented to
demonstrate the theoretical properties of ABEL-Splines. ABEL-Splines are also
evaluated on benchmark regression problems. It is concluded that the weaker
distal interference guarantees in ABEL-Splines are insufficient for model-only
continual learning. It is conjectured that continual learning with polynomial
complexity models requires augmentation of the training data or algorithm.
Related papers
- Temporal-Difference Variational Continual Learning [89.32940051152782]
A crucial capability of Machine Learning models in real-world applications is the ability to continuously learn new tasks.
In Continual Learning settings, models often struggle to balance learning new tasks with retaining previous knowledge.
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations.
arXiv Detail & Related papers (2024-10-10T10:58:41Z) - Learning in Hybrid Active Inference Models [0.8749675983608172]
We present a novel hierarchical hybrid active inference agent in which a high-level discrete active inference planner sits above a low-level continuous active inference controller.
We make use of recent work in recurrent switching linear dynamical systems which implement end-to-end learning of meaningful discrete representations.
We apply our model to the sparse Continuous Mountain Car task, demonstrating fast system identification via enhanced exploration and successful planning.
arXiv Detail & Related papers (2024-09-02T08:41:45Z) - Harnessing Neural Unit Dynamics for Effective and Scalable Class-Incremental Learning [38.09011520275557]
Class-incremental learning (CIL) aims to train a model to learn new classes from non-stationary data streams without forgetting old ones.
We propose a new kind of connectionist model by tailoring neural unit dynamics that adapt the behavior of neural networks for CIL.
arXiv Detail & Related papers (2024-06-04T15:47:03Z) - The Edge-of-Reach Problem in Offline Model-Based Reinforcement Learning [37.387280102209274]
offline reinforcement learning aims to enable agents to be trained from pre-collected datasets, however, this comes with the added challenge of estimating the value of behavior not covered in the dataset.
Model-based methods offer a solution by allowing agents to collect additional synthetic data via rollouts in a learned dynamics model.
However, if the learned dynamics model is replaced by the true error-free dynamics, existing model-based methods completely fail.
We propose Reach-Aware Value Learning (RAVL), a simple and robust method that directly addresses the edge-of-reach problem.
arXiv Detail & Related papers (2024-02-19T20:38:00Z) - Neural Network Approximation for Pessimistic Offline Reinforcement
Learning [17.756108291816908]
We present a non-asymptotic estimation error of pessimistic offline RL using general neural network approximation.
Our result shows that the estimation error consists of two parts: the first converges to zero at a desired rate on the sample size with partially controllable concentrability, and the second becomes negligible if the residual constraint is tight.
arXiv Detail & Related papers (2023-12-19T05:17:27Z) - Theoretical Characterization of the Generalization Performance of
Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z) - Neural Abstractions [72.42530499990028]
We present a novel method for the safety verification of nonlinear dynamical models that uses neural networks to represent abstractions of their dynamics.
We demonstrate that our approach performs comparably to the mature tool Flow* on existing benchmark nonlinear models.
arXiv Detail & Related papers (2023-01-27T12:38:09Z) - Training Generative Adversarial Networks by Solving Ordinary
Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training.
From this perspective, we hypothesise that instabilities in training GANs arise from the integration error.
We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
arXiv Detail & Related papers (2020-10-28T15:23:49Z) - Measuring Model Complexity of Neural Networks with Curve Activation
Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation function.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L1$ and $L2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z) - Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z) - Enhancement of shock-capturing methods via machine learning [0.0]
We develop an improved finite-volume method for simulating PDEs with discontinuous solutions.
We train a neural network to improve the results of a fifth-order WENO method.
We find that our method outperforms WENO in simulations where the numerical solution becomes overly diffused.
arXiv Detail & Related papers (2020-02-06T21:51:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.