Routing Networks with Co-training for Continual Learning
- URL: http://arxiv.org/abs/2009.04381v1
- Date: Wed, 9 Sep 2020 15:58:51 GMT
- Title: Routing Networks with Co-training for Continual Learning
- Authors: Mark Collier, Efi Kokiopoulou, Andrea Gesmundo, Jesse Berent
- Abstract summary: We propose the use of sparse routing networks for continual learning.
For each input, these network architectures activate a different path through a network of experts.
In practice, we find it is necessary to develop a new training method for routing networks, which we call co-training.
- Score: 5.957609459173546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The core challenge with continual learning is catastrophic forgetting, the
phenomenon that when neural networks are trained on a sequence of tasks they
rapidly forget previously learned tasks. It has been observed that catastrophic
forgetting is most severe when tasks are dissimilar to each other. We propose
the use of sparse routing networks for continual learning. For each input,
these network architectures activate a different path through a network of
experts. Routing networks have been shown to learn to route similar tasks to
overlapping sets of experts and dissimilar tasks to disjoint sets of experts.
In the continual learning context this behaviour is desirable as it minimizes
interference between dissimilar tasks while allowing positive transfer between
related tasks. In practice, we find it is necessary to develop a new training
method for routing networks, which we call co-training, which avoids poorly
initialized experts when new tasks are presented. When combined with a small
episodic memory replay buffer, sparse routing networks with co-training
outperform densely connected networks on the MNIST-Permutations and
MNIST-Rotations benchmarks.
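A rough PyTorch sketch of the idea is shown below (not the authors' implementation; all names, layer sizes, and the reservoir-sampling buffer are assumptions): a router scores a pool of experts, the top-k routed outputs are mixed per input (evaluated densely here for simplicity), and a small episodic memory buffer stores past examples for rehearsal. The co-training procedure itself is not detailed in the abstract and is therefore omitted.
```python
# Minimal, illustrative sketch of a sparsely routed expert network plus a small
# episodic replay buffer. Names and hyperparameters are placeholders.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseRoutingNet(nn.Module):
    """Router picks k experts per input; only their outputs are mixed."""

    def __init__(self, in_dim, hidden_dim, out_dim, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
            for _ in range(num_experts)
        )
        self.router = nn.Linear(in_dim, num_experts)   # one score per expert
        self.head = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        scores = self.router(x)                        # (batch, num_experts)
        topk_vals, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_vals, dim=-1)         # renormalise over chosen experts
        # Evaluate all experts densely for simplicity, then keep the routed ones.
        stacked = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, E, hidden)
        rows = torch.arange(x.size(0), device=x.device).unsqueeze(1)
        chosen = stacked[rows, topk_idx]               # (batch, k, hidden)
        mixed = (weights.unsqueeze(-1) * chosen).sum(dim=1)
        return self.head(mixed)


class ReplayBuffer:
    """Tiny reservoir-sampled episodic memory for rehearsal."""

    def __init__(self, capacity=500):
        self.capacity, self.data, self.seen = capacity, [], 0

    def add(self, x, y):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append((x, y))
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = (x, y)

    def sample(self, n):
        batch = random.sample(self.data, min(n, len(self.data)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)


net = SparseRoutingNet(in_dim=784, hidden_dim=256, out_dim=10)
buffer = ReplayBuffer(capacity=500)
```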
Related papers
- Stitching for Neuroevolution: Recombining Deep Neural Networks without Breaking Them [0.0]
Traditional approaches to neuroevolution often start from scratch.
Recombining trained networks is non-trivial because architectures and feature representations typically differ.
We employ stitching, which merges the networks by introducing new layers at crossover points.
arXiv Detail & Related papers (2024-03-21T08:30:44Z) - Negotiated Representations to Prevent Forgetting in Machine Learning Applications [0.0]
Catastrophic forgetting is a significant challenge in the field of machine learning.
We propose a novel method for preventing catastrophic forgetting in machine learning applications.
arXiv Detail & Related papers (2023-11-30T22:43:50Z) - Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
arXiv Detail & Related papers (2023-07-13T16:39:08Z) - Modular Approach to Machine Reading Comprehension: Mixture of Task-Aware Experts [0.5801044612920815]
We present a Mixture of Task-Aware Experts Network for Machine Reading on a relatively small dataset.
We focus on the issue of common-sense learning, enforcing common-ground knowledge.
We take inspiration from recent advances in multitask and transfer learning.
arXiv Detail & Related papers (2022-10-04T17:13:41Z) - Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners [67.5865966762559]
We study whether sparsely activated Mixture-of-Experts (MoE) improve multi-task learning.
We devise task-aware gating functions to route examples from different tasks to specialized experts.
This results in a sparsely activated multi-task model with a large number of parameters, but with the same computational cost as that of a dense model.
arXiv Detail & Related papers (2022-04-16T00:56:12Z) - Thinking Deeply with Recurrence: Generalizing from Easy to Hard Sequential Reasoning Problems [51.132938969015825]
We observe that recurrent networks have the uncanny ability to closely emulate the behavior of non-recurrent deep models.
We show that recurrent networks trained to solve simple mazes with few recurrent steps can solve much more complex problems simply by performing additional recurrences during inference (a minimal sketch appears after this list).
arXiv Detail & Related papers (2021-02-22T14:09:20Z) - Beneficial Perturbation Network for designing general adaptive artificial intelligence systems [14.226973149346886]
We propose a new type of deep neural network with extra, out-of-network, task-dependent biasing units to accommodate dynamic situations.
Our approach is memory-efficient and parameter-efficient, can accommodate many tasks, and achieves state-of-the-art performance across different tasks and domains.
arXiv Detail & Related papers (2020-09-27T01:28:10Z) - Auxiliary Learning by Implicit Differentiation [54.92146615836611]
Training neural networks with auxiliary tasks is a common practice for improving the performance on a main task of interest.
Here, we propose a novel framework, AuxiLearn, that targets both challenges based on implicit differentiation.
First, when useful auxiliaries are known, we propose learning a network that combines all losses into a single coherent objective function.
Second, when no useful auxiliary task is known, we describe how to learn a network that generates a meaningful, novel auxiliary task.
arXiv Detail & Related papers (2020-06-22T19:35:07Z) - Learning to Branch for Multi-Task Learning [12.49373126819798]
We present an automated multi-task learning algorithm that learns where to share or branch within a network.
We propose a novel tree-structured design space that casts a tree branching operation as a Gumbel-Softmax sampling procedure (a toy sketch appears after this list).
arXiv Detail & Related papers (2020-06-02T19:23:21Z) - Semantic Drift Compensation for Class-Incremental Learning [48.749630494026086]
Class-incremental learning of deep networks sequentially increases the number of classes to be classified.
We propose a new method to estimate the drift, called semantic drift, of features and compensate for it without the need of any exemplars.
arXiv Detail & Related papers (2020-04-01T13:31:19Z) - Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks [95.51368472949308]
Adaptation can be useful in cases when training data is scarce, or when one wishes to encode priors in the network.
In this paper, we propose a straightforward alternative: side-tuning (a minimal sketch appears after this list).
arXiv Detail & Related papers (2019-12-31T18:52:32Z)
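For the "Thinking Deeply with Recurrence" entry above, a minimal sketch (hypothetical names, not the authors' code) of a weight-tied recurrent block whose number of iterations can be increased at inference time:
```python
# Sketch of a weight-tied recurrent block iterated more times at test time
# than during training; illustrative only.
import torch
import torch.nn as nn


class RecurrentSolver(nn.Module):
    """One shared block applied repeatedly; depth is an inference-time knob."""

    def __init__(self, dim, train_iters=5):
        super().__init__()
        self.train_iters = train_iters
        self.block = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())  # shared weights
        self.readout = nn.Linear(dim, dim)

    def forward(self, x, iters=None):
        iters = self.train_iters if iters is None else iters
        h = x
        for _ in range(iters):       # same block, applied `iters` times
            h = self.block(h) + x    # residual connection keeps the input in view
        return self.readout(h)


# Train with a small iteration budget on easy instances...
model = RecurrentSolver(dim=64)
y_easy = model(torch.randn(8, 64))
# ...then simply run more recurrences on harder instances at inference time.
y_hard = model(torch.randn(8, 64), iters=30)
```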
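For the "Learning to Branch for Multi-Task Learning" entry, a toy illustration (assumed names, drastically simplified relative to the paper's tree-structured search space) of making a branching choice differentiable with Gumbel-Softmax:
```python
# Sketch of a differentiable branch selection via Gumbel-Softmax; a toy
# stand-in for the tree-structured design space described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftBranch(nn.Module):
    """Differentiable choice among candidate branches via Gumbel-Softmax."""

    def __init__(self, dim, num_branches=3):
        super().__init__()
        self.branches = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_branches))
        # Learnable logits over which branch this node routes through.
        self.logits = nn.Parameter(torch.zeros(num_branches))

    def forward(self, x, tau=1.0):
        # Hard one-hot sample in the forward pass, soft gradients to the logits.
        gate = F.gumbel_softmax(self.logits, tau=tau, hard=True)
        outs = torch.stack([b(x) for b in self.branches], dim=0)  # (branches, batch, dim)
        return torch.einsum('b,bnd->nd', gate, outs)


branch = SoftBranch(dim=64)
y = branch(torch.randn(16, 64), tau=0.5)
```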
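For the "Side-Tuning" entry, a minimal sketch of the additive adaptation idea: a frozen base network plus a small trainable side network, blended by a learned scalar. Names and the sigmoid-parameterised blend are assumptions, not the paper's exact recipe.
```python
# Sketch of side-tuning style additive adaptation; illustrative only.
import torch
import torch.nn as nn


class SideTuned(nn.Module):
    """Frozen base network plus a small trainable side network, blended additively."""

    def __init__(self, base: nn.Module, side: nn.Module):
        super().__init__()
        self.base = base
        for p in self.base.parameters():              # the base stays fixed
            p.requires_grad_(False)
        self.side = side                              # lightweight, task-specific
        self.alpha = nn.Parameter(torch.tensor(0.0))  # learned blending weight

    def forward(self, x):
        a = torch.sigmoid(self.alpha)                 # keep the blend in [0, 1]
        return a * self.base(x) + (1 - a) * self.side(x)


base = nn.Linear(32, 10)   # stand-in for a pretrained network
side = nn.Linear(32, 10)   # small side network trained for the new task
model = SideTuned(base, side)
out = model(torch.randn(4, 32))
```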