Sparse Meta Networks for Sequential Adaptation and its Application to
Adaptive Language Modelling
- URL: http://arxiv.org/abs/2009.01803v1
- Date: Thu, 3 Sep 2020 17:06:52 GMT
- Title: Sparse Meta Networks for Sequential Adaptation and its Application to
Adaptive Language Modelling
- Authors: Tsendsuren Munkhdalai
- Abstract summary: We introduce Sparse Meta Networks -- a meta-learning approach to learn online sequential adaptation algorithms for deep neural networks.
We augment a deep neural network with a layer-specific fast-weight memory.
We demonstrate strong performance on a variety of sequential adaptation scenarios.
- Score: 7.859988850911321
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training a deep neural network requires a large amount of single-task data
and involves a long time-consuming optimization phase. This is not scalable to
complex, realistic environments with new unexpected changes. Humans can perform
fast incremental learning on the fly and memory systems in the brain play a
critical role. We introduce Sparse Meta Networks -- a meta-learning approach to
learn online sequential adaptation algorithms for deep neural networks, by
using deep neural networks. We augment a deep neural network with a
layer-specific fast-weight memory. The fast-weights are generated sparsely at
each time step and accumulated incrementally through time providing a useful
inductive bias for online continual adaptation. We demonstrate strong
performance on a variety of sequential adaptation scenarios, from simple
online reinforcement learning to large-scale adaptive language modelling.
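The abstract's central mechanism (a layer-specific fast-weight memory whose updates are generated sparsely at each time step and accumulated over time) can be illustrated with a rough sketch. This is not the paper's algorithm: in the paper the updates are produced by learned meta networks, whereas here a fixed outer-product rule with top-k sparsification stands in for them. The names `FastWeightLayer`, `top_k_sparsify`, `adapt`, and the parameters `k` and `lr` are all hypothetical.

```python
import random


def top_k_sparsify(update, k):
    """Keep the k largest-magnitude entries of a 2-D update; zero the rest."""
    ranked = sorted(
        ((abs(v), i, j) for i, row in enumerate(update) for j, v in enumerate(row)),
        reverse=True,
    )
    keep = {(i, j) for _, i, j in ranked[:k]}
    return [
        [v if (i, j) in keep else 0.0 for j, v in enumerate(row)]
        for i, row in enumerate(update)
    ]


class FastWeightLayer:
    """Linear layer with slow weights plus an additive fast-weight memory."""

    def __init__(self, n_in, n_out, k, seed=0):
        rng = random.Random(seed)
        # Slow weights: would normally be trained offline; random here.
        self.slow = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        # Fast weights: start at zero and change only through adapt().
        self.fast = [[0.0] * n_in for _ in range(n_out)]
        self.k = k

    def forward(self, x):
        # The effective weight is the sum of slow and fast weights.
        return [
            sum((s + f) * xi for s, f, xi in zip(slow_row, fast_row, x))
            for slow_row, fast_row in zip(self.slow, self.fast)
        ]

    def adapt(self, x, signal, lr=0.1):
        # Assumed update rule: outer product of a per-output signal and the input.
        update = [[lr * s * xi for xi in x] for s in signal]
        sparse = top_k_sparsify(update, self.k)
        # Accumulate the sparse update into the fast-weight memory.
        self.fast = [
            [f + u for f, u in zip(fast_row, update_row)]
            for fast_row, update_row in zip(self.fast, sparse)
        ]
        return sparse


layer = FastWeightLayer(n_in=3, n_out=2, k=2)
for _ in range(2):
    layer.adapt([1.0, 2.0, 3.0], [1.0, -1.0])
print(layer.fast)
```

Because only `k` entries change per step, each update touches a small part of the memory, while repeated updates to the same entries add up across time steps.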
Related papers
- Peer-to-Peer Learning Dynamics of Wide Neural Networks [10.179711440042123]
We provide an explicit, non-asymptotic characterization of the learning dynamics of wide neural networks trained using popular DGD algorithms.
We validate our analytical results by accurately predicting error for classification tasks.
arXiv Detail & Related papers (2024-09-23T17:57:58Z) - Improving the Trainability of Deep Neural Networks through Layerwise
Batch-Entropy Regularization [1.3999481573773072]
We introduce and evaluate the batch-entropy which quantifies the flow of information through each layer of a neural network.
We show that we can train a "vanilla" fully connected network and convolutional neural network with 500 layers by simply adding the batch-entropy regularization term to the loss function.
arXiv Detail & Related papers (2022-08-01T20:31:58Z) - Learning to Modulate Random Weights: Neuromodulation-inspired Neural
Networks For Efficient Continual Learning [1.9580473532948401]
We introduce a novel neural network architecture inspired by neuromodulation in biological nervous systems.
We show that this approach has strong learning performance per task despite the very small number of learnable parameters.
arXiv Detail & Related papers (2022-04-08T21:12:13Z) - Learning Fast and Slow for Online Time Series Forecasting [76.50127663309604]
Fast and Slow learning Networks (FSNet) is a holistic framework for online time-series forecasting.
FSNet balances fast adaptation to recent changes and retrieving similar old knowledge.
Our code will be made publicly available.
arXiv Detail & Related papers (2022-02-23T18:23:07Z) - Dynamic Neural Diversification: Path to Computationally Sustainable
Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters can be suitable resource-efficient candidates for many simple tasks.
We explore the diversity of the neurons within the hidden layer during the learning process.
We analyze how the diversity of the neurons affects predictions of the model.
arXiv Detail & Related papers (2021-09-20T15:12:16Z) - Optimizing Memory Placement using Evolutionary Graph Reinforcement
Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z) - Progressive Tandem Learning for Pattern Recognition with Deep Spiking
Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z) - Adaptive Reinforcement Learning through Evolving Self-Modifying Neural
Networks [0.0]
Current methods in Reinforcement Learning (RL) only adjust to new interactions after reflection over a specified time interval.
Recent work addressing this by endowing artificial neural networks with neuromodulated plasticity has been shown to improve performance on simple RL tasks trained using backpropagation.
Here we study the problem of meta-learning in a challenging quadruped domain, where each leg of the quadruped has a chance of becoming unusable.
Results demonstrate that agents evolved using self-modifying plastic networks are more capable of adapting to complex meta-learning tasks, even outperforming the same network updated using gradient
arXiv Detail & Related papers (2020-05-22T02:24:44Z) - The large learning rate phase of deep learning: the catapult mechanism [50.23041928811575]
We present a class of neural networks with solvable training dynamics.
We find good agreement between our model's predictions and training dynamics in realistic deep learning settings.
We believe our results shed light on characteristics of models trained at different learning rates.
arXiv Detail & Related papers (2020-03-04T17:52:48Z) - Side-Tuning: A Baseline for Network Adaptation via Additive Side
Networks [95.51368472949308]
Adaptation can be useful in cases when training data is scarce, or when one wishes to encode priors in the network.
In this paper, we propose a straightforward alternative: side-tuning.
arXiv Detail & Related papers (2019-12-31T18:52:32Z)