Related papers: Sparse Meta Networks for Sequential Adaptation and its Application to Adaptive Language Modelling

Sparse Meta Networks for Sequential Adaptation and its Application to Adaptive Language Modelling

URL: http://arxiv.org/abs/2009.01803v1
Date: Thu, 3 Sep 2020 17:06:52 GMT
Title: Sparse Meta Networks for Sequential Adaptation and its Application to Adaptive Language Modelling
Authors: Tsendsuren Munkhdalai
Abstract summary: We introduce Sparse Meta Networks -- a meta-learning approach to learn online sequential adaptation algorithms for deep neural networks. We augment a deep neural network with a layer-specific fast-weight memory. We demonstrate strong performance on a variety of sequential adaptation scenarios.
Score: 7.859988850911321
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Training a deep neural network requires a large amount of single-task data and involves a long time-consuming optimization phase. This is not scalable to complex, realistic environments with new unexpected changes. Humans can perform fast incremental learning on the fly and memory systems in the brain play a critical role. We introduce Sparse Meta Networks -- a meta-learning approach to learn online sequential adaptation algorithms for deep neural networks, by using deep neural networks. We augment a deep neural network with a layer-specific fast-weight memory. The fast-weights are generated sparsely at each time step and accumulated incrementally through time providing a useful inductive bias for online continual adaptation. We demonstrate strong performance on a variety of sequential adaptation scenarios, from a simple online reinforcement learning to a large scale adaptive language modelling.

Related papers

Statistical physics of deep learning: Optimal learning of a multi-layer perceptron near interpolation [7.079039376205091]
We study the supervised learning of a multi-layer perceptron.<n>We focus on the challenging regime where the number of trainable parameters and data are comparable.<n>Despite its simplicity, the Bayesian-optimal setting provides insights on how the depth, non-linearity and finite (proportional) width influence neural networks.
arXiv Detail & Related papers (2025-10-28T16:44:34Z)
Connectome-Guided Automatic Learning Rates for Deep Networks [3.2228025627337864]
We propose Connectome-Guided Automatic Learning Rate (CG-ALR) that adjusts learning rates online as this connectome reconfigures.<n>This connectomics-inspired mechanism adapts step sizes to the network's dynamic functional organization, slowing learning during unstable reconfiguration and accelerating it when stable organization emerges.<n>Our results demonstrate that principles inspired by brain connectomes can inform the design of adaptive learning rates in deep learning, generally outperforming traditional SGD-based schedules and recent methods.
arXiv Detail & Related papers (2025-10-27T19:11:49Z)
Improving Neural Network Training using Dynamic Learning Rate Schedule for PINNs and Image Classification [0.0]
This paper presents a dynamic learning rate scheduler (DLRS) algorithm that adapts the learning rate based on the loss values calculated during the training process.<n> Experiments are conducted on problems related to physics-informed neural networks (PINNs) and image classification using multilayer perceptrons and convolutional neural networks, respectively.
arXiv Detail & Related papers (2025-07-29T12:31:21Z)
Peer-to-Peer Learning Dynamics of Wide Neural Networks [10.179711440042123]
We provide an explicit, non-asymptotic characterization of the learning dynamics of wide neural networks trained using popularDGD algorithms. We validate our analytical results by accurately predicting error and error and for classification tasks.
arXiv Detail & Related papers (2024-09-23T17:57:58Z)
Improving the Trainability of Deep Neural Networks through Layerwise Batch-Entropy Regularization [1.3999481573773072]
We introduce and evaluate the batch-entropy which quantifies the flow of information through each layer of a neural network. We show that we can train a "vanilla" fully connected network and convolutional neural network with 500 layers by simply adding the batch-entropy regularization term to the loss function.
arXiv Detail & Related papers (2022-08-01T20:31:58Z)
Learning to Modulate Random Weights: Neuromodulation-inspired Neural Networks For Efficient Continual Learning [1.9580473532948401]
We introduce a novel neural network architecture inspired by neuromodulation in biological nervous systems. We show that this approach has strong learning performance per task despite the very small number of learnable parameters.
arXiv Detail & Related papers (2022-04-08T21:12:13Z)
Learning Fast and Slow for Online Time Series Forecasting [76.50127663309604]
Fast and Slow learning Networks (FSNet) is a holistic framework for online time-series forecasting. FSNet balances fast adaptation to recent changes and retrieving similar old knowledge. Our code will be made publicly available.
arXiv Detail & Related papers (2022-02-23T18:23:07Z)
Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters, can be suitable resource-efficient candidates for many simple tasks. We explore the diversity of the neurons within the hidden layer during the learning process. We analyze how the diversity of the neurons affects predictions of the model.
arXiv Detail & Related papers (2021-09-20T15:12:16Z)
Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces. We train and validate our approach directly on the Intel NNP-I chip for inference. We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z)
Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency. We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
Adaptive Reinforcement Learning through Evolving Self-Modifying Neural Networks [0.0]
Current methods in Reinforcement Learning (RL) only adjust to new interactions after reflection over a specified time interval. Recent work addressing this by endowing artificial neural networks with neuromodulated plasticity have been shown to improve performance on simple RL tasks trained using backpropagation. Here we study the problem of meta-learning in a challenging quadruped domain, where each leg of the quadruped has a chance of becoming unusable. Results demonstrate that agents evolved using self-modifying plastic networks are more capable of adapting to complex meta-learning learning tasks, even outperforming the same network updated using gradient
arXiv Detail & Related papers (2020-05-22T02:24:44Z)
The large learning rate phase of deep learning: the catapult mechanism [50.23041928811575]
We present a class of neural networks with solvable training dynamics. We find good agreement between our model's predictions and training dynamics in realistic deep learning settings. We believe our results shed light on characteristics of models trained at different learning rates.
arXiv Detail & Related papers (2020-03-04T17:52:48Z)
Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks [95.51368472949308]
Adaptation can be useful in cases when training data is scarce, or when one wishes to encode priors in the network. In this paper, we propose a straightforward alternative: side-tuning.
arXiv Detail & Related papers (2019-12-31T18:52:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.