Parallelizing Legendre Memory Unit Training
- URL: http://arxiv.org/abs/2102.11417v1
- Date: Mon, 22 Feb 2021 23:43:47 GMT
- Title: Parallelizing Legendre Memory Unit Training
- Authors: Narsimha Chilkuri, Chris Eliasmith
- Abstract summary: A new recurrent neural network (RNN) named the Legendre Memory Unit (LMU) was proposed and shown to achieve state-of-the-art performance on several benchmark datasets.
Here we leverage the linear time-invariant (LTI) memory component of the LMU to construct a simplified variant that can be parallelized during training.
We show that this parallelization-friendly reformulation, which applies generally to any deep network whose recurrent components are linear, makes training up to 200 times faster.
- Score: 5.076419064097734
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, a new recurrent neural network (RNN) named the Legendre Memory Unit
(LMU) was proposed and shown to achieve state-of-the-art performance on several
benchmark datasets. Here we leverage the linear time-invariant (LTI) memory
component of the LMU to construct a simplified variant that can be parallelized
during training (and yet executed as an RNN during inference), thus overcoming
a well-known limitation of training RNNs on GPUs. We show that this
parallelization-friendly reformulation, which applies generally to any deep
network whose recurrent components are linear, makes training up to 200 times
faster. Second, to validate its utility, we compare its performance
against the original LMU and a variety of published LSTM and transformer
networks on seven benchmarks, ranging from psMNIST to sentiment analysis to
machine translation. We demonstrate that our models exhibit superior
performance on all datasets, often using fewer parameters. For instance, our
LMU sets a new state-of-the-art result on psMNIST, and uses half the parameters
while outperforming DistilBERT and LSTM models on IMDB sentiment analysis.
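The reformulation rests on a simple observation: because the LMU's memory update m_t = A m_{t-1} + B u_t is linear and time-invariant, the entire trajectory can be written as m_t = sum_{j<=t} A^(t-j) B u_j and produced for all time steps at once rather than sequentially. The NumPy sketch below illustrates only this idea and is not the authors' implementation; A and B stand in for the LMU's discretized Legendre state-space matrices, and the O(T^2) kernel construction is kept deliberately naive (a practical version would use a convolution or batched matrix products).

```python
# Illustrative sketch only; A and B are stand-ins for the LMU's discretized
# state-space matrices, not the values used in the paper.
import numpy as np

def memory_sequential(A, B, u):
    """Step-by-step recurrence m_t = A m_{t-1} + B u_t, as run at inference."""
    d, T = A.shape[0], u.shape[0]
    m, out = np.zeros(d), np.empty((T, d))
    for t in range(T):
        m = A @ m + B @ u[t]
        out[t] = m
    return out

def memory_parallel(A, B, u):
    """Same trajectory computed for every t at once:
    m_t = sum_{j<=t} A^(t-j) B u_j, i.e. a causal convolution of the inputs
    with the impulse response H[k] = A^k B. The input-dependent work is a
    single tensor contraction, which removes the sequential bottleneck."""
    d, n = B.shape
    T = u.shape[0]
    H = np.empty((T, d, n))
    H[0] = B
    for k in range(1, T):                 # H[k] = A^k B; input-independent,
        H[k] = A @ H[k - 1]               # so it can be precomputed once
    lag = np.arange(T)[:, None] - np.arange(T)[None, :]            # t - j
    K = np.where((lag >= 0)[:, :, None, None], H[np.clip(lag, 0, None)], 0.0)
    return np.einsum('tjdn,jn->td', K, u)  # all m_t in one contraction

# Quick equivalence check on toy sizes (d = memory dim, n = input dim).
rng = np.random.default_rng(0)
d, n, T = 4, 3, 16
A = rng.normal(scale=0.3, size=(d, d))
B = rng.normal(size=(d, n))
u = rng.normal(size=(T, n))
assert np.allclose(memory_sequential(A, B, u), memory_parallel(A, B, u))
```

Because the same A and B are simply reapplied one step at a time after training, the trained model can still be executed as an ordinary RNN at inference, as the abstract notes.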
Related papers
- Were RNNs All We Needed? [53.393497486332]
We revisit traditional recurrent neural networks (RNNs) from over a decade ago.
We show that by removing the hidden-state dependencies from their input, forget, and update gates, LSTMs and GRUs no longer require backpropagation through time (BPTT) and can be trained efficiently in parallel.
arXiv Detail & Related papers (2024-10-02T03:06:49Z)
- LMUFormer: Low Complexity Yet Powerful Spiking Model With Legendre Memory Units [5.830814457423021]
Transformer models have demonstrated high accuracy in numerous applications but have high complexity and lack sequential processing capability.
We show how architectural modifications to a recurrent model can help push its performance toward that of Transformer models.
We present a spiking version of this architecture, which introduces the benefit of states within the patch embedding and channel mixer modules.
arXiv Detail & Related papers (2024-01-20T01:10:18Z)
- Harnessing Manycore Processors with Distributed Memory for Accelerated Training of Sparse and Recurrent Models [43.1773057439246]
Current AI training infrastructure is dominated by single instruction multiple data (SIMD) and systolic array architectures.
We explore sparse and recurrent model training on a massively parallel multiple instruction multiple data architecture with distributed local memory.
arXiv Detail & Related papers (2023-11-07T23:18:35Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs), represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers, are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z)
- Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models [106.65127123304842]
Branch-Train-Merge (BTM) is an efficient algorithm for parallel training of large language models (LLMs).
BTM learns a set of independent expert LMs (ELMs), each specialized to a different textual domain.
Experiments show that BTM improves in- and out-of-domain perplexities as compared to GPT-style Transformer LMs.
arXiv Detail & Related papers (2022-08-05T17:46:38Z)
- GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction [50.248694764703714]
Unrolled neural networks have recently achieved state-of-the-art accelerated MRI reconstruction.
These networks unroll iterative optimization algorithms by alternating between physics-based consistency and neural-network based regularization.
We propose Greedy LEarning for Accelerated MRI reconstruction, an efficient training strategy for high-dimensional imaging settings.
arXiv Detail & Related papers (2022-07-18T06:01:29Z)
- A Fully Tensorized Recurrent Neural Network [48.50376453324581]
We introduce a "fully tensorized" RNN architecture which jointly encodes the separate weight matrices within each recurrent cell.
This approach reduces model size by several orders of magnitude, while still maintaining similar or better performance compared to standard RNNs.
arXiv Detail & Related papers (2020-10-08T18:24:12Z)
- Automatic Remaining Useful Life Estimation Framework with Embedded Convolutional LSTM as the Backbone [5.927250637620123]
We propose a new LSTM variant called embedded convolutional LSTM (ECLSTM).
In ECLSTM, a group of different 1D convolutions is embedded into the LSTM structure, so that temporal information is preserved both between and within windows.
We show the superiority of the proposed ECLSTM approach over state-of-the-art approaches on several widely used benchmark data sets for RUL estimation.
arXiv Detail & Related papers (2020-08-10T08:34:20Z)
- Achieving Online Regression Performance of LSTMs with Simple RNNs [0.0]
We introduce a first-order training algorithm with a linear time complexity in the number of parameters.
We show that when SRNNs are trained with our algorithm, they achieve regression performance very similar to that of LSTMs, with two to three times shorter training time.
arXiv Detail & Related papers (2020-05-16T11:41:13Z)
- Sentiment Analysis Using Simplified Long Short-term Memory Recurrent Neural Networks [1.5146765382501612]
We perform sentiment analysis on a GOP Debate Twitter dataset.
To speed up training and reduce computational cost and time, six different parameter-reduced slim versions of the LSTM model are proposed.
arXiv Detail & Related papers (2020-05-08T12:50:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.