Decomposing a Recurrent Neural Network into Modules for Enabling
Reusability and Replacement
- URL: http://arxiv.org/abs/2212.05970v1
- Date: Fri, 9 Dec 2022 03:29:38 GMT
- Title: Decomposing a Recurrent Neural Network into Modules for Enabling
Reusability and Replacement
- Authors: Sayem Mohammad Imtiaz, Fraol Batole, Astha Singh, Rangeet Pan, Breno
Dantas Cruz, Hridesh Rajan
- Abstract summary: We propose the first approach to decompose an RNN into modules.
We study different types of RNNs, i.e., Vanilla, LSTM, and GRU.
We show how such RNN modules can be reused and replaced in various scenarios.
- Score: 11.591247347259317
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Can we take a recurrent neural network (RNN) trained to translate between
languages and augment it to support a new natural language without retraining
the model from scratch? Can we fix the faulty behavior of the RNN by replacing
portions associated with the faulty behavior? Recent works on decomposing a
fully connected neural network (FCNN) and convolutional neural network (CNN)
into modules have shown the value of engineering deep models in this manner,
which is standard in traditional SE but foreign for deep learning models.
However, prior works focus on the image-based multiclass classification
problems and cannot be applied to RNN due to (a) different layer structures,
(b) loop structures, (c) different types of input-output architectures, and (d)
usage of both nonlinear and logistic activation functions. In this work, we
propose the first approach to decompose an RNN into modules. We study different
types of RNNs, i.e., Vanilla, LSTM, and GRU. Further, we show how such RNN
modules can be reused and replaced in various scenarios. We evaluate our
approach against 5 canonical datasets (i.e., Math QA, Brown Corpus,
Wiki-toxicity, Clinc OOS, and Tatoeba) and 4 model variants for each dataset.
We found that decomposing a trained model has a small cost (Accuracy: -0.6%,
BLEU score: +0.10%). Also, the decomposed modules can be reused and replaced
without needing to retrain.
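The abstract's core idea, carving a trained RNN into per-output modules that can be reused or replaced without retraining, can be pictured with a small sketch. The code below is only an illustrative toy under stated assumptions, not the paper's decomposition algorithm: TinyRNNClassifier, extract_module, and compose are hypothetical names invented here, the model is a many-to-one vanilla RNN classifier, and the "module" is produced by a naive output-weight masking heuristic assumed purely for demonstration.

```python
# Illustrative sketch only: a tiny NumPy vanilla RNN classifier and a naive,
# mask-based "module" extraction. This is NOT the decomposition approach from
# the paper; it only illustrates the idea of carving a trained RNN into
# per-class modules that can be recomposed without retraining.
import numpy as np

rng = np.random.default_rng(0)

class TinyRNNClassifier:
    """Many-to-one vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1}), scores = W_hy h_T."""
    def __init__(self, n_in, n_hidden, n_classes):
        self.W_xh = rng.normal(0, 0.1, (n_hidden, n_in))
        self.W_hh = rng.normal(0, 0.1, (n_hidden, n_hidden))
        self.W_hy = rng.normal(0, 0.1, (n_classes, n_hidden))

    def hidden(self, xs):
        h = np.zeros(self.W_hh.shape[0])
        for x in xs:                      # unrolled loop over time steps
            h = np.tanh(self.W_xh @ x + self.W_hh @ h)
        return h

    def logits(self, xs):
        return self.W_hy @ self.hidden(xs)

def extract_module(model, target_class, keep_ratio=0.5):
    """Hypothetical per-class module: keep only the hidden units with the
    largest output weights toward `target_class`; zero out the rest.
    Returns a masked copy of the trained weights (no retraining)."""
    w = np.abs(model.W_hy[target_class])
    k = max(1, int(keep_ratio * w.size))
    keep = np.argsort(w)[-k:]                      # indices of retained units
    mask = np.zeros_like(w)
    mask[keep] = 1.0
    module = TinyRNNClassifier(model.W_xh.shape[1], w.size, 1)
    module.W_xh = model.W_xh * mask[:, None]
    module.W_hh = model.W_hh * mask[:, None] * mask[None, :]
    module.W_hy = (model.W_hy[target_class] * mask)[None, :]
    return module

def compose(modules, xs):
    """Reuse: stitch per-class modules into a new classifier by comparing
    each module's single-class score (a stand-in for a proper composition)."""
    scores = [m.logits(xs)[0] for m in modules]
    return int(np.argmax(scores))

# Usage: decompose a 3-class model, then reuse two modules as a binary classifier.
model = TinyRNNClassifier(n_in=8, n_hidden=16, n_classes=3)
xs = rng.normal(size=(5, 8))                       # a length-5 input sequence
modules = [extract_module(model, c) for c in range(3)]
print("full model prediction :", int(np.argmax(model.logits(xs))))
print("2-module prediction   :", compose(modules[:2], xs))
```

In the paper's setting the decomposition must additionally handle LSTM/GRU gates, the recurrent loop structure, and one-to-many and many-to-many input-output architectures such as translation, as well as replacement of a faulty module by one taken from a different trained model; the sketch above covers only the simplest many-to-one case.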
Related papers
- Learning Useful Representations of Recurrent Neural Network Weight Matrices [30.583752432727326]
Recurrent Neural Networks (RNNs) are general-purpose parallel-sequential computers.
How can we learn useful representations of RNN weights that facilitate RNN analysis as well as downstream tasks?
We consider several mechanistic approaches for RNN weights and adapt the permutation equivariant Deep Weight Space layer for RNNs.
Our two novel functionalist approaches extract information from RNN weights by 'interrogating' the RNN through probing inputs.
arXiv Detail & Related papers (2024-03-18T17:32:23Z) - On the Computational Complexity and Formal Hierarchy of Second Order
Recurrent Neural Networks [59.85314067235965]
We extend the theoretical foundation for the second-order recurrent network (2nd-order RNN).
We prove there exists a class of 2nd-order RNNs that is Turing-complete with bounded time.
We also demonstrate that 2nd-order RNNs, without memory, outperform modern-day models such as vanilla RNNs and gated recurrent units in recognizing regular grammars.
arXiv Detail & Related papers (2023-09-26T06:06:47Z) - Advancing Regular Language Reasoning in Linear Recurrent Neural Networks [56.11830645258106]
We study whether linear recurrent neural networks (LRNNs) can learn the hidden rules in training sequences.
We propose a new LRNN equipped with a block-diagonal and input-dependent transition matrix.
Experiments suggest that the proposed model is the only LRNN capable of performing length extrapolation on regular language tasks.
arXiv Detail & Related papers (2023-09-14T03:36:01Z) - Adaptive-saturated RNN: Remember more with less instability [2.191505742658975]
This work proposes Adaptive-Saturated RNNs (asRNN), a variant that dynamically adjusts its saturation level between the two approaches.
Our experiments show encouraging results of asRNN on challenging sequence learning benchmarks compared to several strong competitors.
arXiv Detail & Related papers (2023-04-24T02:28:03Z) - Rethinking Nearest Neighbors for Visual Classification [56.00783095670361]
k-NN is a lazy learning method that aggregates the distances between the test image and its top-k neighbors in a training set.
We adopt k-NN with pre-trained visual representations produced by either supervised or self-supervised methods in two steps.
Via extensive experiments on a wide range of classification tasks, our study reveals the generality and flexibility of k-NN integration.
arXiv Detail & Related papers (2021-12-15T20:15:01Z) - Decomposing Convolutional Neural Networks into Reusable and Replaceable
Modules [15.729284470106826]
We propose to decompose a CNN model used for image classification problems into modules for each output class.
These modules can further be reused or replaced to build a new model.
We have evaluated our approach with CIFAR-10, CIFAR-100, and ImageNet tiny datasets with three variations of ResNet models.
arXiv Detail & Related papers (2021-10-11T20:41:50Z) - Training Feedback Spiking Neural Networks by Implicit Differentiation on
the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z) - Fully Spiking Variational Autoencoder [66.58310094608002]
Spiking neural networks (SNNs) can be run on neuromorphic devices with ultra-high speed and ultra-low energy consumption.
In this study, we build a variational autoencoder (VAE) with SNN to enable image generation.
arXiv Detail & Related papers (2021-09-26T06:10:14Z) - Learning Hierarchical Structures with Differentiable Nondeterministic
Stacks [25.064819128982556]
We present a stack RNN model based on the recently proposed Nondeterministic Stack RNN (NS-RNN).
We show that the NS-RNN achieves lower cross-entropy than all previous stack RNNs on five context-free language modeling tasks.
We also propose a restricted version of the NS-RNN that makes it practical to use for language modeling on natural language.
arXiv Detail & Related papers (2021-09-05T03:25:23Z) - Introducing the Hidden Neural Markov Chain framework [7.85426761612795]
This paper proposes the original Hidden Neural Markov Chain (HNMC) framework, a new family of sequential neural models.
We propose three different models: the classic HNMC, the HNMC2, and the HNMC-CN.
This shows the potential of this new sequential neural framework, which can open the way to new models and might eventually compete with the prevalent BiLSTM and BiGRU.
arXiv Detail & Related papers (2021-02-17T20:13:45Z) - Approximation and Non-parametric Estimation of ResNet-type Convolutional
Neural Networks [52.972605601174955]
We show a ResNet-type CNN can attain the minimax optimal error rates in important function classes.
We derive approximation and estimation error rates of the aforementioned type of CNNs for the Barron and Hölder classes.
arXiv Detail & Related papers (2019-03-24T19:42:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.