A Fully Tensorized Recurrent Neural Network
- URL: http://arxiv.org/abs/2010.04196v3
- Date: Wed, 10 Nov 2021 18:54:36 GMT
- Title: A Fully Tensorized Recurrent Neural Network
- Authors: Charles C. Onu, Jacob E. Miller, Doina Precup
- Abstract summary: We introduce a "fully tensorized" RNN architecture which jointly encodes the separate weight matrices within each recurrent cell.
This approach reduces model size by several orders of magnitude, while still maintaining similar or better performance compared to standard RNNs.
- Score: 48.50376453324581
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent neural networks (RNNs) are powerful tools for sequential modeling,
but typically require significant overparameterization and regularization to
achieve optimal performance. This leads to difficulties in the deployment of
large RNNs in resource-limited settings, while also introducing complications
in hyperparameter selection and training. To address these issues, we introduce
a "fully tensorized" RNN architecture which jointly encodes the separate weight
matrices within each recurrent cell using a lightweight tensor-train (TT)
factorization. This approach represents a novel form of weight sharing which
reduces model size by several orders of magnitude, while still maintaining
similar or better performance compared to standard RNNs. Experiments on image
classification and speaker verification tasks demonstrate further benefits for
reducing inference times and stabilizing model training and hyperparameter
selection.
Related papers
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution iteration to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly network at each and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - Variational Tensor Neural Networks for Deep Learning [0.0]
We propose an integration of tensor networks (TN) into deep neural networks (NNs)
This in turn, results in a scalable tensor neural network (TNN) architecture capable of efficient training over a large parameter space.
We validate the accuracy and efficiency of our method by designing TNN models and providing benchmark results for linear and non-linear regressions, data classification and image recognition on MNIST handwritten digits.
arXiv Detail & Related papers (2022-11-26T20:24:36Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - STN: Scalable Tensorizing Networks via Structure-Aware Training and
Adaptive Compression [10.067082377396586]
We propose Scalableizing Networks (STN), which adaptively adjust the model size and decomposition structure without retraining.
STN is compatible with arbitrary network architectures and achieves higher compression performance and flexibility over other tensorizing versions.
arXiv Detail & Related papers (2022-05-30T15:50:48Z) - Low-bit Quantization of Recurrent Neural Network Language Models Using
Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM)
Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full precision baseline RNNLMs.
arXiv Detail & Related papers (2021-11-29T09:30:06Z) - Reverse engineering recurrent neural networks with Jacobian switching
linear dynamical systems [24.0378100479104]
Recurrent neural networks (RNNs) are powerful models for processing time-series data.
The framework of reverse engineering a trained RNN by linearizing around its fixed points has provided insight, but the approach has significant challenges.
We present a new model that overcomes these limitations by co-training an RNN with a novel switching linear dynamical system (SLDS) formulation.
arXiv Detail & Related papers (2021-11-01T20:49:30Z) - Block-term Tensor Neural Networks [29.442026567710435]
We show that block-term tensor layers (BT-layers) can be easily adapted to neural network models, such as CNNs and RNNs.
BT-layers in CNNs and RNNs can achieve a very large compression ratio on the number of parameters while preserving or improving the representation power of the original DNNs.
arXiv Detail & Related papers (2020-10-10T09:58:43Z) - Recurrent Graph Tensor Networks: A Low-Complexity Framework for
Modelling High-Dimensional Multi-Way Sequence [24.594587557319837]
We develop a graph filter framework for approximating the modelling of hidden states in Recurrent Neural Networks (RNNs)
The proposed framework is validated through several multi-way sequence modelling tasks and benchmarked against traditional RNNs.
We show that the proposed RGTN is capable of not only out-performing standard RNNs, but also mitigating the Curse of Dimensionality associated with traditional RNNs.
arXiv Detail & Related papers (2020-09-18T10:13:36Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over- parameterized deep neural networks (DNNs)
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.