Compressing LSTM Networks by Matrix Product Operators
- URL: http://arxiv.org/abs/2012.11943v3
- Date: Thu, 31 Mar 2022 05:54:05 GMT
- Title: Compressing LSTM Networks by Matrix Product Operators
- Authors: Ze-Feng Gao, Xingwei Sun, Lan Gao, Junfeng Li and Zhong-Yi Lu
- Abstract summary: Long Short Term Memory(LSTM) models are the building blocks of many state-of-the-art natural language processing(NLP) and speech enhancement(SE) algorithms.
Here we introduce the MPO decomposition, which describes the local correlation of quantum states in quantum many-body physics.
We propose a matrix product operator(MPO) based neural network architecture to replace the LSTM model.
- Score: 7.395226141345625
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long Short Term Memory(LSTM) models are the building blocks of many
state-of-the-art natural language processing(NLP) and speech enhancement(SE)
algorithms. However, there are a large number of parameters in an LSTM model.
This usually consumes a large number of resources to train the LSTM model.
Also, LSTM models suffer from computational inefficiency in the inference
phase. Existing model compression methods (e.g., model pruning) can only
discriminate based on the magnitude of model parameters, ignoring the issue of
importance distribution based on the model information. Here we introduce the
MPO decomposition, which describes the local correlation of quantum states in
quantum many-body physics and is used to represent the large model parameter
matrix in a neural network, which can compress the neural network by truncating
the unimportant information in the weight matrix. In this paper, we propose a
matrix product operator(MPO) based neural network architecture to replace the
LSTM model. The effective representation of neural networks by MPO can
effectively reduce the computational consumption of training LSTM models on the
one hand, and speed up the computation in the inference phase of the model on
the other hand. We compare the MPO-LSTM model-based compression model with the
traditional LSTM model with pruning methods on sequence classification,
sequence prediction, and speech enhancement tasks in our experiments. The
experimental results show that our proposed neural network architecture based
on the MPO approach significantly outperforms the pruning approach.
Related papers
- Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC is capable of performing efficiently, entirely on-the-fly, both parameter estimation and particle proposal adaptation.
arXiv Detail & Related papers (2023-12-19T21:45:38Z) - Scaling Pre-trained Language Models to Deeper via Parameter-efficient
Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on matrix product operator (MPO)
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z) - A critical look at deep neural network for dynamic system modeling [0.0]
This paper questions the capability of (deep) neural networks for the modeling of dynamic systems using input-output data.
For the identification of linear time-invariant (LTI) dynamic systems, two representative neural network models are compared.
For the LTI system, both LSTM and CFNN fail to deliver consistent models even in noise-free cases.
arXiv Detail & Related papers (2023-01-27T09:03:05Z) - Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs) represented by long short term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z) - Go Beyond Multiple Instance Neural Networks: Deep-learning Models based
on Local Pattern Aggregation [0.0]
convolutional neural networks (CNNs) have brought breakthroughs in processing clinical electrocardiograms (ECGs) and speaker-independent speech.
In this paper, we propose local pattern aggregation-based deep-learning models to effectively deal with both problems.
The novel network structure, called LPANet, has cropping and aggregation operations embedded into it.
arXiv Detail & Related papers (2022-05-28T13:18:18Z) - An advanced spatio-temporal convolutional recurrent neural network for
storm surge predictions [73.4962254843935]
We study the capability of artificial neural network models to emulate storm surge based on the storm track/size/intensity history.
This study presents a neural network model that can predict storm surge, informed by a database of synthetic storm simulations.
arXiv Detail & Related papers (2022-04-18T23:42:18Z) - A Comparative Study of Detecting Anomalies in Time Series Data Using
LSTM and TCN Models [2.007262412327553]
This paper compares two prominent deep learning modeling techniques.
The Recurrent Neural Network (RNN)-based Long Short-Term Memory (LSTM) and the convolutional Neural Network (CNN)-based Temporal Convolutional Networks (TCN) are compared.
arXiv Detail & Related papers (2021-12-17T02:46:55Z) - Mixed Precision Low-bit Quantization of Neural Network Language Models
for Speech Recognition [67.95996816744251]
State-of-the-art language models (LMs) represented by long-short term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming increasingly complex and expensive for practical applications.
Current quantization methods are based on uniform precision and fail to account for the varying performance sensitivity at different parts of LMs to quantization errors.
Novel mixed precision neural network LM quantization methods are proposed in this paper.
arXiv Detail & Related papers (2021-11-29T12:24:02Z) - Bidirectional LSTM-CRF Attention-based Model for Chinese Word
Segmentation [2.3991565023534087]
We propose a Bidirectional LSTM-CRF Attention-based Model for Chinese word segmentation.
Our model performs better than the baseline methods modeling by other neural networks.
arXiv Detail & Related papers (2021-05-20T11:46:53Z) - Sentiment Analysis Using Simplified Long Short-term Memory Recurrent
Neural Networks [1.5146765382501612]
We perform sentiment analysis on a GOP Debate Twitter dataset.
To speed up training and reduce the computational cost and time, six different parameter reduced slim versions of the LSTM model are proposed.
arXiv Detail & Related papers (2020-05-08T12:50:10Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.