Kronecker CP Decomposition with Fast Multiplication for Compressing RNNs
- URL: http://arxiv.org/abs/2008.09342v2
- Date: Fri, 24 Sep 2021 12:19:16 GMT
- Title: Kronecker CP Decomposition with Fast Multiplication for Compressing RNNs
- Authors: Dingheng Wang and Bijiao Wu and Guangshe Zhao and Man Yao and Hengnu
Chen and Lei Deng and Tianyi Yan and Guoqi Li
- Abstract summary: Recurrent neural networks (RNNs) are powerful for tasks involving sequential data, such as natural language processing and video recognition.
In this paper, we consider compressing RNNs based on a novel Kronecker CANDECOMP/PARAFAC (KCP) decomposition.
- Score: 11.01184134911405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent neural networks (RNNs) are powerful for tasks involving
sequential data, such as natural language processing and video recognition.
However, since modern RNNs, including long short-term memory (LSTM) and
gated recurrent unit (GRU) networks, have complex topologies and expensive
space/computation complexity, compressing them has become a hot and promising
topic in recent years. Among the many compression methods, tensor
decomposition, e.g., tensor train (TT), block term (BT), tensor ring (TR) and
hierarchical Tucker (HT), appears to be the most promising approach, since it
can achieve very high compression ratios. Nevertheless, none of these tensor
decomposition formats provides both space and computation efficiency. In
this paper, we compress RNNs based on a novel Kronecker
CANDECOMP/PARAFAC (KCP) decomposition, which is derived from Kronecker tensor
(KT) decomposition, and propose two fast algorithms for multiplying the input
by the tensor-decomposed weight. Experiments on the UCF11, YouTube
Celebrities Face and UCF50 datasets verify that the proposed KCP-RNNs achieve
accuracy comparable to RNNs in other tensor-decomposed formats, while
compression ratios of up to 278,219x can be obtained with low-rank KCP. More
importantly, under similar ranks, KCP-RNNs are efficient in both space and
computation complexity compared with their tensor-decomposed counterparts.
In addition, we find that KCP has the best potential for parallel computing
to accelerate the calculations in neural networks.
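
As a rough illustration of the idea (not the authors' implementation), the following NumPy sketch multiplies a vector by a weight stored as a sum of Kronecker products without ever materializing the full matrix; all shapes, names, and the rank R = 3 below are illustrative assumptions:

import numpy as np

def kcp_matvec(factors, x):
    # factors[r][k] is the k-th Kronecker factor of the r-th term,
    # with shape (m_k, n_k); the represented weight is
    #   W = sum_r A_1^(r) kron A_2^(r) kron ... kron A_K^(r)
    # of shape (prod_k m_k, prod_k n_k). Illustrative sketch only.
    n_dims = [A.shape[1] for A in factors[0]]
    m_dims = [A.shape[0] for A in factors[0]]
    y = np.zeros(int(np.prod(m_dims)))
    for term in factors:                 # one rank-1 Kronecker term
        T = x.reshape(n_dims)            # view x as an order-K tensor
        for k, A in enumerate(term):     # mode-k product with factor A_k
            T = np.moveaxis(np.tensordot(A, T, axes=(1, k)), 0, k)
        y += T.reshape(-1)               # accumulate this term's output
    return y

# sanity check against the explicitly built matrix (small sizes only)
rng = np.random.default_rng(0)
factors = [[rng.standard_normal((2, 3)), rng.standard_normal((4, 5))]
           for _ in range(3)]            # R = 3 terms, K = 2 factors each
W = sum(np.kron(a, b) for a, b in factors)   # dense 8 x 15 weight
x = rng.standard_normal(15)
assert np.allclose(kcp_matvec(factors, x), W @ x)

Storing the factors costs only R * sum_k m_k*n_k numbers instead of prod_k m_k * prod_k n_k for the dense weight, which is how compression ratios as large as the reported 278,219x become possible.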
Related papers
- "Lossless" Compression of Deep Neural Networks: A High-dimensional
Neural Tangent Kernel Approach [49.744093838327615]
We provide a novel compression approach for wide and fully-connected deep neural nets.
Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme.
arXiv Detail & Related papers (2024-03-01T03:46:28Z)
- Scalable CP Decomposition for Tensor Learning using GPU Tensor Cores [47.87810316745786]
We propose a compression-based tensor decomposition framework, namely the exascale-tensor, to support exascale tensor decomposition.
Compared to the baselines, the exascale-tensor supports up to 8,000x larger tensors and achieves speedups of up to 6.95x.
We also apply our method to two real-world applications, including gene analysis and tensor layer neural networks.
arXiv Detail & Related papers (2023-11-22T21:04:59Z)
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy across a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
- Latent Matrices for Tensor Network Decomposition and to Tensor Completion [8.301418317685906]
We propose a novel higher-order tensor decomposition model that decomposes the tensor into smaller ones and speeds up the computation of the algorithm.
Three optimization algorithms, LMTN-PAM, LMTN-SVD and LMTN-AR, have been developed and applied to the tensor-completion task.
Experimental results show that our LMTN-SVD algorithm is 3-6 times faster than the FCTN-PAM algorithm, with only a 1.8-point drop in accuracy.
arXiv Detail & Related papers (2022-10-07T08:19:50Z)
- How to Train Unstable Looped Tensor Network [21.882898731132443]
A growing problem in the compression of deep neural networks is how to reduce the number of parameters in convolutional kernels.
We propose novel methods that stabilize the decomposition results, keep the network robust, and attain better approximation.
arXiv Detail & Related papers (2022-03-05T00:17:04Z)
- Semi-tensor Product-based Tensor Decomposition for Neural Network Compression [57.95644775091316]
This paper generalizes the classical matrix-product-based mode product to a semi-tensor mode product.
Since it permits connecting two factors of different dimensionality, more flexible and compact tensor decompositions can be obtained (see the semi-tensor product sketch after this list).
arXiv Detail & Related papers (2021-09-30T15:18:14Z)
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in deep neural networks (DNNs) has been widely studied to compress and accelerate models in resource-constrained environments.
In this paper, we are the first to study training an N:M fine-grained structured sparse network from scratch.
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
- Hybrid Tensor Decomposition in Neural Network Compression [13.146051056642904]
We introduce the hierarchical Tucker (HT) decomposition method to investigate its capability in neural network compression.
We experimentally find that the HT format performs better at compressing weight matrices, while the TT format is better suited to compressing convolutional kernels.
arXiv Detail & Related papers (2020-06-29T11:16:22Z)
- Tensor train decompositions on recurrent networks [60.334946204107446]
Matrix product state (MPS) tensor trains offer more attractive features than matrix product operators (MPOs) in terms of storage reduction and computing time at inference.
Through theoretical analysis and practical experiments on NLP tasks, we show that MPS tensor trains should be at the forefront of LSTM network compression.
arXiv Detail & Related papers (2020-06-09T18:25:39Z)
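
For the last entry above, the storage argument for MPS tensor trains can be seen from the format itself: an order-K tensor (e.g., an LSTM weight matrix reshaped to K dimensions) is stored as K small cores chained by low-rank bond indices. A minimal NumPy sketch with illustrative shapes and ranks, not the cited paper's implementation:

import numpy as np

def tt_reconstruct(cores):
    # cores[k] has shape (r_{k-1}, d_k, r_k) with r_0 = r_K = 1;
    # only sum_k r_{k-1}*d_k*r_k numbers are stored instead of prod_k d_k.
    t = cores[0]                                 # shape (1, d_1, r_1)
    for core in cores[1:]:
        t = np.tensordot(t, core, axes=(-1, 0))  # contract the bond index
    return t.squeeze(axis=(0, -1))               # drop the boundary ranks

rng = np.random.default_rng(0)
cores = [rng.standard_normal(s)
         for s in [(1, 8, 4), (4, 8, 4), (4, 16, 1)]]   # d = (8, 8, 16)
full = tt_reconstruct(cores)           # 8*8*16 = 1024 entries reconstructed
stored = sum(c.size for c in cores)    # 32 + 128 + 64 = 224 stored numbers

An MPO (TT-matrix) variant would give each core two mode indices; the entry's claim is that the plain MPS form is cheaper in both storage and inference time.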
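
And for the semi-tensor product entry earlier in the list (the sketch promised there): the standard left semi-tensor product multiplies matrices whose inner dimensions need not match by padding each factor with a Kronecker identity. This is a sketch of that product alone, not the cited paper's decomposition:

import numpy as np
from math import lcm

def stp(A, B):
    # Left semi-tensor product of A (m x n) and B (p x q):
    # with t = lcm(n, p), it is (A kron I_{t/n}) @ (B kron I_{t/p}),
    # and it reduces to the ordinary A @ B when n == p.
    n, p = A.shape[1], B.shape[0]
    t = lcm(n, p)
    return np.kron(A, np.eye(t // n)) @ np.kron(B, np.eye(t // p))

A = np.arange(6.0).reshape(2, 3)    # inner dimensions 3 and 6 differ
B = np.arange(24.0).reshape(6, 4)
C = stp(A, B)                       # t = 6; result has shape (4, 4)

Because the identity blocks absorb the dimension mismatch, factors of different dimensionality can be chained, which is what makes the resulting decompositions more flexible and compact than those built on the classical mode product.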
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.