How to Train Unstable Looped Tensor Network
- URL: http://arxiv.org/abs/2203.02617v1
- Date: Sat, 5 Mar 2022 00:17:04 GMT
- Title: How to Train Unstable Looped Tensor Network
- Authors: Anh-Huy Phan, Konstantin Sobolev, Dmitry Ermilov, Igor Vorona, Nikolay
Kozyrskiy, Petr Tichavsky and Andrzej Cichocki
- Abstract summary: A rising problem in the compression of Deep Neural Networks is how to reduce the number of parameters in convolutional kernels.
We propose novel methods to stabilize the decomposition results, keep the network robust, and attain a better approximation.
- Score: 21.882898731132443
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A rising problem in the compression of Deep Neural Networks is how to reduce
the number of parameters in convolutional kernels and the complexity of these
layers by low-rank tensor approximation. Canonical polyadic tensor
decomposition (CPD) and Tucker tensor decomposition (TKD) are two solutions to
this problem and provide promising results. However, CPD often fails due to
degeneracy, making the networks unstable and hard to fine-tune. TKD does not
provide much compression if the core tensor is big. This motivates using a
hybrid model of CPD and TKD, a decomposition with multiple Tucker models with
small core tensors, known as block term decomposition (BTD). This paper proposes
a more compact model that further compresses the BTD by constraining its core
tensors to be identical. We establish a link between the BTD with shared
parameters and a looped chain tensor network (TC). Unfortunately, such strongly
constrained tensor networks (with loops) suffer from severe numerical
instability, as proved by (Landsberg, 2012) and (Handschuh, 2015a). We study
the perturbation of chain tensor networks, provide an interpretation of the
instability in TC, and demonstrate the problem. We propose novel methods to
stabilize the decomposition results, keep the network robust, and attain a
better approximation. Experimental results confirm the superiority of the
proposed methods for the compression of well-known CNNs and for TC
decomposition under challenging scenarios.
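To make the looped chain (TC) structure referenced above concrete, below is a minimal NumPy sketch, not the authors' implementation, of reconstructing a 4-way convolutional kernel from four 3-way cores connected in a closed loop; the kernel dimensions, the TC ranks, and the random cores are illustrative assumptions only.

```python
import numpy as np

# Illustrative sizes (assumptions): C_out = 64, C_in = 64, a 3x3 spatial kernel,
# and TC ranks r1..r4 = (8, 8, 4, 4). None of these values come from the paper.
dims = (64, 64, 3, 3)
ranks = (8, 8, 4, 4)
rng = np.random.default_rng(0)

# One 3-way core per kernel mode, with shape (r_k, n_k, r_{k+1}) and r_5 = r_1,
# so the chain of cores closes into a loop.
G1, G2, G3, G4 = (rng.standard_normal((ranks[k], dims[k], ranks[(k + 1) % 4]))
                  for k in range(4))

# Contract the looped chain: the shared rank indices a, b, c, d are summed, and
# the last core feeds its output rank back into the first core (the loop).
kernel = np.einsum('aib,bjc,ckd,dla->ijkl', G1, G2, G3, G4)
print(kernel.shape)  # (64, 64, 3, 3)

# Compare the parameter count of the TC format with that of the dense kernel.
tc_params = G1.size + G2.size + G3.size + G4.size
dense_params = int(np.prod(dims))
print(tc_params, dense_params)  # 6288 vs. 36864 for these assumed sizes
```

The einsum closes the loop through the shared index a; it is this cycle that distinguishes a looped tensor chain from an open tensor train and, per the abstract, is also the source of the numerical instability the paper addresses.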
Related papers
- "Lossless" Compression of Deep Neural Networks: A High-dimensional
Neural Tangent Kernel Approach [49.744093838327615]
We provide a novel compression approach to wide and fully-connected deep neural nets.
Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme.
arXiv Detail & Related papers (2024-03-01T03:46:28Z) - Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
arXiv Detail & Related papers (2024-01-15T15:54:54Z) - Error Analysis of Tensor-Train Cross Approximation [88.83467216606778]
We provide accuracy guarantees in terms of the entire tensor for both exact and noisy measurements.
Results are verified by numerical experiments, and may have important implications for the usefulness of cross approximations for high-order tensors.
arXiv Detail & Related papers (2022-07-09T19:33:59Z) - Truncated tensor Schatten p-norm based approach for spatiotemporal
traffic data imputation with complicated missing patterns [77.34726150561087]
We introduce four complicated missing patterns, including random missing and three fiber-like missing cases according to the mode-driven fibers.
Despite the nonconvexity of the objective function in our model, we derive the optimal solutions by integrating the alternating direction method of multipliers (ADMM).
arXiv Detail & Related papers (2022-05-19T08:37:56Z) - Tensor-Train Networks for Learning Predictive Modeling of
Multidimensional Data [0.0]
A promising strategy is based on tensor networks, which have been very successful in physical and chemical applications.
We show that the weights of a multidimensional regression model can be learned by means of tensor networks with the aim of obtaining a powerful, compact representation.
An algorithm based on alternating least squares has been proposed for approximating the weights in TT-format with reduced computational cost.
arXiv Detail & Related papers (2021-01-22T16:14:38Z) - ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that, as a result, CNNs maintain performance with a dramatic reduction in parameters and computations.
arXiv Detail & Related papers (2020-09-04T20:41:47Z) - Kronecker CP Decomposition with Fast Multiplication for Compressing RNNs [11.01184134911405]
Recurrent neural networks (RNNs) are powerful in the tasks oriented to sequential data, such as natural language processing and video recognition.
In this paper, we consider compressing RNNs based on a novel Kronecker CANDECOMP/PARAFAC (KCP) decomposition.
arXiv Detail & Related papers (2020-08-21T07:29:45Z) - Stable Low-rank Tensor Decomposition for Compression of Convolutional
Neural Network [19.717842489217684]
This paper is the first study on degeneracy in the tensor decomposition of convolutional kernels.
We present a novel method, which can stabilize the low-rank approximation of convolutional kernels and ensure efficient compression.
We evaluate our approach on popular CNN architectures for image classification and show that our method results in much lower accuracy degradation and provides consistent performance.
arXiv Detail & Related papers (2020-08-12T17:10:12Z) - T-Basis: a Compact Representation for Neural Networks [89.86997385827055]
We introduce T-Basis, a concept for a compact representation of a set of tensors, each of an arbitrary shape, which is often seen in Neural Networks.
We evaluate the proposed approach on the task of neural network compression and demonstrate that it reaches high compression rates at acceptable performance drops.
arXiv Detail & Related papers (2020-07-13T19:03:22Z) - Hybrid Tensor Decomposition in Neural Network Compression [13.146051056642904]
We introduce the hierarchical Tucker (HT) decomposition method to investigate its capability in neural network compression.
We experimentally discover that the HT format has better performance on compressing weight matrices, while the TT format is more suited for compressing convolutional kernels.
arXiv Detail & Related papers (2020-06-29T11:16:22Z) - On Recoverability of Randomly Compressed Tensors with Low CP Rank [29.00634848772122]
We show that if the number of measurements is on the same order of magnitude as that of the model parameters, then the tensor is recoverable.
Our proof is based on deriving a restricted isometry property (R.I.P.) under the CPD model via set covering techniques.
arXiv Detail & Related papers (2020-01-08T04:44:13Z)