Towards Compact Neural Networks via End-to-End Training: A Bayesian
Tensor Approach with Automatic Rank Determination
- URL: http://arxiv.org/abs/2010.08689v3
- Date: Fri, 1 Oct 2021 18:30:39 GMT
- Title: Towards Compact Neural Networks via End-to-End Training: A Bayesian
Tensor Approach with Automatic Rank Determination
- Authors: Cole Hawkins, Xing Liu, Zheng Zhang
- Abstract summary: It is desirable to directly train a compact neural network from scratch with low memory and low computational cost.
Low-rank tensor decomposition is one of the most effective approaches to reduce the memory and computing requirements of large-size neural networks.
This paper presents a novel end-to-end framework for low-rank tensorized training of neural networks.
- Score: 11.173092834726528
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While post-training model compression can greatly reduce the inference cost
of a deep neural network, uncompressed training still consumes a huge amount of
hardware resources, run-time and energy. It is highly desirable to directly
train a compact neural network from scratch with low memory and low
computational cost. Low-rank tensor decomposition is one of the most effective
approaches to reduce the memory and computing requirements of large-size neural
networks. However, directly training a low-rank tensorized neural network is a
very challenging task because it is hard to determine a proper tensor rank {\it
a priori}, which controls the model complexity and compression ratio in the
training process. This paper presents a novel end-to-end framework for low-rank
tensorized training of neural networks. We first develop a flexible Bayesian
model that can handle various low-rank tensor formats (e.g., CP, Tucker, tensor
train and tensor-train matrix) that compress neural network parameters in
training. This model can automatically determine the tensor ranks inside a
nonlinear forward model, which is beyond the capability of existing Bayesian
tensor methods. We further develop a scalable stochastic variational inference
solver to estimate the posterior density of large-scale problems in training.
Our work provides the first general-purpose rank-adaptive framework for
end-to-end tensorized training. Our numerical results on various neural network
architectures show orders-of-magnitude parameter reduction and little accuracy
loss (or even better accuracy) in the training process. Specifically, on a very
large deep learning recommendation system with over $4.2\times 10^9$ model
parameters, our method can reduce the variables to only $1.6\times 10^5$
automatically in the training process (i.e., by $2.6\times 10^4$ times) while
achieving almost the same accuracy.
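To make the low-rank format concrete, below is a minimal sketch of a linear layer whose weight matrix is stored as tensor-train-matrix (TT-matrix) cores, one of the formats the paper trains end to end. The class name, mode factorizations, and ranks are illustrative assumptions of this sketch, not the authors' code; in the paper the TT ranks are not fixed by hand but are shrunk automatically by a Bayesian prior and a stochastic variational inference solver during training.

```python
# Minimal sketch (illustrative, not the authors' implementation) of a linear layer
# whose weight matrix is stored in tensor-train-matrix (TT-matrix) format.
# Mode factorizations and ranks are hand-picked assumptions here; the paper instead
# infers the ranks with a Bayesian prior and stochastic variational inference.
import math
import torch
import torch.nn as nn


class TTLinear(nn.Module):
    def __init__(self, in_modes, out_modes, ranks):
        # in_modes / out_modes factorize the input / output dimensions,
        # e.g. 1024 = 4*4*8*8; ranks has length len(in_modes) + 1 with ranks[0] = ranks[-1] = 1.
        super().__init__()
        assert len(in_modes) == len(out_modes) == len(ranks) - 1
        self.cores = nn.ParameterList([
            nn.Parameter(0.1 * torch.randn(ranks[k], out_modes[k], in_modes[k], ranks[k + 1]))
            for k in range(len(in_modes))
        ])
        self.bias = nn.Parameter(torch.zeros(math.prod(out_modes)))

    def full_weight(self):
        # Contract the TT cores into the dense (out_dim, in_dim) weight.
        # (Efficient implementations contract the input with the cores directly.)
        w = self.cores[0]                                   # shape (1, m0, n0, r1)
        for core in self.cores[1:]:
            w = torch.einsum('amnb,bpqc->ampnqc', w, core)  # contract the shared TT rank
            a, m, p, n, q, c = w.shape
            w = w.reshape(a, m * p, n * q, c)               # merge row and column mode groups
        return w.squeeze(0).squeeze(-1)                     # shape (out_dim, in_dim)

    def forward(self, x):
        return x @ self.full_weight().t() + self.bias


# Example: a 4096 x 1024 weight held in roughly 1.9k core parameters
# instead of about 4.2M dense entries.
layer = TTLinear(in_modes=[4, 4, 8, 8], out_modes=[8, 8, 8, 8], ranks=[1, 4, 4, 4, 1])
y = layer(torch.randn(2, 4 * 4 * 8 * 8))
print(y.shape)  # torch.Size([2, 4096])
```

With these illustrative modes and ranks, the dense 4096 x 1024 weight collapses to a few thousand trainable core entries; the compression reported in the abstract comes from the same mechanism applied at much larger scale, with the ranks chosen automatically rather than manually.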
Related papers
- A Dynamical Model of Neural Scaling Laws [79.59705237659547]
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
arXiv Detail & Related papers (2024-02-02T01:41:38Z)
- Tensor Decomposition for Model Reduction in Neural Networks: A Review [13.96938227911258]
Modern neural networks have revolutionized the fields of computer vision (CV) and natural language processing (NLP).
They are widely used for solving complex CV and NLP tasks such as image classification, image generation, and machine translation.
This paper reviews six tensor decomposition methods and illustrates their ability to compress model parameters.
arXiv Detail & Related papers (2023-04-26T13:12:00Z)
- Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey [69.3939291118954]
State-of-the-art deep learning models have a parameter count that reaches into the billions. Training, storing and transferring such models is energy- and time-consuming, and thus costly.
Model compression lowers storage and transfer costs, and can further make training more efficient by decreasing the number of computations in the forward and/or backward pass.
This work is a survey of methods that reduce the number of trained weights in deep learning models throughout training.
arXiv Detail & Related papers (2022-05-17T05:37:08Z)
- Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics [85.31710759801705]
Current practice incurs expensive computational costs because performance prediction requires model training.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z)
- Does Preprocessing Help Training Over-parameterized Neural Networks? [19.64638346701198]
We propose two novel preprocessing ideas to bypass the $\Omega(mnd)$ barrier.
Our results provide theoretical insights for a large number of previously established fast training methods.
arXiv Detail & Related papers (2021-10-09T18:16:23Z)
- LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
arXiv Detail & Related papers (2021-10-08T17:03:34Z)
- Tensor-Train Networks for Learning Predictive Modeling of Multidimensional Data [0.0]
A promising strategy is based on tensor networks, which have been very successful in physical and chemical applications.
We show that the weights of a multidimensional regression model can be learned by means of tensor networks, yielding a powerful and compact representation.
An algorithm based on alternating least squares is proposed for approximating the weights in TT-format at reduced computational cost (a simplified ALS sketch is given after this list).
arXiv Detail & Related papers (2021-01-22T16:14:38Z)
- Beyond Lazy Training for Over-parameterized Tensor Decomposition [69.4699995828506]
We show that gradient descent on an over-parametrized objective can go beyond the lazy training regime and utilize certain low-rank structure in the data.
arXiv Detail & Related papers (2020-10-22T00:32:12Z)
- Training highly effective connectivities within neural networks with randomly initialized, fixed weights [4.56877715768796]
We introduce a novel way of training a network by flipping the signs of the weights.
We obtain good results even when the weights have constant magnitude or are drawn from highly asymmetric distributions.
arXiv Detail & Related papers (2020-06-30T09:41:18Z)
- Taylorized Training: Towards Better Approximation of Neural Network Training at Finite Width [116.69845849754186]
Taylorized training involves training the $k$-th order Taylor expansion of the neural network.
We show that Taylorized training agrees increasingly well with full neural network training as $k$ increases.
We complement our experiments with theoretical results showing that the approximation error of $k$-th order Taylorized models decays exponentially in $k$ for wide neural networks.
arXiv Detail & Related papers (2020-02-10T18:37:04Z)
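As promised in the Tensor-Train Networks entry above, the sketch below illustrates the alternating-least-squares (ALS) idea in its simplest two-factor (low-rank matrix) form. It is an illustrative simplification under assumed names and shapes, not the TT-ALS algorithm of that paper, whose sweeps update one TT core at a time with the same fix-one-factor, solve-for-the-other structure.

```python
# Illustrative sketch of the alternating-least-squares (ALS) idea in its simplest
# two-factor form: approximate W by A @ B.T, updating one factor at a time.
# A full TT-ALS solver sweeps over all TT cores instead of just two factors.
import torch


def als_low_rank(W, rank, n_sweeps=20):
    m, n = W.shape
    B = torch.randn(n, rank)
    for _ in range(n_sweeps):
        # Fix B, solve min_A ||W - A B^T||_F  (equivalently B A^T ~ W^T).
        A = torch.linalg.lstsq(B, W.T).solution.T
        # Fix A, solve min_B ||W - A B^T||_F.
        B = torch.linalg.lstsq(A, W).solution.T
    return A, B


W = torch.randn(128, 64) @ torch.randn(64, 256)   # a matrix of rank at most 64
A, B = als_low_rank(W, rank=64)
print((torch.linalg.norm(W - A @ B.T) / torch.linalg.norm(W)).item())  # near zero
```

Because W here is exactly low rank and the target rank matches it, a few sweeps already reproduce W up to numerical error; in practice the same alternating scheme is run with a target rank well below the matrix dimensions.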