Deep Learning Weight Pruning with RMT-SVD: Increasing Accuracy and Reducing Overfitting
- URL: http://arxiv.org/abs/2303.08986v1
- Date: Wed, 15 Mar 2023 23:19:45 GMT
- Title: Deep Learning Weight Pruning with RMT-SVD: Increasing Accuracy and Reducing Overfitting
- Authors: Yitzchak Shmalo, Jonathan Jenkins, Oleksii Krupchytskyi
- Abstract summary: The spectrum of the weight layers of a deep neural network (DNN) can be studied and understood using techniques from random matrix theory (RMT).
In this work, these RMT techniques will be used to determine which and how many singular values should be removed from the weight layers of a DNN during training, via singular value decomposition (SVD).
We show the results on a simple DNN model trained on MNIST.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we present some applications of random matrix theory for the
training of deep neural networks. Recently, random matrix theory (RMT) has been
applied to the overfitting problem in deep learning. Specifically, it has been
shown that the spectrum of the weight layers of a deep neural network (DNN) can
be studied and understood using techniques from RMT. In this work, these RMT
techniques will be used to determine which and how many singular values should
be removed from the weight layers of a DNN during training, via singular value
decomposition (SVD), so as to reduce overfitting and increase accuracy. We show
the results on a simple DNN model trained on MNIST. In general, these
techniques may be applied to any fully connected layer of a pretrained DNN to
reduce the number of parameters in the layer while preserving and sometimes
increasing the accuracy of the DNN.
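To make the pruning step concrete, here is a minimal NumPy sketch of the general recipe described above, not the authors' exact procedure. It assumes the noise variance sigma2 of the weight entries is given (the paper instead estimates the noise level from the spectrum itself), and the function name mp_svd_prune and the toy dimensions are illustrative. Singular values falling inside the Marchenko-Pastur (MP) bulk for pure noise are treated as noise and removed:

```python
import numpy as np

def mp_svd_prune(W, sigma2):
    # For an n x m noise matrix with i.i.d. entries of variance sigma2,
    # the eigenvalues of (1/m) W W^T concentrate below the MP bulk edge
    # lambda_+ = sigma2 * (1 + sqrt(n/m))^2. Singular values below the
    # corresponding cutoff are treated as noise and dropped.
    n, m = W.shape
    lam_plus = sigma2 * (1.0 + np.sqrt(n / m)) ** 2
    s_cut = np.sqrt(m * lam_plus)

    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    k = int(np.sum(s > s_cut))  # number of singular values kept

    # Return rank-k factors: a dense (n, m) layer becomes two layers with
    # k*(n + m) parameters instead of n*m.
    A = U[:, :k] * s[:k]        # shape (n, k)
    B = Vt[:k, :]               # shape (k, m)
    return A, B, k

# Toy check: a low-rank "signal" plus i.i.d. noise of known variance.
rng = np.random.default_rng(0)
signal = rng.normal(size=(784, 10)) @ rng.normal(size=(10, 256))
noisy = signal + rng.normal(scale=0.1, size=(784, 256))
A, B, k = mp_svd_prune(noisy, sigma2=0.1 ** 2)
print(k, A.shape, B.shape)      # k recovers roughly the signal rank (10)
```

After pruning, A @ B can replace the original weight matrix, or A and B can be wired in as two consecutive fully connected layers; this split is where the parameter reduction mentioned in the abstract comes from (k*(n+m) parameters instead of n*m).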
Related papers
- Enhancing Accuracy in Deep Learning Using Random Matrix Theory [4.00671924018776]
We explore the applications of random matrix theory (RMT) in the training of deep neural networks (DNNs).
Our numerical results show that this pruning leads to a drastic reduction in parameters without reducing the accuracy of DNNs and CNNs.
Our results offer valuable insights into the practical application of RMT for the creation of more efficient and accurate deep-learning models.
arXiv Detail & Related papers (2023-10-04T21:17:31Z)
- Variational Tensor Neural Networks for Deep Learning [0.0]
We propose an integration of tensor networks (TN) into deep neural networks (NNs).
This, in turn, results in a scalable tensor neural network (TNN) architecture capable of efficient training over a large parameter space.
We validate the accuracy and efficiency of our method by designing TNN models and providing benchmark results for linear and non-linear regressions, data classification and image recognition on MNIST handwritten digits.
arXiv Detail & Related papers (2022-11-26T20:24:36Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM); a toy sketch of the ADMM loop appears after this list.
Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full precision baseline RNNLMs.
arXiv Detail & Related papers (2021-11-29T09:30:06Z)
- LocalDrop: A Hybrid Regularization for Deep Neural Networks [98.30782118441158]
We propose LocalDrop, a new approach to regularizing neural networks based on local Rademacher complexity.
A new regularization function for both fully-connected networks (FCNs) and convolutional neural networks (CNNs) is developed based on the proposed upper bound of the local Rademacher complexity.
arXiv Detail & Related papers (2021-03-01T03:10:11Z)
- Selfish Sparse RNN Training [13.165729746380816]
We propose an approach to train sparse RNNs with a fixed parameter count in a single run, without compromising performance.
We achieve state-of-the-art sparse training results on Penn TreeBank and WikiText-2.
arXiv Detail & Related papers (2021-01-22T10:45:40Z)
- Block-term Tensor Neural Networks [29.442026567710435]
We show that block-term tensor layers (BT-layers) can be easily adapted to neural network models, such as CNNs and RNNs.
BT-layers in CNNs and RNNs can achieve a very large compression ratio on the number of parameters while preserving or improving the representation power of the original DNNs.
arXiv Detail & Related papers (2020-10-10T09:58:43Z)
- A Fully Tensorized Recurrent Neural Network [48.50376453324581]
We introduce a "fully tensorized" RNN architecture which jointly encodes the separate weight matrices within each recurrent cell.
This approach reduces model size by several orders of magnitude, while still maintaining similar or better performance compared to standard RNNs.
arXiv Detail & Related papers (2020-10-08T18:24:12Z)
- Tensor train decompositions on recurrent networks [60.334946204107446]
Matrix product state (MPS) tensor trains have more attractive features than matrix product operators (MPOs) in terms of storage reduction and computing time at inference.
We show that MPS tensor trains should be at the forefront of LSTM network compression, through a theoretical analysis and practical experiments on an NLP task. A generic sketch of a tensor-train layer appears after this list.
arXiv Detail & Related papers (2020-06-09T18:25:39Z)
- Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD at every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve a greater reduction in computation load at the same accuracy. A minimal sketch of this parameterization appears after this list.
arXiv Detail & Related papers (2020-04-20T02:40:43Z)
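The sketches below illustrate three of the techniques named in the list above; all are hedged toy versions, not the papers' implementations. First, the SVD-training idea from the last entry: parameterize a layer directly in factored form W = U diag(s) V^T, keep U and V near-orthogonal with a penalty, and L1-sparsify s so the rank can be cut after training. The class SVDLinear and the penalty weights are assumptions for illustration:

```python
import torch
import torch.nn as nn

class SVDLinear(nn.Module):
    # Hypothetical layer in factored form W = U diag(s) V^T, trained
    # directly so no SVD has to be computed during training.
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_features, rank) / rank ** 0.5)
        self.s = nn.Parameter(torch.ones(rank))
        self.V = nn.Parameter(torch.randn(in_features, rank) / rank ** 0.5)

    def forward(self, x):
        return ((x @ self.V) * self.s) @ self.U.T

    def penalty(self, ortho=1.0, sparse=1e-3):
        # Push U, V toward orthonormal columns; L1-sparsify the singular
        # values so near-zero ones (and their U, V columns) can be pruned.
        eye = torch.eye(self.s.shape[0])
        return (ortho * (((self.U.T @ self.U - eye) ** 2).sum()
                         + ((self.V.T @ self.V - eye) ** 2).sum())
                + sparse * self.s.abs().sum())

# Usage: add the penalty to the task loss during training.
layer = SVDLinear(784, 256, rank=64)
x = torch.randn(32, 784)
loss = layer(x).pow(2).mean() + layer.penalty()
loss.backward()
```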
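Next, a toy version of ADMM-based quantization from the low-bit RNNLM entry: alternate gradient steps on an augmented loss, projection onto a quantized set, and a dual update. The ternary quantizer, step sizes, and helper names are assumptions:

```python
import numpy as np

def quantize(W, scale):
    # Project elementwise onto the ternary set {-scale, 0, +scale}.
    return scale * np.clip(np.round(W / scale), -1, 1)

def admm_quantize(grad_f, W, scale=0.05, rho=1e-2, lr=1e-2, outer=100, inner=10):
    # min_W f(W) subject to W quantized, split via the consensus
    # constraint W = Q with scaled dual variable U.
    Q = quantize(W, scale)
    U = np.zeros_like(W)
    for _ in range(outer):
        for _ in range(inner):                # W-update: SGD on augmented loss
            W = W - lr * (grad_f(W) + rho * (W - Q + U))
        Q = quantize(W + U, scale)            # Q-update: projection
        U = U + W - Q                         # dual update
    return Q

# Toy usage: quantize the minimizer of f(W) = 0.5 * ||W - T||^2.
rng = np.random.default_rng(0)
T = rng.normal(scale=0.05, size=(8, 8))
Q = admm_quantize(lambda W: W - T, W=T.copy())
print(np.unique(Q))                           # values drawn from {-0.05, 0, 0.05}
```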
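Finally, the tensor-train (MPS) weight format that underlies the tensor-network, BT-layer, and tensorized-RNN entries: a large weight matrix is stored as a chain of small cores. The mode factorizations and tt_to_matrix here are illustrative only:

```python
import numpy as np

def tt_to_matrix(cores):
    # Expand tensor-train (MPS) cores into the full weight matrix, just to
    # show the shapes; a real TT layer contracts the input against the
    # cores one by one and never builds this matrix. Each core G_k has
    # shape (r_{k-1}, m_k, n_k, r_k) with r_0 = r_d = 1.
    W = np.ones((1, 1, 1))                       # (rows, cols, rank)
    for G in cores:
        W = np.einsum('abr,rmns->ambns', W, G)
        a, m, b, n, s = W.shape
        W = W.reshape(a * m, b * n, s)
    return W[:, :, 0]                            # final rank is 1

# Toy layer: 784 = 4*7*28 inputs, 256 = 4*4*16 outputs, TT-ranks (1, 3, 3, 1).
rng = np.random.default_rng(0)
modes = [(4, 4), (7, 4), (28, 16)]               # (m_k, n_k) per core
ranks = [1, 3, 3, 1]
cores = [rng.normal(size=(ranks[k], m, n, ranks[k + 1]))
         for k, (m, n) in enumerate(modes)]
W = tt_to_matrix(cores)
n_stored = sum(c.size for c in cores)
print(W.shape, n_stored, 784 * 256)              # (784, 256) 1644 vs 200704
```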