Deep Learning Weight Pruning with RMT-SVD: Increasing Accuracy and Reducing Overfitting
- URL: http://arxiv.org/abs/2303.08986v1
- Date: Wed, 15 Mar 2023 23:19:45 GMT
- Title: Deep Learning Weight Pruning with RMT-SVD: Increasing Accuracy and Reducing Overfitting
- Authors: Yitzchak Shmalo, Jonathan Jenkins, Oleksii Krupchytskyi
- Abstract summary: The spectrum of the weight layers of a deep neural network (DNN) can be studied and understood using techniques from random matrix theory (RMT).
In this work, these RMT techniques will be used to determine which and how many singular values should be removed from the weight layers of a DNN during training, via singular value decomposition (SVD).
We show the results on a simple DNN model trained on MNIST.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we present some applications of random matrix theory for the
training of deep neural networks. Recently, random matrix theory (RMT) has been
applied to the overfitting problem in deep learning. Specifically, it has been
shown that the spectrum of the weight layers of a deep neural network (DNN) can
be studied and understood using techniques from RMT. In this work, these RMT
techniques will be used to determine which and how many singular values should
be removed from the weight layers of a DNN during training, via singular value
decomposition (SVD), so as to reduce overfitting and increase accuracy. We show
the results on a simple DNN model trained on MNIST. In general, these
techniques may be applied to any fully connected layer of a pretrained DNN to
reduce the number of parameters in the layer while preserving and sometimes
increasing the accuracy of the DNN.
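To make the pruning step concrete, here is a minimal NumPy sketch of the general recipe described above, not the authors' exact procedure. It assumes the noise variance sigma2 of the weight entries is given (the paper instead estimates the noise level from the spectrum itself), and the function name mp_svd_prune and the toy dimensions are illustrative. Singular values falling inside the Marchenko-Pastur (MP) bulk for pure noise are treated as noise and removed:

```python
import numpy as np

def mp_svd_prune(W, sigma2):
    # For an n x m noise matrix with i.i.d. entries of variance sigma2,
    # the eigenvalues of (1/m) W W^T concentrate below the MP bulk edge
    # lambda_+ = sigma2 * (1 + sqrt(n/m))^2. Singular values below the
    # corresponding cutoff are treated as noise and dropped.
    n, m = W.shape
    lam_plus = sigma2 * (1.0 + np.sqrt(n / m)) ** 2
    s_cut = np.sqrt(m * lam_plus)

    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    k = int(np.sum(s > s_cut))  # number of singular values kept

    # Return rank-k factors: a dense (n, m) layer becomes two layers with
    # k*(n + m) parameters instead of n*m.
    A = U[:, :k] * s[:k]        # shape (n, k)
    B = Vt[:k, :]               # shape (k, m)
    return A, B, k

# Toy check: a low-rank "signal" plus i.i.d. noise of known variance.
rng = np.random.default_rng(0)
signal = rng.normal(size=(784, 10)) @ rng.normal(size=(10, 256))
noisy = signal + rng.normal(scale=0.1, size=(784, 256))
A, B, k = mp_svd_prune(noisy, sigma2=0.1 ** 2)
print(k, A.shape, B.shape)      # k recovers roughly the signal rank (10)
```

After pruning, A @ B can replace the original weight matrix, or A and B can be wired in as two consecutive fully connected layers; this split is where the parameter reduction mentioned in the abstract comes from (k*(n+m) parameters instead of n*m).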
Related papers
- Enhancing Accuracy in Deep Learning Using Random Matrix Theory [4.00671924018776]
We explore the applications of random matrix theory (RMT) in the training of deep neural networks (DNNs).
Our numerical results show that this pruning leads to a drastic reduction in parameters without reducing the accuracy of DNNs and CNNs.
Our results offer valuable insights into the practical application of RMT for the creation of more efficient and accurate deep-learning models.
arXiv Detail & Related papers (2023-10-04T21:17:31Z)
- Variational Tensor Neural Networks for Deep Learning [0.0]
We propose an integration of tensor networks (TN) into deep neural networks (NNs).
This, in turn, results in a scalable tensor neural network (TNN) architecture capable of efficient training over a large parameter space.
We validate the accuracy and efficiency of our method by designing TNN models and providing benchmark results for linear and non-linear regressions, data classification and image recognition on MNIST handwritten digits.
arXiv Detail & Related papers (2022-11-26T20:24:36Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM); a toy sketch of the ADMM loop appears after this list.
Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full precision baseline RNNLMs.
arXiv Detail & Related papers (2021-11-29T09:30:06Z)
- LocalDrop: A Hybrid Regularization for Deep Neural Networks [98.30782118441158]
We propose LocalDrop, a new approach to regularizing neural networks based on local Rademacher complexity.
A new regularization function for both fully-connected networks (FCNs) and convolutional neural networks (CNNs) is developed based on the proposed upper bound of the local Rademacher complexity.
arXiv Detail & Related papers (2021-03-01T03:10:11Z)
- Selfish Sparse RNN Training [13.165729746380816]
We propose an approach to train sparse RNNs with a fixed parameter count in a single run, without compromising performance.
We achieve state-of-the-art sparse training results on Penn TreeBank and WikiText-2.
arXiv Detail & Related papers (2021-01-22T10:45:40Z)
- Block-term Tensor Neural Networks [29.442026567710435]
We show that block-term tensor layers (BT-layers) can be easily adapted to neural network models, such as CNNs and RNNs.
BT-layers in CNNs and RNNs can achieve a very large compression ratio on the number of parameters while preserving or improving the representation power of the original DNNs.
arXiv Detail & Related papers (2020-10-10T09:58:43Z)
- A Fully Tensorized Recurrent Neural Network [48.50376453324581]
We introduce a "fully tensorized" RNN architecture which jointly encodes the separate weight matrices within each recurrent cell.
This approach reduces model size by several orders of magnitude, while still maintaining similar or better performance compared to standard RNNs.
arXiv Detail & Related papers (2020-10-08T18:24:12Z)
- Tensor train decompositions on recurrent networks [60.334946204107446]
Matrix product state (MPS) tensor trains have more attractive features than matrix product operators (MPOs) in terms of storage reduction and computing time at inference.
We show that MPS tensor trains should be at the forefront of LSTM network compression, through a theoretical analysis and practical experiments on an NLP task. A generic sketch of a tensor-train layer appears after this list.
arXiv Detail & Related papers (2020-06-09T18:25:39Z)
- Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD at every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve a greater reduction in computation load at the same accuracy. A minimal sketch of this parameterization appears after this list.
arXiv Detail & Related papers (2020-04-20T02:40:43Z)
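The sketches below illustrate three of the techniques named in the list above; all are hedged toy versions, not the papers' implementations. First, the SVD-training idea from the last entry: parameterize a layer directly in factored form W = U diag(s) V^T, keep U and V near-orthogonal with a penalty, and L1-sparsify s so the rank can be cut after training. The class SVDLinear and the penalty weights are assumptions for illustration:

```python
import torch
import torch.nn as nn

class SVDLinear(nn.Module):
    # Hypothetical layer in factored form W = U diag(s) V^T, trained
    # directly so no SVD has to be computed during training.
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_features, rank) / rank ** 0.5)
        self.s = nn.Parameter(torch.ones(rank))
        self.V = nn.Parameter(torch.randn(in_features, rank) / rank ** 0.5)

    def forward(self, x):
        return ((x @ self.V) * self.s) @ self.U.T

    def penalty(self, ortho=1.0, sparse=1e-3):
        # Push U, V toward orthonormal columns; L1-sparsify the singular
        # values so near-zero ones (and their U, V columns) can be pruned.
        eye = torch.eye(self.s.shape[0])
        return (ortho * (((self.U.T @ self.U - eye) ** 2).sum()
                         + ((self.V.T @ self.V - eye) ** 2).sum())
                + sparse * self.s.abs().sum())

# Usage: add the penalty to the task loss during training.
layer = SVDLinear(784, 256, rank=64)
x = torch.randn(32, 784)
loss = layer(x).pow(2).mean() + layer.penalty()
loss.backward()
```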
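Next, a toy version of ADMM-based quantization from the low-bit RNNLM entry: alternate gradient steps on an augmented loss, projection onto a quantized set, and a dual update. The ternary quantizer, step sizes, and helper names are assumptions:

```python
import numpy as np

def quantize(W, scale):
    # Project elementwise onto the ternary set {-scale, 0, +scale}.
    return scale * np.clip(np.round(W / scale), -1, 1)

def admm_quantize(grad_f, W, scale=0.05, rho=1e-2, lr=1e-2, outer=100, inner=10):
    # min_W f(W) subject to W quantized, split via the consensus
    # constraint W = Q with scaled dual variable U.
    Q = quantize(W, scale)
    U = np.zeros_like(W)
    for _ in range(outer):
        for _ in range(inner):                # W-update: SGD on augmented loss
            W = W - lr * (grad_f(W) + rho * (W - Q + U))
        Q = quantize(W + U, scale)            # Q-update: projection
        U = U + W - Q                         # dual update
    return Q

# Toy usage: quantize the minimizer of f(W) = 0.5 * ||W - T||^2.
rng = np.random.default_rng(0)
T = rng.normal(scale=0.05, size=(8, 8))
Q = admm_quantize(lambda W: W - T, W=T.copy())
print(np.unique(Q))                           # values drawn from {-0.05, 0, 0.05}
```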
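Finally, the tensor-train (MPS) weight format that underlies the tensor-network, BT-layer, and tensorized-RNN entries: a large weight matrix is stored as a chain of small cores. The mode factorizations and tt_to_matrix here are illustrative only:

```python
import numpy as np

def tt_to_matrix(cores):
    # Expand tensor-train (MPS) cores into the full weight matrix, just to
    # show the shapes; a real TT layer contracts the input against the
    # cores one by one and never builds this matrix. Each core G_k has
    # shape (r_{k-1}, m_k, n_k, r_k) with r_0 = r_d = 1.
    W = np.ones((1, 1, 1))                       # (rows, cols, rank)
    for G in cores:
        W = np.einsum('abr,rmns->ambns', W, G)
        a, m, b, n, s = W.shape
        W = W.reshape(a * m, b * n, s)
    return W[:, :, 0]                            # final rank is 1

# Toy layer: 784 = 4*7*28 inputs, 256 = 4*4*16 outputs, TT-ranks (1, 3, 3, 1).
rng = np.random.default_rng(0)
modes = [(4, 4), (7, 4), (28, 16)]               # (m_k, n_k) per core
ranks = [1, 3, 3, 1]
cores = [rng.normal(size=(ranks[k], m, n, ranks[k + 1]))
         for k, (m, n) in enumerate(modes)]
W = tt_to_matrix(cores)
n_stored = sum(c.size for c in cores)
print(W.shape, n_stored, 784 * 256)              # (784, 256) 1644 vs 200704
```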