Pruning Deep Neural Networks via a Combination of the Marchenko-Pastur Distribution and Regularization
- URL: http://arxiv.org/abs/2503.01922v2
- Date: Wed, 05 Mar 2025 21:57:19 GMT
- Title: Pruning Deep Neural Networks via a Combination of the Marchenko-Pastur Distribution and Regularization
- Authors: Leonid Berlyand, Theo Bourdais, Houman Owhadi, Yitzchak Shmalo
- Abstract summary: Vision Transformers (ViTs) have emerged as a powerful class of models in the field of deep learning for image classification. We propose a novel Random Matrix Theory (RMT)-based method for pruning pre-trained DNNs, based on the sparsification of weights and singular vectors. We demonstrate that our RMT-based pruning can be used to reduce the number of parameters of ViT models by 30-50% with less than 1% loss in accuracy.
- Score: 0.18641315013048293
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks (DNNs) have brought significant advancements in various applications in recent years, such as image recognition, speech recognition, and natural language processing. In particular, Vision Transformers (ViTs) have emerged as a powerful class of models in the field of deep learning for image classification. In this work, we propose a novel Random Matrix Theory (RMT)-based method for pruning pre-trained DNNs, based on the sparsification of weights and singular vectors, and apply it to ViTs. RMT provides a robust framework to analyze the statistical properties of large matrices, which has been shown to be crucial for understanding and optimizing the performance of DNNs. We demonstrate that our RMT-based pruning can be used to reduce the number of parameters of ViT models (trained on ImageNet) by 30-50% with less than 1% loss in accuracy. To our knowledge, this represents the state of the art in pruning for these ViT models. Furthermore, we provide a rigorous mathematical underpinning of these numerical studies: we prove a theorem for fully connected DNNs, and for other more general DNN architectures, describing how the randomness in the weight matrices of a DNN decreases as the weights approach a local or global minimum during training. We verify this theorem through numerical experiments on fully connected DNNs, providing empirical support for our theoretical findings. Moreover, we prove a theorem describing how the DNN loss decreases as we remove randomness from the weight layers, and show that the decrease in loss depends monotonically on the amount of randomness removed. Our results also provide significant RMT-based insights into the role of regularization during training and pruning.
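A rough illustration of the singular-value side of this idea (a sketch, not the authors' exact procedure, which also sparsifies the weights themselves): the NumPy snippet below thresholds the singular values of one weight matrix at the Marchenko-Pastur bulk edge, treats the in-bulk spectrum as noise, and stores the surviving part as two low-rank factors. The noise-scale heuristic, the synthetic layer, and all helper names are assumptions made for this example.

```python
import numpy as np

def mp_bulk_edge(n_rows, n_cols, sigma):
    # Upper edge of the Marchenko-Pastur bulk for the singular values of an
    # n_rows x n_cols random matrix with i.i.d. entries of standard deviation sigma.
    return sigma * (np.sqrt(n_rows) + np.sqrt(n_cols))

def mp_prune_layer(W, sigma=None):
    # Keep only singular values above the MP bulk edge and return low-rank
    # factors (A, B) with A @ B approximating W, plus the retained rank.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    n, m = W.shape
    if sigma is None:
        # Crude noise-scale estimate from the bulk of the spectrum
        # (a heuristic; the paper's estimator may differ).
        sigma = np.median(S) / np.sqrt(max(n, m))
    edge = mp_bulk_edge(n, m, sigma)
    k = int(np.sum(S > edge))          # number of "signal" singular values
    A = U[:, :k] * S[:k]               # n x k
    B = Vt[:k, :]                      # k x m
    return A, B, k

# Example: a synthetic layer whose weights are low-rank signal plus noise.
rng = np.random.default_rng(0)
n, m, true_rank = 512, 256, 20
signal = rng.normal(size=(n, true_rank)) @ rng.normal(size=(true_rank, m)) / np.sqrt(true_rank)
W = signal + 0.1 * rng.normal(size=(n, m))

A, B, k = mp_prune_layer(W, sigma=0.1)
dense_params, factored_params = n * m, k * (n + m)
print(f"kept rank {k}: {dense_params} -> {factored_params} parameters "
      f"({100 * (1 - factored_params / dense_params):.1f}% reduction)")
```

Storing the rank-k factors costs k * (n + m) parameters instead of n * m, which is where the parameter reduction comes from when only a few singular values sit above the bulk edge.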
Related papers
- Enhancing Accuracy in Deep Learning Using Random Matrix Theory [4.00671924018776]
We explore the applications of random matrix theory (RMT) in the training of deep neural networks (DNNs).
Our numerical results show that this pruning leads to a drastic reduction of parameters while not reducing the accuracy of DNNs and CNNs.
Our results offer valuable insights into the practical application of RMT for the creation of more efficient and accurate deep-learning models.
arXiv Detail & Related papers (2023-10-04T21:17:31Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Information Bottleneck Analysis of Deep Neural Networks via Lossy Compression [37.69303106863453]
The Information Bottleneck (IB) principle offers an information-theoretic framework for analyzing the training process of deep neural networks (DNNs).
In this paper, we introduce a framework for conducting IB analysis of general NNs.
We also perform IB analysis on a close-to-real-scale network, which reveals new features of the mutual information (MI) dynamics.
arXiv Detail & Related papers (2023-05-13T21:44:32Z) - Deep Learning Weight Pruning with RMT-SVD: Increasing Accuracy and Reducing Overfitting [0.0]
The spectrum of the weight layers of a deep neural network (DNN) can be studied and understood using techniques from random matrix theory (RMT).
In this work, these RMT techniques will be used to determine which and how many singular values should be removed from the weight layers of a DNN during training, via singular value decomposition (SVD).
We show the results on a simple DNN model trained on MNIST.
arXiv Detail & Related papers (2023-03-15T23:19:45Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - On Feature Learning in Neural Networks with Global Convergence Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z) - Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z) - Extended critical regimes of deep neural networks [0.0]
We show that heavy-tailed weights enable the emergence of an extended critical regime without fine-tuning parameters.
In this extended critical regime, DNNs exhibit rich and complex propagation dynamics across layers.
We provide a theoretical guide for the design of efficient neural architectures.
arXiv Detail & Related papers (2022-03-24T10:15:50Z) - Tensor-Train Recurrent Neural Networks for Interpretable Multi-Way Financial Forecasting [24.50116388903113]
Recurrent Neural Networks (RNNs) represent the de facto standard machine learning tool for sequence modelling.
The Tensor-Train RNN (TT-RNN) can cope with the curse of dimensionality through the compression inherent to tensor representations.
We show, through the analysis of TT factors, that the physical meaning underlying the tensor decomposition helps the TT-RNN model produce interpretable results (a generic tensor-train sketch follows this list).
arXiv Detail & Related papers (2021-05-11T12:38:34Z) - Block-term Tensor Neural Networks [29.442026567710435]
We show that block-term tensor layers (BT-layers) can be easily adapted to neural network models, such as CNNs and RNNs.
BT-layers in CNNs and RNNs can achieve a very large compression ratio in the number of parameters while preserving or improving the representation power of the original DNNs.
arXiv Detail & Related papers (2020-10-10T09:58:43Z) - A Fully Tensorized Recurrent Neural Network [48.50376453324581]
We introduce a "fully tensorized" RNN architecture which jointly encodes the separate weight matrices within each recurrent cell.
This approach reduces model size by several orders of magnitude, while still maintaining similar or better performance compared to standard RNNs.
arXiv Detail & Related papers (2020-10-08T18:24:12Z) - Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
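The two tensor-train entries above (TT-RNN and the fully tensorized RNN) both rely on factorizing large weight matrices into small tensor-train cores. As a generic illustration of that mechanism rather than the specific constructions in those papers, the sketch below applies the standard TT-SVD algorithm to a weight matrix reshaped into a 4-way tensor; the reshape dimensions and the maximum TT rank are arbitrary choices for this example.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    # Standard TT-SVD: decompose a d-way tensor into tensor-train cores of
    # shape (r_{k-1}, n_k, r_k) via successive truncated SVDs.
    shape = tensor.shape
    cores, rank_prev = [], 1
    mat = tensor.reshape(shape[0], -1)
    for k in range(len(shape) - 1):
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        rank = min(max_rank, len(S))
        cores.append(U[:, :rank].reshape(rank_prev, shape[k], rank))
        mat = (S[:rank, None] * Vt[:rank]).reshape(rank * shape[k + 1], -1)
        rank_prev = rank
    cores.append(mat.reshape(rank_prev, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    # Contract the cores back into the full tensor (to check the error).
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=1)
    return full.squeeze(axis=(0, -1))

# Example: view a 256 x 1024 weight matrix as a 16 x 16 x 32 x 32 tensor.
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 1024)) / np.sqrt(1024)
tensor = W.reshape(16, 16, 32, 32)
cores = tt_svd(tensor, max_rank=8)

tt_params = sum(core.size for core in cores)
rel_err = np.linalg.norm(tt_reconstruct(cores) - tensor) / np.linalg.norm(tensor)
print(f"dense params {W.size}, TT params {tt_params}, relative error {rel_err:.2f}")
```

A dense random matrix like the one above compresses poorly (large reconstruction error); the sketch only shows the mechanics and the parameter count. Trained weights with genuine low-rank structure are where the large savings reported in those entries come from.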