Enhancing Accuracy in Deep Learning Using Random Matrix Theory
- URL: http://arxiv.org/abs/2310.03165v3
- Date: Mon, 9 Sep 2024 16:40:24 GMT
- Title: Enhancing Accuracy in Deep Learning Using Random Matrix Theory
- Authors: Leonid Berlyand, Etienne Sandier, Yitzchak Shmalo, Lei Zhang
- Abstract summary: We explore the applications of random matrix theory (RMT) in the training of deep neural networks (DNNs).
Our numerical results show that this pruning leads to a drastic reduction of parameters while not reducing the accuracy of DNNs and CNNs.
Our results offer valuable insights into the practical application of RMT for the creation of more efficient and accurate deep-learning models.
- Score: 4.00671924018776
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We explore the applications of random matrix theory (RMT) in the training of deep neural networks (DNNs), focusing on layer pruning, that is, reducing the number of DNN parameters (weights). Our numerical results show that this pruning leads to a drastic reduction of parameters while not reducing the accuracy of DNNs and CNNs. Moreover, pruning the fully connected DNNs actually increases the accuracy and decreases the variance across random initializations. Our numerics indicate that this enhancement in accuracy is due to the simplification of the loss landscape. We next provide rigorous mathematical underpinning of these numerical results by proving the RMT-based Pruning Theorem. Our results offer valuable insights into the practical application of RMT for the creation of more efficient and accurate deep-learning models.
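As a rough illustration of the RMT-based pruning described above, the sketch below discards the singular values of a weight matrix that fall inside the Marchenko-Pastur bulk and keeps only the low-rank "signal" part. It is a minimal sketch under stated assumptions, not the authors' exact algorithm: the function name and the crude median-based noise-scale estimate are illustrative placeholders.

```python
import numpy as np

def mp_threshold_prune(W, sigma=None):
    """Keep only singular values above the Marchenko-Pastur bulk edge and
    return the corresponding low-rank reconstruction of W.

    Illustrative sketch: `sigma` is the assumed noise scale of the weight
    entries; the median-based estimate below is a crude placeholder.
    """
    m, n = W.shape
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    if sigma is None:
        sigma = np.median(s) / np.sqrt(max(m, n))    # crude noise-scale guess
    edge = sigma * (np.sqrt(m) + np.sqrt(n))         # MP bulk edge for m x n noise
    keep = s > edge
    W_pruned = (U[:, keep] * s[keep]) @ Vt[keep, :]  # rank-reduced reconstruction
    return W_pruned, int(keep.sum())

# toy usage: prune a noisy low-rank matrix
W = np.random.randn(256, 3) @ np.random.randn(3, 128) + 0.01 * np.random.randn(256, 128)
W_hat, kept = mp_threshold_prune(W)
```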
Related papers
- Scalable Mechanistic Neural Networks [52.28945097811129]
We propose an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences.
By reformulating the original Mechanistic Neural Network (MNN) we reduce the computational time and space complexities from cubic and quadratic with respect to the sequence length, respectively, to linear.
Extensive experiments demonstrate that S-MNN matches the original MNN in precision while substantially reducing computational resources.
arXiv Detail & Related papers (2024-10-08T14:27:28Z)
- Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Conventional wisdom suggests that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
In contrast, we observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
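As a hedged illustration of how this reversion-toward-a-constant behavior could support risk-sensitive decisions, the snippet below abstains whenever a softmax output lies close to a constant reference prediction (taken here as the training-set label marginal). The helper name, the distance measure, and the threshold are assumptions for illustration, not the paper's procedure.

```python
import numpy as np

def risk_sensitive_predict(probs, label_marginal, min_dist=0.1):
    """Abstain (return -1) when a prediction has collapsed toward the constant
    output that far-OOD inputs tend to produce; otherwise return the argmax.

    probs: (N, C) softmax outputs; label_marginal: (C,) training-set class
    frequencies used as the constant reference. Illustrative sketch only.
    """
    dist = np.linalg.norm(probs - label_marginal, axis=1)  # distance to the constant
    preds = probs.argmax(axis=1)
    return np.where(dist > min_dist, preds, -1)            # -1 marks abstention
```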
arXiv Detail & Related papers (2023-10-02T03:25:32Z)
- Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Deep Learning Weight Pruning with RMT-SVD: Increasing Accuracy and Reducing Overfitting [0.0]
The spectrum of the weight layers of a deep neural network (DNN) can be studied and understood using techniques from random matrix theory (RMT).
In this work, these RMT techniques will be used to determine which and how many singular values should be removed from the weight layers of a DNN during training, via singular value decomposition (SVD).
We show the results on a simple DNN model trained on MNIST.
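For concreteness, a minimal sketch of applying this kind of RMT/SVD pruning to a single layer during training might look as follows; the PyTorch usage, the median-based noise-scale heuristic, and the pruning schedule in the comment are assumptions, not the authors' exact recipe.

```python
import numpy as np
import torch

@torch.no_grad()
def prune_linear_rmt(layer: torch.nn.Linear) -> int:
    """Replace a Linear layer's weight by its SVD reconstruction using only the
    singular values above the Marchenko-Pastur bulk edge; returns the number kept.
    Sketch only: the noise-scale estimate below is a crude placeholder."""
    W = layer.weight.detach().cpu().numpy()
    m, n = W.shape
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    sigma = np.median(s) / np.sqrt(max(m, n))      # crude noise-scale guess
    keep = s > sigma * (np.sqrt(m) + np.sqrt(n))   # MP bulk edge
    W_new = (U[:, keep] * s[keep]) @ Vt[keep, :]
    layer.weight.copy_(torch.as_tensor(W_new, dtype=layer.weight.dtype,
                                       device=layer.weight.device))
    return int(keep.sum())

# e.g., every few epochs of MNIST training (layer names and schedule are hypothetical):
# for fc in (model.fc1, model.fc2):
#     kept = prune_linear_rmt(fc)
```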
arXiv Detail & Related papers (2023-03-15T23:19:45Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
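For intuition about what such an analysis computes, here is a minimal interval-propagation sketch for an implicit layer of the form z = relu(W z + U x + b); it assumes the fixed-point iteration contracts (e.g., the spectral norm of W is below one) and is not the comparative method evaluated in the paper.

```python
import numpy as np

def interval_matvec(M, lo, hi):
    """Bounds on M @ v over all v with lo <= v <= hi (M is a fixed matrix)."""
    Mp, Mn = np.maximum(M, 0.0), np.minimum(M, 0.0)
    return Mp @ lo + Mn @ hi, Mp @ hi + Mn @ lo

def inn_reach(W, U, b, x_lo, x_hi, iters=100):
    """Interval enclosure of the implicit layer z = relu(W z + U x + b)
    over x in [x_lo, x_hi], assuming the iteration contracts (||W||_2 < 1)."""
    u_lo, u_hi = interval_matvec(U, x_lo, x_hi)   # bounds on U x
    z_lo = np.zeros(W.shape[0])
    z_hi = np.zeros(W.shape[0])
    for _ in range(iters):
        w_lo, w_hi = interval_matvec(W, z_lo, z_hi)
        z_lo = np.maximum(w_lo + u_lo + b, 0.0)   # relu is monotone
        z_hi = np.maximum(w_hi + u_hi + b, 0.0)
    return z_lo, z_hi
```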
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- Spectral Pruning for Recurrent Neural Networks [0.0]
Pruning techniques for neural networks with recurrent architectures, such as the recurrent neural network (RNN), are strongly desired for deployment on edge-computing devices.
In this paper, we propose an appropriate pruning algorithm for RNNs inspired by "spectral pruning", and provide the generalization error bounds for compressed RNNs.
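A greatly simplified sketch of the spectral-pruning idea is shown below: keep a subset of hidden units, linearly reconstruct the dropped ones from the kept ones via the empirical hidden-state covariance, and fold that reconstruction into the readout weights. Scoring units by variance and compressing only the readout are simplifying assumptions; the paper optimizes the selection and covers the recurrent weights as well.

```python
import numpy as np

def spectral_prune_readout(H, W_out, k, eps=1e-6):
    """Compress a readout W_out (c x d) to use only k of the d hidden units.

    H: (T, d) matrix of hidden states collected on training data.
    Sketch only: units are scored by variance rather than by the paper's
    optimized selection criterion.
    """
    T, d = H.shape
    cov = H.T @ H / T                        # empirical hidden-state covariance
    J = np.argsort(np.diag(cov))[-k:]        # crude score: per-unit variance
    cov_JJ = cov[np.ix_(J, J)] + eps * np.eye(k)
    A = cov[:, J] @ np.linalg.inv(cov_JJ)    # d x k linear reconstruction map
    return W_out @ A, J                      # compressed readout (c x k), kept units
```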
arXiv Detail & Related papers (2021-05-23T00:30:59Z)
- Advantage of Deep Neural Networks for Estimating Functions with Singularity on Hypersurfaces [23.21591478556582]
We develop a minimax rate analysis to explain why deep neural networks (DNNs) perform better than other standard methods, by studying the estimation of a class of non-smooth functions that have singularities on hypersurfaces.
arXiv Detail & Related papers (2020-11-04T12:51:14Z)
- Interval Neural Networks: Uncertainty Scores [11.74565957328407]
We propose a fast, non-Bayesian method for producing uncertainty scores in the output of pre-trained deep neural networks (DNNs).
This interval neural network (INN) has interval valued parameters and propagates its input using interval arithmetic.
In numerical experiments on an image reconstruction task, we demonstrate the practical utility of INNs as a proxy for the prediction error.
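A minimal sketch of the interval-arithmetic forward pass described here: interval-valued weights and biases are propagated through ReLU layers, and the width of the output interval is used as the uncertainty score. The layer format and the fully connected ReLU architecture are assumptions for illustration.

```python
import numpy as np

def iv_matvec(W_lo, W_hi, x_lo, x_hi):
    """Sound bounds on W @ x when both W and x are interval-valued."""
    p = [W_lo * x_lo, W_lo * x_hi, W_hi * x_lo, W_hi * x_hi]  # corner products
    lo = np.minimum.reduce(p).sum(axis=1)
    hi = np.maximum.reduce(p).sum(axis=1)
    return lo, hi

def inn_uncertainty(layers, x):
    """Propagate a point input through interval-valued layers; return the
    output interval midpoint and its width (the uncertainty score).

    layers: list of (W_lo, W_hi, b_lo, b_hi) tuples; ReLU on hidden layers."""
    lo = hi = np.asarray(x, dtype=float)
    for i, (W_lo, W_hi, b_lo, b_hi) in enumerate(layers):
        lo, hi = iv_matvec(W_lo, W_hi, lo, hi)
        lo, hi = lo + b_lo, hi + b_hi
        if i < len(layers) - 1:                 # hidden layers use monotone ReLU
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return (lo + hi) / 2, hi - lo
```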
arXiv Detail & Related papers (2020-03-25T18:03:51Z)
- Interpretable Deep Recurrent Neural Networks via Unfolding Reweighted $\ell_1$-$\ell_1$ Minimization: Architecture Design and Generalization Analysis [19.706363403596196]
This paper develops a novel deep recurrent neural network (coined reweighted-RNN) by unfolding a reweighted $\ell_1$-$\ell_1$ minimization algorithm.
To the best of our knowledge, this is the first deep unfolding method that explores reweighted minimization.
The experimental results on the moving MNIST dataset demonstrate that the proposed deep reweighted-RNN significantly outperforms existing RNN models.
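To make the unfolding idea concrete, the sketch below shows the kind of reweighted-l1 iteration that deep unfolding turns into trainable layers, with each iteration becoming one network layer. The paper's reweighted-RNN additionally couples each frame's code to the previous frame through an $\ell_1$ term and learns per-layer weights and thresholds; those parts are omitted here, so this is only an illustrative baseline.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the (weighted) l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def reweighted_ista(A, y, lam=0.1, n_layers=10, eps=1e-3):
    """Reweighted-l1 ISTA for  min_h 0.5*||y - A h||^2 + lam * sum_i w_i |h_i|,
    with weights w_i = 1 / (|h_i| + eps) refreshed after every step.
    Each loop iteration corresponds to one unfolded layer."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    h = np.zeros(A.shape[1])
    w = np.ones_like(h)
    for _ in range(n_layers):
        grad = A.T @ (A @ h - y)             # gradient of the data-fidelity term
        h = soft_threshold(h - grad / L, lam * w / L)
        w = 1.0 / (np.abs(h) + eps)          # reweighting step
    return h
```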
arXiv Detail & Related papers (2020-03-18T17:02:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.