Implicit Data-Driven Regularization in Deep Neural Networks under SGD
- URL: http://arxiv.org/abs/2111.13331v1
- Date: Fri, 26 Nov 2021 06:36:16 GMT
- Title: Implicit Data-Driven Regularization in Deep Neural Networks under SGD
- Authors: Xuran Meng, Jianfeng Yao
- Abstract summary: spectral analysis of large random matrices involved in a trained deep neural network (DNN)
We find that these spectra can be classified into three main types: Marčenko-Pastur spectrum (MP), Marčenko-Pastur spectrum with few bleeding outliers (MPB), and Heavy tailed spectrum (HT).
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Much research effort has been devoted to explaining the success of deep
learning. Random Matrix Theory (RMT) provides an emerging way to this end:
spectral analysis of large random matrices involved in a trained deep neural
network (DNN) such as weight matrices or Hessian matrices with respect to the
stochastic gradient descent algorithm. In this paper, we conduct extensive
experiments on weight matrices in different modules, e.g., layers, networks and
data sets, to analyze the evolution of their spectra. We find that these
spectra can be classified into three main types: Marčenko-Pastur spectrum
(MP), Marčenko-Pastur spectrum with few bleeding outliers (MPB), and Heavy
tailed spectrum (HT). Moreover, these discovered spectra are directly connected
to the degree of regularization in the DNN. We argue that the degree of
regularization depends on the quality of data fed to the DNN, namely
Data-Driven Regularization. These findings are validated in several NNs, using
Gaussian synthetic data and real data sets (MNIST and CIFAR10). Finally, we
propose a spectral criterion and construct an early stopping procedure that
detects, without any test data, when the NN has become highly regularized,
using the connection between the spectrum types and the degrees of
regularization. Such early-stopped DNNs avoid unnecessary extra training while
preserving comparable generalization ability.
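To make the spectral classification concrete, the sketch below (not the authors' code; the matrix size, variance estimate, and plotting choices are illustrative assumptions) computes the empirical spectral distribution of a weight matrix and overlays the Marčenko-Pastur density. A bulk matching the MP curve corresponds to the MP type, a few eigenvalues escaping the right edge to MPB, and a slowly decaying tail to HT.

```python
import numpy as np
import matplotlib.pyplot as plt

def esd(W):
    """Empirical spectral distribution of W^T W / n for an n x p matrix W."""
    n, _ = W.shape
    return np.linalg.eigvalsh(W.T @ W / n)

def mp_pdf(x, ratio, sigma2=1.0):
    """Marchenko-Pastur density for aspect ratio p/n <= 1 and entry variance sigma2."""
    lo = sigma2 * (1.0 - np.sqrt(ratio)) ** 2
    hi = sigma2 * (1.0 + np.sqrt(ratio)) ** 2
    pdf = np.zeros_like(x)
    m = (x > lo) & (x < hi)
    pdf[m] = np.sqrt((hi - x[m]) * (x[m] - lo)) / (2.0 * np.pi * ratio * sigma2 * x[m])
    return pdf

# Illustration with a random Gaussian matrix, which should match MP exactly;
# for a trained layer, pass its weight matrix as W and use sigma2 = W.var().
n, p = 1024, 256
W = np.random.randn(n, p)          # stand-in for a trained weight matrix
eigs = esd(W)
grid = np.linspace(1e-3, eigs.max() * 1.05, 500)

plt.hist(eigs, bins=60, density=True, alpha=0.5, label="empirical spectrum")
plt.plot(grid, mp_pdf(grid, ratio=p / n), "r-", label="Marchenko-Pastur fit")
plt.xlabel("eigenvalue of W^T W / n")
plt.ylabel("density")
plt.legend()
plt.show()
```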
Related papers
- Assessing Neural Network Representations During Training Using
Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
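As a rough illustration of the idea (a hedged sketch, not the paper's implementation; the Gaussian kernel, bandwidth, and entropy normalization are my assumptions), one can build a diffusion operator from pairwise affinities of the representations and take the Shannon entropy of its eigenvalue spectrum:

```python
import numpy as np

def diffusion_spectral_entropy(X, sigma=1.0, t=1):
    """Entropy of the eigenvalue spectrum of a diffusion operator built from data X.

    X: (n_samples, n_features) array of hidden representations.
    sigma: Gaussian kernel bandwidth (hyperparameter, chosen here arbitrarily).
    t: diffusion time (number of steps of the Markov operator).
    """
    # Pairwise squared distances and Gaussian affinities.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    K = np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))
    # Symmetrically normalized diffusion operator (same spectrum as the Markov matrix).
    d = K.sum(axis=1)
    P_sym = K / np.sqrt(np.outer(d, d))
    eigvals = np.abs(np.linalg.eigvalsh(P_sym)) ** t
    # Normalize eigenvalues into a probability vector and take Shannon entropy.
    p = eigvals / eigvals.sum()
    p = p[p > 1e-12]
    return float(-(p * np.log(p)).sum())

# Example: entropy for a random 3-dimensional manifold embedded in 50 dimensions.
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 50))
print(diffusion_spectral_entropy(Z))
```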
arXiv Detail & Related papers (2023-12-04T01:32:42Z)
- Specformer: Spectral Graph Neural Networks Meet Transformers [51.644312964537356]
Spectral graph neural networks (GNNs) learn graph representations via spectral-domain graph convolutions.
We introduce Specformer, which effectively encodes the set of all eigenvalues and performs self-attention in the spectral domain.
By stacking multiple Specformer layers, one can build a powerful spectral GNN.
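A minimal sketch of the idea as summarized above (not the official Specformer code; the eigenvalue encoder, dimensions, and filter reconstruction are placeholder assumptions): embed the Laplacian eigenvalues, let them attend to one another, and decode a learned spectral filter that is applied to the node features.

```python
import torch
import torch.nn as nn

class TinySpecLayer(nn.Module):
    """Toy spectral layer: self-attention over Laplacian eigenvalues (Specformer-style)."""

    def __init__(self, d_model=32, n_heads=4):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(1, d_model), nn.ReLU(), nn.Linear(d_model, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.decode = nn.Linear(d_model, 1)

    def forward(self, eigvals, eigvecs, x):
        # eigvals: (n,), eigvecs: (n, n), x: (n, f) node features.
        z = self.embed(eigvals.unsqueeze(-1)).unsqueeze(0)   # (1, n, d_model)
        z, _ = self.attn(z, z, z)                            # eigenvalues attend to each other
        new_eigs = self.decode(z).squeeze(0).squeeze(-1)     # (n,) learned spectral response
        # Reconstruct a graph filter U diag(new_eigs) U^T and apply it to node features.
        filt = eigvecs @ torch.diag(new_eigs) @ eigvecs.T
        return filt @ x

# Usage on a tiny random graph Laplacian (illustrative only).
n, f = 8, 5
A = (torch.rand(n, n) > 0.6).float()
A = ((A + A.T) > 0).float()
A.fill_diagonal_(0)
L = torch.diag(A.sum(1)) - A
eigvals, eigvecs = torch.linalg.eigh(L)
out = TinySpecLayer()(eigvals, eigvecs, torch.randn(n, f))
print(out.shape)  # torch.Size([8, 5])
```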
arXiv Detail & Related papers (2023-03-02T07:36:23Z)
- Spectral Complexity-scaled Generalization Bound of Complex-valued Neural
Networks [78.64167379726163]
This paper is the first work that proves a generalization bound for the complex-valued neural network.
We conduct experiments by training complex-valued convolutional neural networks on different datasets.
arXiv Detail & Related papers (2021-12-07T03:25:25Z)
- Deep learning and high harmonic generation [0.0]
We explore the utility of various deep neural networks (NNs) when applied to high harmonic generation (HHG) scenarios.
First, we train the NNs to predict the time-dependent dipole and spectra of HHG emission from reduced-dimensionality models of di- and triatomic systems.
We then demonstrate that transfer learning can be applied to our networks to expand the range of applicability of the networks.
arXiv Detail & Related papers (2020-12-18T16:13:17Z)
- Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k\times k$, rank-$r$ matrices to normalized $L_1$ distance requires $\Omega(\frac{kr}{\epsilon^2})$ samples.
We propose an algorithm that uses $\mathcal{O}(\frac{kr}{\epsilon^2}\log^2\frac{r}{\epsilon})$ samples, a number linear in the high dimension and nearly linear in the rank of the matrices, which is typically low.
arXiv Detail & Related papers (2020-09-30T19:10:32Z)
- The Spectrum of Fisher Information of Deep Networks Achieving Dynamical
Isometry [9.289846887298852]
The Fisher information matrix (FIM) is fundamental to understanding the trainability of deep neural nets (DNNs).
We investigate the spectral distribution of the conditional FIM, which is the FIM given a single sample, by focusing on fully-connected networks.
We find that the parameter space's local metric linearly depends on the depth even under the dynamical isometry.
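For intuition, here is a hedged sketch (my own construction, not the paper's exact setup) of the conditional FIM spectrum for a single sample, using the Gauss-Newton form J^T J, where J is the Jacobian of the network output with respect to the parameters; this coincides with the FIM under a unit-variance Gaussian output model, which is an assumption made for this sketch.

```python
import torch
import torch.nn as nn

def conditional_fim_spectrum(model, x):
    """Eigenvalues of J^T J for a single input x, with J = d(model output)/d(parameters)."""
    params = [p for p in model.parameters() if p.requires_grad]
    out = model(x).flatten()
    rows = []
    for o in out:  # one Jacobian row per output coordinate
        grads = torch.autograd.grad(o, params, retain_graph=True)
        rows.append(torch.cat([g.reshape(-1) for g in grads]))
    J = torch.stack(rows)                      # (n_outputs, n_params)
    # J^T J and J J^T share nonzero eigenvalues; the smaller Gram matrix is cheaper.
    gram = J @ J.T
    return torch.linalg.eigvalsh(gram)

# Tiny fully-connected network and a single random sample.
net = nn.Sequential(nn.Linear(20, 64), nn.Tanh(), nn.Linear(64, 10))
x = torch.randn(1, 20)
print(conditional_fim_spectrum(net, x))
```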
arXiv Detail & Related papers (2020-06-14T06:32:46Z)
- Beyond Random Matrix Theory for Deep Networks [0.7614628596146599]
We investigate whether Wigner semi-circle and Marcenko-Pastur distributions, often used for deep neural network theoretical analysis, match empirically observed spectral densities.
We find that even allowing for outliers, the observed spectral shapes strongly deviate from such theoretical predictions.
We consider two new classes of matrix ensembles; random Wigner/Wishart ensemble products and percolated Wigner/Wishart ensembles, both of which better match observed spectra.
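As a toy illustration of the product-ensemble idea (the paper's exact ensembles and normalizations are not given in this summary, so the construction below is an assumption), one can compare the spectrum of a single Wishart matrix with that of a product of two independent Wishart matrices:

```python
import numpy as np

def wishart(n, p, rng):
    """Sample a p x p Wishart-type matrix X^T X / n with iid standard Gaussian X."""
    X = rng.normal(size=(n, p))
    return X.T @ X / n

rng = np.random.default_rng(0)
n, p = 2048, 512

# Single Wishart matrix: its spectrum follows the Marchenko-Pastur law.
single = np.linalg.eigvalsh(wishart(n, p, rng))

# Product of two independent Wishart matrices: the product of two positive
# semi-definite matrices has real non-negative eigenvalues, and its spectrum
# develops a noticeably longer right tail than the MP bulk.
W1, W2 = wishart(n, p, rng), wishart(n, p, rng)
product = np.sort(np.real(np.linalg.eigvals(W1 @ W2)))

print("MP right edge (theory):", (1 + np.sqrt(p / n)) ** 2)
print("largest single-Wishart eigenvalue:", single.max())
print("largest product eigenvalue:", product.max())
```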
arXiv Detail & Related papers (2020-06-13T21:00:30Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity
Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We find RNA-seq features to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
- Spectral Learning on Matrices and Tensors [74.88243719463053]
We show that tensor decomposition can pick up latent effects that are missed by matrix methods.
We also outline computational techniques to design efficient tensor decomposition methods.
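As a self-contained illustration of the kind of decomposition involved (a generic CP decomposition via alternating least squares, not the paper's algorithms), the following recovers latent factors from a synthetic low-rank 3-way tensor:

```python
import numpy as np

def cp_als(T, rank, n_iters=200, seed=0):
    """Rank-`rank` CP decomposition of a 3-way tensor T via alternating least squares.

    Returns factor matrices A, B, C with T approximately the sum over r of the
    outer products of A[:, r], B[:, r], and C[:, r].
    """
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.normal(size=(I, rank))
    B = rng.normal(size=(J, rank))
    C = rng.normal(size=(K, rank))
    for _ in range(n_iters):
        A = np.einsum("ijk,jr,kr->ir", T, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum("ijk,ir,kr->jr", T, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum("ijk,ir,jr->kr", T, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Recover latent factors from a synthetic rank-3 tensor plus noise.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.normal(size=(s, 3)) for s in (30, 20, 10))
T = np.einsum("ir,jr,kr->ijk", A0, B0, C0) + 0.01 * rng.normal(size=(30, 20, 10))
A, B, C = cp_als(T, rank=3)
T_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
print("relative reconstruction error:", np.linalg.norm(T - T_hat) / np.linalg.norm(T))
```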
arXiv Detail & Related papers (2020-04-16T22:53:00Z)
- Blind Source Separation for NMR Spectra with Negative Intensity [0.0]
We benchmark several blind source separation techniques for analysis of NMR spectral datasets containing negative intensity.
FastICA, SIMPLISMA, and NNMF are top-performing techniques.
The accuracy of FastICA and SIMPLISMA degrades quickly if excess (unreal) pure components are predicted.
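A minimal sketch of this kind of benchmark on synthetic data (the peak shapes, mixing matrix, and component count are my assumptions, and only FastICA from scikit-learn is shown): mix a few Lorentzian components, one with negative intensity, and check how well the recovered sources correlate with the true ones.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 2000)  # chemical-shift axis (arbitrary units)

def lorentzian(x, center, width):
    return width**2 / ((x - center) ** 2 + width**2)

# Three synthetic "pure component" spectra; one includes a negative-going feature,
# mimicking phase-distorted NMR data.
S_true = np.stack([
    lorentzian(x, 2.0, 0.05) + lorentzian(x, 6.5, 0.08),
    lorentzian(x, 4.0, 0.10),
    lorentzian(x, 8.0, 0.06) - 0.5 * lorentzian(x, 3.0, 0.07),  # negative intensity
])

# Mix the components with random positive concentrations and add noise.
A_mix = rng.uniform(0.2, 1.0, size=(25, 3))        # 25 mixture spectra
X = A_mix @ S_true + 0.005 * rng.normal(size=(25, x.size))

# FastICA treats each point on the shift axis as an observation of mixed sources.
ica = FastICA(n_components=3, random_state=0, max_iter=2000)
S_est = ica.fit_transform(X.T).T                   # (3, n_points) recovered components

# Match recovered components to the true ones by absolute correlation.
corr = np.abs(np.corrcoef(np.vstack([S_true, S_est]))[:3, 3:])
print("best |correlation| per true component:", corr.max(axis=1).round(3))
```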
arXiv Detail & Related papers (2020-02-07T20:57:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.