Spectral Complexity-scaled Generalization Bound of Complex-valued Neural
Networks
- URL: http://arxiv.org/abs/2112.03467v1
- Date: Tue, 7 Dec 2021 03:25:25 GMT
- Title: Spectral Complexity-scaled Generalization Bound of Complex-valued Neural
Networks
- Authors: Haowen Chen, Fengxiang He, Shiye Lei and Dacheng Tao
- Abstract summary: This paper is the first work that proves a generalization bound for complex-valued neural networks.
We conduct experiments by training complex-valued convolutional neural networks on different datasets.
- Score: 78.64167379726163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Complex-valued neural networks (CVNNs) have been widely applied to various
fields, especially signal processing and image recognition. However, few works
focus on the generalization of CVNNs, although it is vital to ensure the
performance of CVNNs on unseen data. This paper is the first work that proves a
generalization bound for complex-valued neural networks. The bound scales
with the spectral complexity, the dominant factor of which is the spectral norm
product of weight matrices. Further, our work provides a generalization bound
for CVNNs when training data is sequential, which is also affected by the
spectral complexity. Theoretically, these bounds are derived via the Maurey
Sparsification Lemma and the Dudley Entropy Integral. Empirically, we conduct
experiments by training complex-valued convolutional neural networks on
different datasets: MNIST, FashionMNIST, CIFAR-10, CIFAR-100, Tiny ImageNet,
and IMDB. Spearman's rank-order correlation coefficients and the corresponding
p-values on these datasets provide strong evidence that the spectral complexity of
the network, measured by the spectral norm product of the weight matrices, has a
statistically significant correlation with generalization ability.
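To make the measured quantity concrete, here is a minimal Python sketch (not the authors' code; the complex weight matrices and generalization gaps below are placeholders) of the dominant factor of spectral complexity, the product of the spectral norms of the weight matrices, and of the Spearman rank-order correlation used to relate it to generalization:

```python
# Minimal sketch (not the paper's code): compute the dominant factor of the
# spectral complexity -- the product of spectral norms of the weight matrices --
# for complex-valued weights, then correlate it with generalization gaps via
# Spearman's rank-order coefficient. Weights and gaps here are placeholders.
import numpy as np
from scipy.stats import spearmanr

def spectral_norm_product(weights):
    # Spectral norm of a (complex) matrix = its largest singular value.
    return float(np.prod([np.linalg.norm(W, ord=2) for W in weights]))

rng = np.random.default_rng(0)

# Stand-ins for several trained CVNNs: each "model" is a list of complex weight matrices.
models = [[rng.normal(size=(64, 32)) + 1j * rng.normal(size=(64, 32)) for _ in range(3)]
          for _ in range(8)]
complexities = [spectral_norm_product(ws) for ws in models]

# Hypothetical generalization gaps (train accuracy minus test accuracy) for the same models.
gen_gaps = rng.uniform(0.01, 0.10, size=len(models))

rho, p_value = spearmanr(complexities, gen_gaps)
print(f"Spearman rho = {rho:.3f}, p-value = {p_value:.3f}")
```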
Related papers
- Spectral complexity of deep neural networks [2.099922236065961]
We use the angular power spectrum of the limiting field to characterize the complexity of the network architecture.
On this basis, we classify neural networks as low-disorder, sparse, or high-disorder.
We show how this classification highlights a number of distinct features for standard activation functions, and in particular, sparsity properties of ReLU networks.
arXiv Detail & Related papers (2024-05-15T17:55:05Z)
- On Generalization Bounds for Deep Compound Gaussian Neural Networks [1.4425878137951238]
Unrolled deep neural networks (DNNs) provide better interpretability and superior empirical performance than standard DNNs.
We develop novel generalization error bounds for a class of unrolled DNNs informed by a compound Gaussian prior.
Under realistic conditions, we show that, at worst, the generalization error scales as $\mathcal{O}(n\sqrt{n})$ in the signal dimension $n$ and $\mathcal{O}((\text{Network Size})^{3/2})$ in the network size.
arXiv Detail & Related papers (2024-02-20T16:01:39Z)
- Wide Neural Networks as Gaussian Processes: Lessons from Deep Equilibrium Models [16.07760622196666]
We study the deep equilibrium model (DEQ), an infinite-depth neural network with shared weight matrices across layers.
Our analysis reveals that as the width of DEQ layers approaches infinity, it converges to a Gaussian process.
Remarkably, this convergence holds even when the limits of depth and width are interchanged.
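As a rough illustration of an infinite-depth kernel limit, the sketch below iterates the standard ReLU NNGP layer recursion until it reaches a fixed point; the ReLU activation and hyperparameters are assumptions chosen for illustration, and this is not the paper's exact DEQ construction:

```python
# Rough sketch (assumptions: ReLU activations, standard NNGP layer recursion;
# not the paper's exact DEQ construction): iterate the layer-to-layer GP kernel
# map with shared hyperparameters until it converges to a fixed point, mimicking
# an infinite-depth limit.
import numpy as np

def relu_gp_layer(K, sigma_w2=1.0, sigma_b2=0.1):
    """One NNGP layer update for a 2x2 kernel matrix under ReLU (arc-cosine kernel)."""
    k11, k22, k12 = K[0, 0], K[1, 1], K[0, 1]
    c = np.clip(k12 / np.sqrt(k11 * k22), -1.0, 1.0)
    theta = np.arccos(c)
    e12 = np.sqrt(k11 * k22) * (np.sin(theta) + (np.pi - theta) * c) / (2 * np.pi)
    e11, e22 = k11 / 2.0, k22 / 2.0   # E[relu(u)^2] = K_uu / 2 for zero-mean Gaussian u
    return sigma_b2 + sigma_w2 * np.array([[e11, e12], [e12, e22]])

# Start from the input-level kernel of two unit-norm inputs with inner product 0.3.
K = np.array([[1.0, 0.3], [0.3, 1.0]])
for t in range(200):
    K_next = relu_gp_layer(K)
    if np.max(np.abs(K_next - K)) < 1e-10:
        break
    K = K_next
print(f"fixed-point kernel after {t + 1} iterations:\n{K_next}")
```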
arXiv Detail & Related papers (2023-10-16T19:00:43Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
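For context on what an NTK computation with explicit bias terms looks like, here is a minimal sketch of the empirical NTK of a one-hidden-layer ReLU network; the architecture, parametrization, and the negative bias shift used to induce sparse activations are illustrative assumptions, not the paper's derivation:

```python
# Minimal sketch (assumed one-hidden-layer ReLU model with explicit bias terms;
# not the paper's code): compute the empirical NTK of
#   f(x) = (1/sqrt(m)) * sum_j a_j * relu(<w_j, x> + b_j)
# by summing gradient inner products over all parameters (a, W, b). A large
# negative bias shift is used here purely to illustrate sparse activations.
import numpy as np

rng = np.random.default_rng(0)
d, m = 16, 4096                        # input dimension, hidden width
W = rng.normal(size=(m, d))
b = rng.normal(size=m) - 3.0           # large (negative) bias -> few active units
a = rng.choice([-1.0, 1.0], size=m)

def empirical_ntk(x1, x2):
    z1, z2 = W @ x1 + b, W @ x2 + b
    act1, act2 = (z1 > 0).astype(float), (z2 > 0).astype(float)
    # d f / d a_j = relu(z_j) / sqrt(m)
    term_a = np.sum(np.maximum(z1, 0) * np.maximum(z2, 0))
    # d f / d w_j = a_j * 1[z_j > 0] * x / sqrt(m);  d f / d b_j = a_j * 1[z_j > 0] / sqrt(m)
    term_wb = np.sum(a**2 * act1 * act2) * (x1 @ x2 + 1.0)
    return (term_a + term_wb) / m

x1, x2 = rng.normal(size=d), rng.normal(size=d)
x1, x2 = x1 / np.linalg.norm(x1), x2 / np.linalg.norm(x2)
print("fraction of active units:", np.mean(W @ x1 + b > 0))
print("empirical NTK(x1, x2):", empirical_ntk(x1, x2))
```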
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - Deep Architecture Connectivity Matters for Its Convergence: A
Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- The Spectral Bias of Polynomial Neural Networks [63.27903166253743]
Polynomial neural networks (PNNs) have been shown to be particularly effective at image generation and face recognition, where high-frequency information is critical.
Previous studies have revealed that neural networks demonstrate a spectral bias towards low-frequency functions, which yields faster learning of low-frequency components during training.
Inspired by such studies, we conduct a spectral analysis of the Neural Tangent Kernel (NTK) of PNNs.
We find that the $\Pi$-Net family, i.e., a recently proposed parametrization of PNNs, speeds up the
arXiv Detail & Related papers (2022-02-27T23:12:43Z)
- Implicit Data-Driven Regularization in Deep Neural Networks under SGD [0.0]
We conduct a spectral analysis of the large random matrices involved in a trained deep neural network (DNN).
We find that these spectra can be classified into three main types: the Marčenko-Pastur spectrum (MP), the Marčenko-Pastur spectrum with a few bleeding outliers (MPB), and the heavy-tailed spectrum (HT).
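For reference, the following sketch compares the empirical eigenvalue spectrum of W^T W / n for an i.i.d. Gaussian matrix (a stand-in for a trained layer, not the paper's experiments) with the Marčenko-Pastur density:

```python
# Sketch (assumption: i.i.d. Gaussian weights standing in for a trained layer):
# compare the empirical spectral density of W^T W / n with the Marchenko-Pastur law.
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma2 = 2000, 500, 1.0            # rows, cols, entry variance
q = p / n                                # aspect ratio (<= 1 here)
W = rng.normal(scale=np.sqrt(sigma2), size=(n, p))
eigvals = np.linalg.eigvalsh(W.T @ W / n)

lam_minus = sigma2 * (1 - np.sqrt(q)) ** 2
lam_plus = sigma2 * (1 + np.sqrt(q)) ** 2

def mp_density(lam):
    # Marchenko-Pastur density on its support [lam_minus, lam_plus], zero elsewhere.
    lam = np.asarray(lam, dtype=float)
    inside = (lam > lam_minus) & (lam < lam_plus)
    rho = np.zeros_like(lam)
    rho[inside] = np.sqrt((lam_plus - lam[inside]) * (lam[inside] - lam_minus)) / (
        2 * np.pi * sigma2 * q * lam[inside])
    return rho

# Crude check: compare a histogram of the empirical eigenvalues with the MP density.
hist, edges = np.histogram(eigvals, bins=40, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print("max |empirical - MP| over bins:", np.max(np.abs(hist - mp_density(centers))))
```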
arXiv Detail & Related papers (2021-11-26T06:36:16Z)
- Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
- Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks [17.188280334580195]
We derive analytical expressions for the generalization performance of kernel regression as a function of the number of training samples.
Our expressions apply to wide neural networks due to an equivalence between training them and kernel regression with the Neural Tangent Kernel (NTK).
We verify our theory with simulations on synthetic data and the MNIST dataset.
arXiv Detail & Related papers (2020-02-07T00:03:40Z)
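To make the learning-curve setting of the entry above concrete, here is a minimal simulation sketch: kernel ridge regression with an RBF kernel on a synthetic 1-D target, reporting mean test error as the number of training samples grows. The kernel, target function, and ridge value are illustrative assumptions, not the paper's analytical expressions.

```python
# Sketch (assumed RBF kernel, synthetic 1-D target, small ridge; not the paper's
# analytical theory): estimate the kernel-regression learning curve, i.e. mean
# test error as a function of the number of training samples.
import numpy as np

rng = np.random.default_rng(0)

def rbf(X1, X2, length=0.3):
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-d2 / (2 * length**2))

def target(x):
    return np.sin(3 * x) + 0.5 * np.sin(10 * x)    # hypothetical ground-truth function

x_test = rng.uniform(-1, 1, size=500)
y_test = target(x_test)

for n in [5, 10, 20, 40, 80, 160, 320]:
    errs = []
    for _ in range(20):                            # average over random training sets
        x_tr = rng.uniform(-1, 1, size=n)
        y_tr = target(x_tr)
        K = rbf(x_tr, x_tr) + 1e-8 * np.eye(n)     # small ridge for numerical stability
        alpha = np.linalg.solve(K, y_tr)
        y_pred = rbf(x_test, x_tr) @ alpha
        errs.append(np.mean((y_pred - y_test) ** 2))
    print(f"n = {n:4d}   mean test MSE = {np.mean(errs):.4f}")
```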
This list is automatically generated from the titles and abstracts of the papers on this site.