On Generalization Bounds for Deep Compound Gaussian Neural Networks
- URL: http://arxiv.org/abs/2402.13106v1
- Date: Tue, 20 Feb 2024 16:01:39 GMT
- Title: On Generalization Bounds for Deep Compound Gaussian Neural Networks
- Authors: Carter Lyons, Raghu G. Raj, Margaret Cheney
- Abstract summary: Unrolled deep neural networks (DNNs) provide better interpretability and superior empirical performance compared to standard DNNs.
We develop novel generalization error bounds for a class of unrolled DNNs informed by a compound Gaussian prior.
Under realistic conditions, we show that, at worst, the generalization error scales $\mathcal{O}(n\sqrt{\ln(n)})$ in the signal dimension and $\mathcal{O}(($Network Size$)^{3/2})$ in network size.
- Score: 1.4425878137951238
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Algorithm unfolding or unrolling is the technique of constructing a deep
neural network (DNN) from an iterative algorithm. Unrolled DNNs often provide
better interpretability and superior empirical performance over standard DNNs
in signal estimation tasks. An important theoretical question, which has only
recently received attention, is the development of generalization error bounds
for unrolled DNNs. These bounds deliver theoretical and practical insights into
the performance of a DNN on empirical datasets that are distinct from, but
sampled from, the probability density generating the DNN training data. In this
paper, we develop novel generalization error bounds for a class of unrolled
DNNs that are informed by a compound Gaussian prior. These compound Gaussian
networks have been shown to outperform comparative standard and unfolded deep
neural networks in compressive sensing and tomographic imaging problems. The
generalization error bound is formulated by bounding the Rademacher complexity
of the class of compound Gaussian network estimates with Dudley's integral.
Under realistic conditions, we show that, at worst, the generalization error
scales $\mathcal{O}(n\sqrt{\ln(n)})$ in the signal dimension and
$\mathcal{O}(($Network Size$)^{3/2})$ in network size.
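To make the unrolling idea concrete, the sketch below unrolls a generic iterative algorithm (ISTA for sparse recovery) into a fixed number of layers, each holding its own step size and threshold; in a learned network these per-layer parameters would be trained. This is a minimal illustration of unrolling under those assumptions, not the compound Gaussian network of the paper, and the names (`UnrolledISTA`, `ista_step`, `soft_threshold`) are hypothetical.

```python
import numpy as np

def soft_threshold(x, lam):
    # Proximal operator of the l1 norm (soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def ista_step(x, y, A, step, lam):
    # One ISTA iteration: gradient step on 0.5 * ||y - A x||^2, then soft-threshold.
    grad = A.T @ (A @ x - y)
    return soft_threshold(x - step * grad, step * lam)

class UnrolledISTA:
    """K ISTA iterations unrolled into K 'layers', each with its own step size
    and threshold. A learned (LISTA-style) network would train these per-layer
    parameters; here they are simply fixed to the classical ISTA values."""

    def __init__(self, A, num_layers=50, lam=0.1):
        self.A = A
        lipschitz = np.linalg.norm(A, 2) ** 2        # squared spectral norm of A
        self.steps = [1.0 / lipschitz] * num_layers  # per-layer step sizes
        self.lams = [lam] * num_layers               # per-layer thresholds

    def forward(self, y):
        x = np.zeros(self.A.shape[1])
        for step, lam in zip(self.steps, self.lams):  # layer k == iteration k
            x = ista_step(x, y, self.A, step, lam)
        return x

# Usage: recover a sparse signal from compressive measurements y = A @ x_true.
rng = np.random.default_rng(0)
m, n = 40, 100
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, size=5, replace=False)] = 1.0
y = A @ x_true
x_hat = UnrolledISTA(A, num_layers=200, lam=0.05).forward(y)
print("recovered support:", np.nonzero(np.abs(x_hat) > 0.1)[0])
```

The bound itself rests on standard statistical-learning machinery: the Rademacher complexity of the class of network estimates, controlled via Dudley's entropy integral. As a reference point only, and not the paper's exact theorem (constants, norms, and integral limits vary across references), these quantities typically enter in the following form:

```latex
% Empirical Rademacher complexity of a class F on a sample S = (z_1, ..., z_n),
% with i.i.d. Rademacher signs sigma_i uniform on {-1, +1}.
\hat{\mathfrak{R}}_S(\mathcal{F})
  = \mathbb{E}_{\sigma}\!\left[ \sup_{f \in \mathcal{F}}
      \frac{1}{n}\sum_{i=1}^{n} \sigma_i f(z_i) \right]

% Generic high-probability generalization bound (losses bounded in [0, B];
% holds with probability at least 1 - delta, uniformly over f in F).
\mathbb{E}[f(z)] \;\le\; \frac{1}{n}\sum_{i=1}^{n} f(z_i)
  \;+\; 2\,\hat{\mathfrak{R}}_S(\mathcal{F})
  \;+\; 3B\sqrt{\frac{\ln(2/\delta)}{2n}}

% Dudley's entropy integral: control the Rademacher complexity by the covering
% numbers N(F, eps, ||.||) of the class at every scale eps, up to its diameter D.
\hat{\mathfrak{R}}_S(\mathcal{F}) \;\le\;
  \inf_{\alpha > 0}\left( 4\alpha
  + \frac{12}{\sqrt{n}} \int_{\alpha}^{D}
      \sqrt{\ln \mathcal{N}(\mathcal{F}, \varepsilon, \|\cdot\|)}\, d\varepsilon \right)
```

Viewing each iteration as a layer is what allows covering-number and Rademacher-complexity arguments of this kind to track how the per-layer parameters and nonlinearities enlarge the estimate class, yielding generalization bounds that grow with network size.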
Related papers
- Generalization of Geometric Graph Neural Networks [84.01980526069075]
We study the generalization capabilities of geometric graph neural networks (GNNs).
We prove a generalization gap between the optimal empirical risk and the optimal statistical risk of this GNN.
The key observation is that the generalization capability can be realized with a single large graph, rather than being limited by the size of the graph as in previous results.
arXiv Detail & Related papers (2024-09-08T18:55:57Z)
- Generalization of Graph Neural Networks is Robust to Model Mismatch [84.01980526069075]
Graph neural networks (GNNs) have demonstrated their effectiveness in various tasks supported by their generalization capabilities.
In this paper, we examine GNNs that operate on geometric graphs generated from manifold models.
Our analysis reveals the robustness of the GNN generalization in the presence of such model mismatch.
arXiv Detail & Related papers (2024-08-25T16:00:44Z)
- Wide Neural Networks as Gaussian Processes: Lessons from Deep Equilibrium Models [16.07760622196666]
We study the deep equilibrium model (DEQ), an infinite-depth neural network with shared weight matrices across layers.
Our analysis reveals that as the width of DEQ layers approaches infinity, it converges to a Gaussian process.
Remarkably, this convergence holds even when the limits of depth and width are interchanged.
arXiv Detail & Related papers (2023-10-16T19:00:43Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Analysis of Convolutions, Non-linearity and Depth in Graph Neural Networks using Neural Tangent Kernel [8.824340350342512]
Graph Neural Networks (GNNs) are designed to exploit the structural information of the data by aggregating the neighboring nodes.
We theoretically analyze the influence of different aspects of the GNN architecture using the Graph Neural Tangent Kernel in a semi-supervised node classification setting.
We prove that: (i) linear networks capture the class information as well as ReLU networks; (ii) row normalization preserves the underlying class structure better than other convolutions; (iii) performance degrades with network depth due to over-smoothing; (iv) skip connections retain the class information even at infinite depth, thereby eliminating over-smoothing.
arXiv Detail & Related papers (2022-10-18T12:28:37Z)
- Masked Bayesian Neural Networks: Computation and Optimality [1.3649494534428745]
We propose a novel sparse Bayesian neural network (BNN) that searches for a good deep neural network with an appropriate level of complexity.
We employ masking variables at each node that can turn off some nodes according to the posterior distribution, yielding a node-wise sparse DNN.
By analyzing several benchmark datasets, we illustrate that the proposed BNN performs well compared to other existing methods.
arXiv Detail & Related papers (2022-06-02T02:59:55Z)
- Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power [15.210336733607488]
We show that for binary classification problems with well-separated data, there exists a constant robust generalization gap unless the size of the neural network is exponential.
We establish an improved upper bound of $\exp(\mathcal{O}(k))$ for the network size to achieve low robust generalization error.
arXiv Detail & Related papers (2022-05-27T09:53:04Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Fast Learning of Graph Neural Networks with Guaranteed Generalizability: One-hidden-layer Case [93.37576644429578]
Graph neural networks (GNNs) have made great progress recently on learning from graph-structured data in practice.
We provide a theoretically-grounded generalizability analysis of GNNs with one hidden layer for both regression and binary classification problems.
arXiv Detail & Related papers (2020-06-25T00:45:52Z)
- Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks [60.22494363676747]
It is known that current graph neural networks (GNNs) are difficult to make deep due to the problem known as over-smoothing.
Multi-scale GNNs are a promising approach for mitigating the over-smoothing problem.
We derive the optimization and generalization guarantees of transductive learning algorithms that include multi-scale GNNs.
arXiv Detail & Related papers (2020-06-15T17:06:17Z)