On the growth of the parameters of approximating ReLU neural networks
- URL: http://arxiv.org/abs/2406.14936v1
- Date: Fri, 21 Jun 2024 07:45:28 GMT
- Title: On the growth of the parameters of approximating ReLU neural networks
- Authors: Erion Morina, Martin Holler
- Abstract summary: This work focuses on the analysis of fully connected feed-forward ReLU neural networks as they approximate a given, smooth function.
In contrast to conventionally studied universal approximation properties under increasing architectures, we are concerned with the growth of the parameters of approximating networks.
- Score: 0.542249320079018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work focuses on the analysis of fully connected feed-forward ReLU neural networks as they approximate a given, smooth function. In contrast to conventionally studied universal approximation properties under increasing architectures, e.g., in terms of the width or depth of the networks, we are concerned with the asymptotic growth of the parameters of approximating networks. Such results are of interest, e.g., for error analysis or consistency results for neural network training. The main result of our work is that, for a ReLU architecture with state-of-the-art approximation error, the realizing parameters grow at most polynomially. The obtained rate with respect to a normalized network size is compared to existing results and is shown to be superior in most cases, in particular for high-dimensional input.
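As a purely illustrative sketch of the phenomenon studied here, and not the construction analyzed in the paper, the following NumPy snippet realizes the piecewise-linear interpolant of a smooth one-dimensional function as a one-hidden-layer ReLU network and reports how the sup-norm error and the largest parameter magnitude behave as the width grows. The choice of np.sin, the interval [0, 1], and the helper names relu_interpolant and net are assumptions made for this toy example.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_interpolant(f, n):
    """One-hidden-layer ReLU network realizing the piecewise-linear
    interpolant of f on [0, 1] with n equispaced segments.
    All hidden weights are 1; returns (biases, output weights, offset)."""
    knots = np.linspace(0.0, 1.0, n + 1)
    slopes = np.diff(f(knots)) / np.diff(knots)               # slope on each segment
    coeffs = np.concatenate(([slopes[0]], np.diff(slopes)))   # coefficients of ReLU(x - knot)
    return -knots[:-1], coeffs, f(knots)[0]

def net(x, biases, coeffs, offset):
    # hidden layer: ReLU(1 * x + bias); linear read-out on top
    return offset + relu(x[:, None] + biases[None, :]) @ coeffs

f = np.sin
xs = np.linspace(0.0, 1.0, 2000)
for n in (4, 16, 64, 256):
    b, c, off = relu_interpolant(f, n)
    err = np.max(np.abs(net(xs, b, c, off) - f(xs)))
    biggest = max(np.abs(b).max(), np.abs(c).max(), abs(off), 1.0)  # 1.0 = hidden weights
    print(f"width n={n:4d}   sup-error {err:.2e}   max |parameter| {biggest:.3f}")
```

In this toy one-dimensional setting the error decays while the parameter magnitudes stay bounded; the paper's contribution is a quantitative statement of this kind, namely at most polynomial growth of the parameters with respect to a normalized network size, for architectures achieving state-of-the-art approximation rates.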
Related papers
- Exploring the Complexity of Deep Neural Networks through Functional Equivalence [1.3597551064547502]
We present a novel bound on the covering number for deep neural networks, which reveals that the complexity of neural networks can be reduced.
We demonstrate that functional equivalence benefits optimization, as over-parameterized networks tend to be easier to train, since increasing the network width leads to a diminishing volume of the effective parameter space.
arXiv Detail & Related papers (2023-05-19T04:01:27Z)
- Field theory for optimal signal propagation in ResNets [1.053373860696675]
Residual networks have significantly better trainability and performance than feed-forward networks at large depth.
Previous works found that adding a scaling parameter for the residual branch further improves generalization performance (a minimal sketch of such a scaled branch follows this entry).
We derive a systematic finite-size field theory for residual networks to study signal propagation and its dependence on the scaling for the residual branch.
arXiv Detail & Related papers (2023-05-12T18:14:21Z)
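As an illustration of the scaled residual branch mentioned in the summary above, and not of the paper's field-theoretic analysis, here is a minimal PyTorch-style block in which a single parameter alpha multiplies the residual branch; the module name, layer choices, and initialization are assumptions.

```python
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    """Residual block whose branch is scaled by alpha, so the map is x + alpha * F(x)."""
    def __init__(self, dim, alpha=0.1, learnable=True):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        a = torch.tensor(float(alpha))
        self.alpha = nn.Parameter(a) if learnable else a

    def forward(self, x):
        return x + self.alpha * self.branch(x)

# Stacking many such blocks: a small alpha keeps each block close to the identity map,
# so the signal scale stays moderate even at depth 50.
net = nn.Sequential(*[ScaledResidualBlock(32, alpha=0.1) for _ in range(50)])
x = torch.randn(8, 32)
print(net(x).std())
```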
- Generalization and Estimation Error Bounds for Model-based Neural Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules that allow the construction of model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z)
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- Generalization Error Bounds for Iterative Recovery Algorithms Unfolded as Neural Networks [6.173968909465726]
We introduce a general class of neural networks suitable for sparse reconstruction from few linear measurements.
By allowing a wide range of degrees of weight-sharing between the layers, we enable a unified analysis for very different neural network types (a toy unrolled example follows this entry).
arXiv Detail & Related papers (2021-12-08T16:17:33Z)
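As a generic illustration of unrolling an iterative recovery algorithm into a network with a tunable degree of weight sharing, here is a LISTA-style sketch; it is not necessarily the exact network class studied in the paper, and the class name, dimensions, and initialization are assumptions.

```python
import torch
import torch.nn as nn

def soft_threshold(z, lam):
    # proximal operator of the l1 norm, the nonlinearity of ISTA
    return torch.sign(z) * torch.relu(z.abs() - lam)

class UnfoldedISTA(nn.Module):
    """ISTA unrolled into T layers. With share_weights=True all layers reuse one
    pair of matrices (full weight sharing); otherwise each layer has its own."""
    def __init__(self, m, n, T=5, share_weights=True):
        super().__init__()
        k = 1 if share_weights else T
        self.W = nn.ParameterList([nn.Parameter(0.1 * torch.randn(n, m)) for _ in range(k)])
        self.S = nn.ParameterList([nn.Parameter(0.1 * torch.randn(n, n)) for _ in range(k)])
        self.lam = nn.Parameter(torch.tensor(0.1))
        self.T, self.share = T, share_weights

    def forward(self, y):                        # y: (batch, m) measurements
        x = y.new_zeros(y.shape[0], self.S[0].shape[0])
        for t in range(self.T):
            i = 0 if self.share else t
            x = soft_threshold(y @ self.W[i].T + x @ self.S[i].T, self.lam.abs())
        return x                                 # (batch, n) sparse estimate

model = UnfoldedISTA(m=20, n=50, T=8, share_weights=False)
print(model(torch.randn(4, 20)).shape)           # torch.Size([4, 50])
```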
- Approximation Properties of Deep ReLU CNNs [8.74591882131599]
This paper is devoted to establishing $L^2$ approximation properties for deep ReLU convolutional neural networks (CNNs) on two-dimensional space.
The analysis is based on a decomposition theorem for convolutional kernels with large spatial sizes and multiple channels.
arXiv Detail & Related papers (2021-09-01T05:16:11Z)
- A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in their depth, in time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
- Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges that reflect the magnitude of the connections, the learning process can be performed in a differentiable manner (see the sketch after this entry).
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
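As a minimal sketch of the idea of placing a learnable scalar on each candidate connection so that connectivity can be optimized by gradient descent, consider the toy block below; it is an assumption-laden illustration, not the authors' exact parameterization, and the class name and sigmoid gating are choices made here.

```python
import torch
import torch.nn as nn

class SoftConnectedBlock(nn.Module):
    """Toy block whose output aggregates several candidate paths, each gated by a
    learnable edge parameter; the sigmoid keeps connection magnitudes in (0, 1)."""
    def __init__(self, dim, n_paths=3):
        super().__init__()
        self.paths = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_paths)])
        self.edge_logits = nn.Parameter(torch.zeros(n_paths))   # learnable connectivity

    def forward(self, x):
        gates = torch.sigmoid(self.edge_logits)                 # edge magnitudes
        return sum(g * torch.relu(p(x)) for g, p in zip(gates, self.paths))

block = SoftConnectedBlock(dim=16)
x = torch.randn(4, 16)
block(x).pow(2).mean().backward()
print(block.edge_logits.grad)   # gradients reach the connectivity parameters
```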
- Expressivity of Deep Neural Networks [2.7909470193274593]
In this review paper, we give a comprehensive overview of the large variety of approximation results for neural networks.
While the main body of existing results is for general feed-forward architectures, we also depict approximation results for convolutional, residual and recurrent neural networks.
arXiv Detail & Related papers (2020-07-09T13:08:01Z)
- Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)