Theoretical Exploration of Solutions of Feedforward ReLU networks
- URL: http://arxiv.org/abs/2202.01919v3
- Date: Wed, 9 Feb 2022 02:30:26 GMT
- Title: Theoretical Exploration of Solutions of Feedforward ReLU networks
- Authors: Changcun Huang
- Abstract summary: This paper aims to interpret the mechanism of feedforward ReLU networks by exploring their solutions for piecewise linear functions through basic rules.
We explain three typical network architectures: the subnetwork formed by the last three layers of convolutional networks, multi-layer feedforward networks, and the decoder of autoencoders.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper aims to interpret the mechanism of feedforward ReLU networks by
exploring their solutions for piecewise linear functions through basic rules.
The constructed solutions should be universal enough to explain the network
architectures used in engineering practice. To that end, we borrow the
methodology of theoretical physics to develop the theories. Some consequences
of our theories include the following: in geometric settings, the solutions of
both three-layer networks and deep-layer networks are presented, and solution
universality is ensured in several ways; we give clear and intuitive
interpretations of each component of the network architectures, such as the
parameter-sharing mechanism for multiple outputs, the function of each layer,
the advantage of deep layers, the redundancy of parameters, and so on. We
explain three typical network architectures: the subnetwork formed by the last
three layers of convolutional networks, multi-layer feedforward networks, and
the decoder of autoencoders. This paper is expected to provide a basic
theoretical foundation for further investigations of feedforward ReLU networks.
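As a concrete illustration of what a "solution for a piecewise linear function" can look like, the following minimal numpy sketch (not taken from the paper; the breakpoints, slopes, and anchor value are illustrative) builds a one-hidden-layer ReLU network that exactly realizes a given continuous piecewise linear function of one variable: each hidden unit ReLU(x - b_i) adds the slope change at breakpoint b_i, and the pair ReLU(x), ReLU(-x) carries the leftmost affine piece.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Illustrative continuous piecewise linear (CPWL) target: breakpoints b,
# slope s[i] on the i-th piece, and value f0 at the first breakpoint.
b  = np.array([-1.0, 0.5, 2.0])
s  = np.array([0.5, -1.0, 2.0, 0.25])
f0 = 1.0

def f_direct(x):
    """Reference evaluation: locate the active piece, then follow its segment."""
    v = [f0]
    for i in range(1, len(b)):
        v.append(v[-1] + s[i] * (b[i] - b[i - 1]))  # values at the breakpoints
    j = np.searchsorted(b, x)
    if j == 0:
        return f0 + s[0] * (x - b[0])
    return v[j - 1] + s[j] * (x - b[j - 1])

# One-hidden-layer ReLU network realizing the same function exactly:
# ReLU(x - b_i) adds the slope change at breakpoint b_i, while the pair
# ReLU(x), ReLU(-x) reproduces the leftmost affine piece s[0] * x.
W1 = np.concatenate((np.ones_like(b), [1.0, -1.0]))   # hidden-layer weights
b1 = np.concatenate((-b, [0.0, 0.0]))                  # hidden-layer biases
W2 = np.concatenate((np.diff(s), [s[0], -s[0]]))       # output weights
b2 = f0 - s[0] * b[0]                                  # output bias

def f_network(x):
    return W2 @ relu(W1 * x + b1) + b2

xs = np.linspace(-3.0, 4.0, 201)
assert np.allclose([f_direct(x) for x in xs], [f_network(x) for x in xs])
print("the one-hidden-layer ReLU network matches the CPWL target at all test points")
```

This is only the textbook one-dimensional construction; the solutions studied in the paper cover more general settings.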
Related papers
- On the Principles of ReLU Networks with One Hidden Layer [0.0]
It remains unclear how to interpret the mechanism of the solutions obtained by the back-propagation algorithm.
It is shown, both theoretically and experimentally, that the training solution for one-dimensional inputs can be completely understood.
arXiv Detail & Related papers (2024-11-11T05:51:11Z)
- Unwrapping All ReLU Networks [1.370633147306388]
Deep ReLU networks can be decomposed into a collection of linear models.
We extend this decomposition to graph neural networks and tensor convolutional networks.
We show how this decomposition leads to computing cheap and exact SHAP values.
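For illustration, the following small sketch (random weights, not the construction from the paper) shows the decomposition concretely: on the activation region containing a given input, a deep ReLU network coincides with a single affine map, which can be read off by folding each layer's 0/1 ReLU mask into the weight matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small random deep ReLU network (sizes and weights are illustrative).
dims = [4, 8, 8, 1]
Ws = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
bs = [rng.standard_normal(dims[i + 1]) for i in range(len(dims) - 1)]

def forward(x):
    h = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        h = np.maximum(W @ h + b, 0.0)          # hidden ReLU layers
    return Ws[-1] @ h + bs[-1]                   # linear output layer

def local_affine(x):
    """Return (A, c) with forward(x) == A @ x + c on x's activation region."""
    A, c = np.eye(len(x)), np.zeros(len(x))
    for W, b in zip(Ws[:-1], bs[:-1]):
        A, c = W @ A, W @ c + b                  # pre-activation as an affine map
        m = (A @ x + c) > 0                      # ReLU activation pattern at x
        A, c = m[:, None] * A, m * c             # fold the 0/1 mask into the map
    return Ws[-1] @ A, Ws[-1] @ c + bs[-1]

x = rng.standard_normal(dims[0])
A, c = local_affine(x)
assert np.allclose(forward(x), A @ x + c)        # the local linear model is exact
```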
arXiv Detail & Related papers (2023-05-16T13:30:15Z)
- Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks [49.808194368781095]
We show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks.
This work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.
arXiv Detail & Related papers (2023-05-11T17:19:30Z)
- Generalization and Estimation Error Bounds for Model-based Neural Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules for constructing model-based networks with guaranteed high generalization.
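For context, the "model-based networks" in this line of work are typically unrolled iterative algorithms. The sketch below shows one such architecture, an unrolled ISTA for sparse recovery, with purely illustrative sizes and with the layer matrices fixed at their classical ISTA values rather than learned; it shows only the layer structure and is not the paper's construction or a statement of its guarantees.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding: the proximal operator of the l1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(0)
m, n, k, T = 30, 80, 4, 20                    # measurements, signal size, sparsity, layers
A = rng.standard_normal((m, n)) / np.sqrt(m)  # sensing matrix
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x_true                                 # noiseless measurements

# Each "layer" applies fixed matrices (W1, W2) and a threshold theta.
# In a learned, LISTA-style network these would be trained parameters;
# here they are set to the classical ISTA values for illustration.
L = np.linalg.norm(A, 2) ** 2                  # step-size constant
lam = 0.02
W1, W2, theta = A.T / L, np.eye(n) - A.T @ A / L, lam / L

x = np.zeros(n)
for _ in range(T):                             # T unrolled iterations = T layers
    x = soft_threshold(W1 @ y + W2 @ x, theta)

print("relative reconstruction error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```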
arXiv Detail & Related papers (2023-04-19T16:39:44Z)
- Universal Solutions of Feedforward ReLU Networks for Interpolations [0.0]
This paper provides a theoretical framework on the solutions of feedforward ReLU networks for generalization.
For three-layer networks, we classify different kinds of solutions and model them in a normalized form; solution finding is investigated along three dimensions: data, networks, and training.
For deep-layer networks, we present a general result called the sparse-matrix principle, which can describe some basic behaviors of deep layers.
arXiv Detail & Related papers (2022-08-16T02:15:03Z)
- Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures the information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains unclear.
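A quick way to observe the phenomenon (not from the paper; widths, depth, and batch size are arbitrary choices) is to track a rank proxy of the hidden representations across the layers of a randomly initialized deep ReLU network:

```python
import numpy as np

rng = np.random.default_rng(0)

def stable_rank(M):
    """||M||_F^2 / ||M||_2^2: a smooth, scale-invariant proxy for rank."""
    sv = np.linalg.svd(M, compute_uv=False)
    return float((sv ** 2).sum() / sv[0] ** 2)

# Hidden representations of a batch of inputs, layer by layer, in a
# randomly initialized deep ReLU network (all sizes are illustrative).
width, depth, batch = 64, 8, 512
H = rng.standard_normal((batch, width))        # one input per row
print(f"input    : stable rank = {stable_rank(H):5.1f}")
for layer in range(1, depth + 1):
    W = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)   # He-style scale
    H = np.maximum(H @ W, 0.0)                 # ReLU layer
    print(f"layer {layer:2d} : stable rank = {stable_rank(H):5.1f}")
```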
arXiv Detail & Related papers (2022-06-13T12:03:32Z)
- Learning with Capsules: A Survey [73.31150426300198]
Capsule networks were proposed as an alternative approach to Convolutional Neural Networks (CNNs) for learning object-centric representations.
Unlike CNNs, capsule networks are designed to explicitly model part-whole hierarchical relationships.
arXiv Detail & Related papers (2022-06-06T15:05:36Z)
- Towards Understanding Theoretical Advantages of Complex-Reaction Networks [77.34726150561087]
We show that a class of functions can be approximated by a complex-reaction network using a polynomial number of parameters.
For empirical risk minimization, our theoretical result shows that the critical point set of complex-reaction networks is a proper subset of that of real-valued networks.
arXiv Detail & Related papers (2021-08-15T10:13:49Z)
- The Principles of Deep Learning Theory [19.33681537640272]
This book develops an effective theory approach to understanding deep neural networks of practical relevance.
We explain how these effectively-deep networks learn nontrivial representations from training.
We show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks.
arXiv Detail & Related papers (2021-06-18T15:00:00Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Quasi-Equivalence of Width and Depth of Neural Networks [10.365556153676538]
We investigate if the design of artificial neural networks should have a directional preference.
Inspired by the De Morgan law, we establish a quasi-equivalence between the width and depth of ReLU networks.
Based on our findings, a deep network has a wide equivalent, subject to an arbitrarily small error.
arXiv Detail & Related papers (2020-02-06T21:17:32Z)