Theoretical Exploration of Solutions of Feedforward ReLU networks
- URL: http://arxiv.org/abs/2202.01919v3
- Date: Wed, 9 Feb 2022 02:30:26 GMT
- Title: Theoretical Exploration of Solutions of Feedforward ReLU networks
- Authors: Changcun Huang
- Abstract summary: This paper aims to interpret the mechanism of feedforward ReLU networks by exploring their solutions for piecewise linear functions through basic rules.
We explain three typical network architectures: the subnetwork formed by the last three layers of convolutional networks, multi-layer feedforward networks, and the decoder of autoencoders.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper aims to interpret the mechanism of feedforward ReLU networks by
exploring their solutions for piecewise linear functions through basic rules.
The constructed solutions should be universal enough to explain the network
architectures used in engineering practice. To that end, we borrow the
methodology of theoretical physics to develop the theories. Some consequences
of our theories include the following: in geometric settings, the solutions of
both three-layer networks and deep-layer networks are presented, and solution
universality is ensured in several ways; we give clear and intuitive
interpretations of each component of the network architectures, such as the
parameter-sharing mechanism for multiple outputs, the function of each layer,
the advantage of deep layers, the redundancy of parameters, and so on. We
explain three typical network architectures: the subnetwork formed by the last
three layers of convolutional networks, multi-layer feedforward networks, and
the decoder of autoencoders. This paper is expected to provide a basic
theoretical foundation for further investigations of feedforward ReLU networks.
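As a concrete illustration of what a "solution for a piecewise linear function" can look like, the following minimal numpy sketch (not taken from the paper; the breakpoints, slopes, and anchor value are illustrative) builds a one-hidden-layer ReLU network that exactly realizes a given continuous piecewise linear function of one variable: each hidden unit ReLU(x - b_i) adds the slope change at breakpoint b_i, and the pair ReLU(x), ReLU(-x) carries the leftmost affine piece.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Illustrative continuous piecewise linear (CPWL) target: breakpoints b,
# slope s[i] on the i-th piece, and value f0 at the first breakpoint.
b  = np.array([-1.0, 0.5, 2.0])
s  = np.array([0.5, -1.0, 2.0, 0.25])
f0 = 1.0

def f_direct(x):
    """Reference evaluation: locate the active piece, then follow its segment."""
    v = [f0]
    for i in range(1, len(b)):
        v.append(v[-1] + s[i] * (b[i] - b[i - 1]))  # values at the breakpoints
    j = np.searchsorted(b, x)
    if j == 0:
        return f0 + s[0] * (x - b[0])
    return v[j - 1] + s[j] * (x - b[j - 1])

# One-hidden-layer ReLU network realizing the same function exactly:
# ReLU(x - b_i) adds the slope change at breakpoint b_i, while the pair
# ReLU(x), ReLU(-x) reproduces the leftmost affine piece s[0] * x.
W1 = np.concatenate((np.ones_like(b), [1.0, -1.0]))   # hidden-layer weights
b1 = np.concatenate((-b, [0.0, 0.0]))                  # hidden-layer biases
W2 = np.concatenate((np.diff(s), [s[0], -s[0]]))       # output weights
b2 = f0 - s[0] * b[0]                                  # output bias

def f_network(x):
    return W2 @ relu(W1 * x + b1) + b2

xs = np.linspace(-3.0, 4.0, 201)
assert np.allclose([f_direct(x) for x in xs], [f_network(x) for x in xs])
print("the one-hidden-layer ReLU network matches the CPWL target at all test points")
```

This is only the textbook one-dimensional construction; the solutions studied in the paper cover more general settings.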
Related papers
- On the Principles of ReLU Networks with One Hidden Layer [0.0]
It remains unclear how to interpret the mechanism of the solutions obtained by the back-propagation algorithm.
It is shown, both theoretically and experimentally, that the training solution for one-dimensional inputs can be completely understood.
arXiv Detail & Related papers (2024-11-11T05:51:11Z)
- Unwrapping All ReLU Networks [1.370633147306388]
Deep ReLU networks can be decomposed into a collection of linear models.
We extend this decomposition to graph neural networks and tensor convolutional networks.
We show how this decomposition leads to computing cheap and exact SHAP values.
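For illustration, the following small sketch (random weights, not the construction from the paper) shows the decomposition concretely: on the activation region containing a given input, a deep ReLU network coincides with a single affine map, which can be read off by folding each layer's 0/1 ReLU mask into the weight matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small random deep ReLU network (sizes and weights are illustrative).
dims = [4, 8, 8, 1]
Ws = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
bs = [rng.standard_normal(dims[i + 1]) for i in range(len(dims) - 1)]

def forward(x):
    h = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        h = np.maximum(W @ h + b, 0.0)          # hidden ReLU layers
    return Ws[-1] @ h + bs[-1]                   # linear output layer

def local_affine(x):
    """Return (A, c) with forward(x) == A @ x + c on x's activation region."""
    A, c = np.eye(len(x)), np.zeros(len(x))
    for W, b in zip(Ws[:-1], bs[:-1]):
        A, c = W @ A, W @ c + b                  # pre-activation as an affine map
        m = (A @ x + c) > 0                      # ReLU activation pattern at x
        A, c = m[:, None] * A, m * c             # fold the 0/1 mask into the map
    return Ws[-1] @ A, Ws[-1] @ c + bs[-1]

x = rng.standard_normal(dims[0])
A, c = local_affine(x)
assert np.allclose(forward(x), A @ x + c)        # the local linear model is exact
```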
arXiv Detail & Related papers (2023-05-16T13:30:15Z)
- Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks [49.808194368781095]
We show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks.
This work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.
arXiv Detail & Related papers (2023-05-11T17:19:30Z)
- Generalization and Estimation Error Bounds for Model-based Neural Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules for constructing model-based networks with guaranteed high generalization.
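For context, the "model-based networks" in this line of work are typically unrolled iterative algorithms. The sketch below shows one such architecture, an unrolled ISTA for sparse recovery, with purely illustrative sizes and with the layer matrices fixed at their classical ISTA values rather than learned; it shows only the layer structure and is not the paper's construction or a statement of its guarantees.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding: the proximal operator of the l1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(0)
m, n, k, T = 30, 80, 4, 20                    # measurements, signal size, sparsity, layers
A = rng.standard_normal((m, n)) / np.sqrt(m)  # sensing matrix
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x_true                                 # noiseless measurements

# Each "layer" applies fixed matrices (W1, W2) and a threshold theta.
# In a learned, LISTA-style network these would be trained parameters;
# here they are set to the classical ISTA values for illustration.
L = np.linalg.norm(A, 2) ** 2                  # step-size constant
lam = 0.02
W1, W2, theta = A.T / L, np.eye(n) - A.T @ A / L, lam / L

x = np.zeros(n)
for _ in range(T):                             # T unrolled iterations = T layers
    x = soft_threshold(W1 @ y + W2 @ x, theta)

print("relative reconstruction error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```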
arXiv Detail & Related papers (2023-04-19T16:39:44Z)
- Universal Solutions of Feedforward ReLU Networks for Interpolations [0.0]
This paper provides a theoretical framework on the solutions of feedforward ReLU networks for generalization.
For three-layer networks, we classify different kinds of solutions and model them in a normalized form; solution finding is investigated along three dimensions: data, networks, and training.
For deep-layer networks, we present a general result called the sparse-matrix principle, which can describe some basic behaviors of deep layers.
arXiv Detail & Related papers (2022-08-16T02:15:03Z)
- Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures the information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains unclear.
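A quick way to observe the phenomenon (not from the paper; widths, depth, and batch size are arbitrary choices) is to track a rank proxy of the hidden representations across the layers of a randomly initialized deep ReLU network:

```python
import numpy as np

rng = np.random.default_rng(0)

def stable_rank(M):
    """||M||_F^2 / ||M||_2^2: a smooth, scale-invariant proxy for rank."""
    sv = np.linalg.svd(M, compute_uv=False)
    return float((sv ** 2).sum() / sv[0] ** 2)

# Hidden representations of a batch of inputs, layer by layer, in a
# randomly initialized deep ReLU network (all sizes are illustrative).
width, depth, batch = 64, 8, 512
H = rng.standard_normal((batch, width))        # one input per row
print(f"input    : stable rank = {stable_rank(H):5.1f}")
for layer in range(1, depth + 1):
    W = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)   # He-style scale
    H = np.maximum(H @ W, 0.0)                 # ReLU layer
    print(f"layer {layer:2d} : stable rank = {stable_rank(H):5.1f}")
```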
arXiv Detail & Related papers (2022-06-13T12:03:32Z)
- Learning with Capsules: A Survey [73.31150426300198]
Capsule networks were proposed as an alternative approach to Convolutional Neural Networks (CNNs) for learning object-centric representations.
Unlike CNNs, capsule networks are designed to explicitly model part-whole hierarchical relationships.
arXiv Detail & Related papers (2022-06-06T15:05:36Z)
- Towards Understanding Theoretical Advantages of Complex-Reaction Networks [77.34726150561087]
We show that a class of functions can be approximated by a complex-reaction network using a polynomial number of parameters.
For empirical risk minimization, our theoretical result shows that the critical point set of complex-reaction networks is a proper subset of that of real-valued networks.
arXiv Detail & Related papers (2021-08-15T10:13:49Z)
- The Principles of Deep Learning Theory [19.33681537640272]
This book develops an effective theory approach to understanding deep neural networks of practical relevance.
We explain how these effectively-deep networks learn nontrivial representations from training.
We show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks.
arXiv Detail & Related papers (2021-06-18T15:00:00Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Quasi-Equivalence of Width and Depth of Neural Networks [10.365556153676538]
We investigate if the design of artificial neural networks should have a directional preference.
Inspired by the De Morgan law, we establish a quasi-equivalence between the width and depth of ReLU networks.
Based on our findings, a deep network has a wide equivalent, subject to an arbitrarily small error.
arXiv Detail & Related papers (2020-02-06T21:17:32Z)