Universal Solutions of Feedforward ReLU Networks for Interpolations
- URL: http://arxiv.org/abs/2208.07498v1
- Date: Tue, 16 Aug 2022 02:15:03 GMT
- Title: Universal Solutions of Feedforward ReLU Networks for Interpolations
- Authors: Changcun Huang
- Abstract summary: This paper provides a theoretical framework for the solutions of feedforward ReLU networks for interpolations.
For three-layer networks, we classify different kinds of solutions and model them in a normalized form; the search for solutions is investigated along three dimensions, including data, networks, and training.
For deep-layer networks, we present a general result called the sparse-matrix principle, which can describe some basic behavior of deep layers.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper provides a theoretical framework for the solutions of feedforward
ReLU networks for interpolations, in terms of what is called an interpolation
matrix, which summarizes, extends, and generalizes our three preceding works,
with the expectation that the solutions found in engineering practice could be
included in this framework and finally understood. For three-layer networks, we
classify different kinds of solutions and model them in a normalized form; the
search for solutions is investigated along three dimensions, including data,
networks, and training; the mechanism of overparameterization solutions is
interpreted. For deep-layer networks, we present a general result called the
sparse-matrix principle, which can describe some basic behavior of deep layers
and explain the phenomenon of the sparse-activation mode that appears in
engineering applications associated with brain science; an advantage of deep
layers over shallower ones is manifested through this principle. As
applications, a general solution of deep neural networks for classification is
constructed by that principle, and we also use the principle to study the
data-disentangling property of encoders. Analogous to the three-layer case, the
solutions of deep layers are also explored along several dimensions. The
mechanism of multi-output neural networks is explained from the perspective of
interpolation matrices.
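For concreteness, the following is a minimal sketch of the kind of interpolation solution the abstract refers to: a one-hidden-layer ReLU network that exactly fits a finite set of one-dimensional data points. This is the standard piecewise-linear construction, not the paper's interpolation-matrix derivation; the function names and the closed-form weight choice are illustrative assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def fit_interpolating_relu_net(x, y):
    """Construct a one-hidden-layer ReLU net that exactly interpolates
    1-D data (x_i, y_i): one hidden unit per knot, output weights chosen
    so the segment slopes match the data."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    slopes = np.diff(y) / np.diff(x)                      # slope on each segment
    c = np.concatenate(([slopes[0]], np.diff(slopes)))    # slope changes at knots
    # hidden units h_i(t) = relu(t - x_i); output f(t) = y_1 + sum_i c_i h_i(t)
    return x[:-1], c, y[0]

def predict(knots, c, bias, t):
    h = relu(np.subtract.outer(t, knots))                 # shape (len(t), n_hidden)
    return bias + h @ c

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(-1.0, 1.0, size=8))
    y = rng.normal(size=8)
    knots, c, bias = fit_interpolating_relu_net(x, y)
    print(np.allclose(predict(knots, c, bias, x), y))     # True: exact interpolation
```

Each hidden unit contributes a ReLU ramp starting at a data point, so the network realizes the piecewise-linear interpolant of the samples; solutions of this exact-fitting type are what the paper's framework classifies and normalizes.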
Related papers
- On Generalization Bounds for Neural Networks with Low Rank Layers [4.2954245208408866]
We apply Maurer's chain rule for Gaussian complexity to analyze how low-rank layers in deep networks can prevent the accumulation of rank and dimensionality factors.
We compare our results to prior generalization bounds for deep networks, highlighting how deep networks with low-rank layers can achieve better generalization than those with full-rank layers.
arXiv Detail & Related papers (2024-11-20T22:20:47Z) - On the Principles of ReLU Networks with One Hidden Layer [0.0]
It remains unclear how to interpret the mechanism of the solutions obtained by the back-propagation algorithm.
It is shown, both theoretically and experimentally, that the training solutions for one-dimensional inputs can be completely understood.
arXiv Detail & Related papers (2024-11-11T05:51:11Z) - Conditional computation in neural networks: principles and research trends [48.14569369912931]
This article summarizes principles and ideas from the emerging area of applying conditional computation methods to the design of neural networks.
In particular, we focus on neural networks that can dynamically activate or de-activate parts of their computational graph conditionally on their input.
arXiv Detail & Related papers (2024-03-12T11:56:38Z) - Defining Neural Network Architecture through Polytope Structures of Dataset [53.512432492636236]
This paper defines upper and lower bounds for neural network widths, which are informed by the polytope structure of the dataset in question.
We develop an algorithm to investigate a converse situation where the polytope structure of a dataset can be inferred from its corresponding trained neural networks.
It is established that popular datasets such as MNIST, Fashion-MNIST, and CIFAR10 can be efficiently encapsulated using no more than two polytopes with a small number of faces.
arXiv Detail & Related papers (2024-02-04T08:57:42Z) - Variation Spaces for Multi-Output Neural Networks: Insights on Multi-Task Learning and Network Compression [28.851519959657466]
This paper introduces a novel theoretical framework for the analysis of vector-valued neural networks.
A key contribution of this work is the development of a representer theorem for the vector-valued variation spaces.
This observation reveals that the norm associated with these vector-valued variation spaces encourages the learning of features that are useful for multiple tasks.
arXiv Detail & Related papers (2023-05-25T23:32:10Z) - Data Topology-Dependent Upper Bounds of Neural Network Widths [52.58441144171022]
We first show that a three-layer neural network can be designed to approximate an indicator function over a compact set.
This is then extended to a simplicial complex, deriving width upper bounds based on its topological structure.
We prove the universal approximation property of three-layer ReLU networks using our topological approach.
arXiv Detail & Related papers (2023-05-25T14:17:15Z) - Generalization and Estimation Error Bounds for Model-based Neural
Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules that allow to construct model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z) - Approximation Power of Deep Neural Networks: an explanatory mathematical
survey [0.0]
The goal of this survey is to present an explanatory review of the approximation properties of deep neural networks.
We aim at understanding how and why deep neural networks outperform other classical linear and nonlinear approximation methods.
arXiv Detail & Related papers (2022-07-19T18:47:44Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of neural networks measures the information flowing across layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - Theoretical Exploration of Solutions of Feedforward ReLU networks [0.0]
This paper aims to interpret the mechanism of feedforward ReLU networks by exploring their solutions for piecewise linear functions through basic rules.
We explain three typical network architectures: the subnetwork of last three layers of convolutional networks, multi-layer feedforward networks, and the decoder of autoencoders.
arXiv Detail & Related papers (2022-01-24T01:51:52Z) - A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)