Depth-Width Trade-offs for Neural Networks via Topological Entropy
- URL: http://arxiv.org/abs/2010.07587v1
- Date: Thu, 15 Oct 2020 08:14:44 GMT
- Title: Depth-Width Trade-offs for Neural Networks via Topological Entropy
- Authors: Kaifeng Bu, Yaobo Zhang, Qingxian Luo
- Abstract summary: We show a new connection between the expressivity of deep neural networks and topological entropy from dynamical systems.
We also discuss the relationship between topological entropy, the number of oscillations, periods, and the Lipschitz constant.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the central problems in the study of deep learning theory is to
understand how structural properties, such as depth, width and the number of
nodes, affect the expressivity of deep neural networks. In this work, we show a
new connection between the expressivity of deep neural networks and topological
entropy from dynamical systems, which can be used to characterize depth-width
trade-offs of neural networks. We provide an upper bound on the topological
entropy of neural networks with continuous semi-algebraic units in terms of the
structural parameters. Specifically, the topological entropy of a ReLU network
with $l$ layers and $m$ nodes per layer is upper bounded by $O(l\log m)$.
Furthermore, if the neural network is a good approximation of some function $f$,
then the size of the neural network has an exponential lower bound with respect
to the topological entropy of $f$. Finally, we discuss the relationship between
topological entropy, the number of oscillations, periods, and the Lipschitz
constant.
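To make the oscillation-entropy connection concrete, here is a minimal sketch (not from the paper; the helper names relu_net, count_laps, and entropy_estimate are illustrative) that numerically estimates the topological entropy of a one-dimensional ReLU network via the standard lap-counting characterization for piecewise-monotone interval maps, $h(f)=\lim_{n\to\infty}\frac{1}{n}\log c_n$, where $c_n$ is the number of monotone pieces of the $n$-fold composition $f^n$. The example network is a two-node ReLU realization of the tent map, whose topological entropy is $\log 2$.

```python
import numpy as np

def relu_net(x, weights, biases):
    """Evaluate a small fully connected ReLU network mapping R -> R.

    weights[i] has shape (m_i, m_{i-1}); biases[i] has shape (m_i,).
    All layers but the last apply a ReLU activation.
    """
    h = np.atleast_2d(x)                              # shape (1, n_points)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(W @ h + b[:, None], 0.0)       # hidden ReLU layers
    W, b = weights[-1], biases[-1]
    return (W @ h + b[:, None]).ravel()               # affine output layer

def count_laps(y):
    """Count maximal monotone pieces ("laps") of a finely sampled 1-D function."""
    d = np.diff(y)
    d = d[d != 0.0]                                   # ignore flat steps
    sign_changes = int(np.sum(d[1:] * d[:-1] < 0))
    return sign_changes + 1

def entropy_estimate(f, iterates=6, grid=200001):
    """Estimate the topological entropy of f: [0,1] -> [0,1] via
    h(f) = lim_n (1/n) log c_n  (lap counting for piecewise-monotone maps)."""
    x = np.linspace(0.0, 1.0, grid)
    y = x.copy()
    estimates = []
    for n in range(1, iterates + 1):
        y = np.clip(f(y), 0.0, 1.0)                   # y now samples f^n on the grid
        estimates.append(np.log(count_laps(y)) / n)
    return estimates

if __name__ == "__main__":
    # Tent map T(x) = 2x - 4*relu(x - 1/2), written as a one-hidden-layer,
    # two-node ReLU network; its topological entropy is log 2.
    weights = [np.array([[1.0], [1.0]]), np.array([[2.0, -4.0]])]
    biases = [np.array([0.0, -0.5]), np.array([0.0])]

    def tent(x):
        return relu_net(x, weights, biases)

    for n, h in enumerate(entropy_estimate(tent), start=1):
        print(f"n = {n}:  (1/n) log c_n = {h:.4f}   (log 2 = {np.log(2):.4f})")
```

Because the tent map has $2^n$ monotone pieces after $n$ iterations, the estimate stabilizes at $\log 2$ immediately; for a generic trained network the same procedure would require finer grids and more iterates before the limit becomes visible.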
Related papers
- Spectral complexity of deep neural networks [2.099922236065961]
We use the angular power spectrum of the limiting field to characterize the complexity of the network architecture.
On this basis, we classify neural networks as low-disorder, sparse, or high-disorder.
We show how this classification highlights a number of distinct features for standard activation functions, and in particular, sparsity properties of ReLU networks.
arXiv Detail & Related papers (2024-05-15T17:55:05Z) - Topological Expressivity of ReLU Neural Networks [0.0]
We study the expressivity of ReLU neural networks in the setting of a binary classification problem from a topological perspective.
Results show that deep ReLU neural networks are exponentially more powerful than shallow ones in terms of topological simplification.
arXiv Detail & Related papers (2023-10-17T10:28:00Z) - Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z) - Data Topology-Dependent Upper Bounds of Neural Network Widths [52.58441144171022]
We first show that a three-layer neural network can be designed to approximate an indicator function over a compact set.
This is then extended to a simplicial complex, deriving width upper bounds based on its topological structure.
We prove the universal approximation property of three-layer ReLU networks using our topological approach.
arXiv Detail & Related papers (2023-05-25T14:17:15Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a
Polynomial Net Study [55.12108376616355]
Work on the NTK has been devoted to typical neural network architectures and is incomplete for neural networks with Hadamard products (NNs-Hp).
In this work, we derive the finite-width NTK formulation for a special class of NNs-Hp, i.e., polynomial neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z) - Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z) - Topological Insights into Sparse Neural Networks [16.515620374178535]
We introduce an approach to understand and compare sparse neural network topologies from the perspective of graph theory.
We first propose Neural Network Sparse Topology Distance (NNSTD) to measure the distance between different sparse neural networks.
We show that adaptive sparse connectivity can always unveil a plenitude of sparse sub-networks with very different topologies that outperform the dense model.
arXiv Detail & Related papers (2020-06-24T22:27:21Z) - Approximation in shift-invariant spaces with deep ReLU neural networks [7.7084107194202875]
We study the expressive power of deep ReLU neural networks for approximating functions in dilated shift-invariant spaces.
Approximation error bounds are estimated with respect to the width and depth of neural networks.
arXiv Detail & Related papers (2020-05-25T07:23:47Z)