Depth-Bounds for Neural Networks via the Braid Arrangement
- URL: http://arxiv.org/abs/2502.09324v1
- Date: Thu, 13 Feb 2025 13:37:52 GMT
- Title: Depth-Bounds for Neural Networks via the Braid Arrangement
- Authors: Moritz Grillo, Christoph Hertrich, Georg Loho
- Abstract summary: We prove a non-constant lower bound of $\Omega(\log\log d)$ hidden layers required to represent the maximum of $d$ numbers.
We show that a rank-3 maxout layer followed by a rank-2 maxout layer is sufficient to represent the maximum of 7 numbers.
- Score: 5.127394801557798
- Abstract: We contribute towards resolving the open question of how many hidden layers are required in ReLU networks for exactly representing all continuous and piecewise linear functions on $\mathbb{R}^d$. While the question has been resolved in special cases, the best known lower bound in general is still 2. We focus on neural networks that are compatible with certain polyhedral complexes, more precisely with the braid fan. For such neural networks, we prove a non-constant lower bound of $\Omega(\log\log d)$ hidden layers required to exactly represent the maximum of $d$ numbers. Additionally, under our assumption, we provide a combinatorial proof that 3 hidden layers are necessary to compute the maximum of 5 numbers; this had only been verified with an excessive computation so far. Finally, we show that a natural generalization of the best known upper bound to maxout networks is not tight, by demonstrating that a rank-3 maxout layer followed by a rank-2 maxout layer is sufficient to represent the maximum of 7 numbers.
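For orientation, a standard upper-bound construction computes the maximum of $d$ numbers with $\lceil\log_2 d\rceil$ hidden ReLU layers by taking pairwise maxima via $\max(a,b) = a + \mathrm{ReLU}(b-a)$. The NumPy sketch below illustrates only this classical baseline, not the paper's braid-fan or maxout constructions; all function names are ours.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def pairwise_max(a, b):
    # max(a, b) = a + ReLU(b - a); since a = ReLU(a) - ReLU(-a), one
    # hidden ReLU layer with 3 neurons realizes this exactly.
    return relu(a) - relu(-a) + relu(b - a)

def max_by_tournament(xs):
    # Balanced pairwise tournament: ceil(log2(d)) rounds, i.e. the
    # standard ceil(log2(d))-hidden-layer ReLU upper bound.
    vals, rounds = list(xs), 0
    while len(vals) > 1:
        nxt = [pairwise_max(vals[i], vals[i + 1])
               for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:        # an odd leftover is passed through unchanged
            nxt.append(vals[-1])
        vals, rounds = nxt, rounds + 1
    return vals[0], rounds

x = np.random.randn(7)
m, rounds = max_by_tournament(x)
print(np.isclose(m, x.max()), rounds)   # True 3  (3 = ceil(log2 7))
```

By contrast, the abstract's maxout result computes the maximum of 7 numbers with just two hidden layers (rank 3 followed by rank 2), whereas the tournament above needs three ReLU rounds for $d = 7$.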
Related papers
- Deep Neural Networks: Multi-Classification and Universal Approximation [0.0]
We demonstrate that a ReLU deep neural network with a width of $2$ and a depth of $2N+4M-1$ layers can achieve finite sample memorization for any dataset comprising $N$ elements.
We also provide depth estimates for approximating $W^{1,p}$ functions and width estimates for approximating $L^p(\Omega;\mathbb{R}^m)$ for $m\geq 1$.
arXiv Detail & Related papers (2024-09-10T14:31:21Z) - How Many Neurons Does it Take to Approximate the Maximum? [10.995895410470279]
We study the size of a neural network needed to approximate the maximum function over $d$ inputs.
We provide new lower and upper bounds on the width required for approximation across various depths.
arXiv Detail & Related papers (2023-07-18T12:47:35Z) - Polynomial Width is Sufficient for Set Representation with High-dimensional Features [69.65698500919869]
DeepSets is the most widely used neural network architecture for set representation.
We present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activation (LE).
arXiv Detail & Related papers (2023-07-08T16:00:59Z) - Data Topology-Dependent Upper Bounds of Neural Network Widths [52.58441144171022]
We first show that a three-layer neural network can be designed to approximate an indicator function over a compact set.
This is then extended to a simplicial complex, deriving width upper bounds based on its topological structure.
We prove the universal approximation property of three-layer ReLU networks using our topological approach.
arXiv Detail & Related papers (2023-05-25T14:17:15Z) - Lower Bounds on the Depth of Integral ReLU Neural Networks via Lattice Polytopes [3.0079490585515343]
We show that $\lceil\log_2(n)\rceil$ hidden layers are indeed necessary to compute the maximum of $n$ numbers.
Our results are based on the known duality between neural networks and Newton polytopes via tropical geometry; a short worked illustration of this duality, applied to the maximum function, is sketched after the list below.
arXiv Detail & Related papers (2023-02-24T10:14:53Z) - Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with quadratic loss function, fully connected feedforward architecture, ReLU activations, Gaussian data instances, and adversarial labels.
Our results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
arXiv Detail & Related papers (2022-12-05T14:47:52Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can separate two classes of data with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Revealing the Structure of Deep Neural Networks via Convex Duality [70.15611146583068]
We study regularized deep neural networks (DNNs) and introduce a convex analytic framework to characterize the structure of hidden layers.
We show that a set of optimal hidden layer weights for a norm regularized training problem can be explicitly found as the extreme points of a convex set.
We apply the same characterization to deep ReLU networks with whitened data and prove the same weight alignment holds.
arXiv Detail & Related papers (2020-02-22T21:13:44Z) - Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology [28.49901662584467]
We prove that, for deep networks, a single layer of width $N$ following the input layer suffices to guarantee global convergence of gradient descent.
All the remaining layers are allowed to have constant widths, and form a pyramidal topology.
arXiv Detail & Related papers (2020-02-18T20:21:27Z) - Global Convergence of Frank Wolfe on One Hidden Layer Networks [121.96696298666014]
We derive global convergence bounds for the Frank Wolfe algorithm when training one hidden layer neural networks.
When using the ReLU activation function, and under tractable preconditioning assumptions on the sample data set, the linear minimization oracle used to incrementally form the solution can be solved explicitly as a second order cone program.
arXiv Detail & Related papers (2020-02-06T11:58:43Z)
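As referenced in the entry on integral ReLU networks and lattice polytopes above, the tropical/Newton-polytope duality can be illustrated on the maximum function itself. The following is our own worked example of that standard correspondence, not a result taken from the cited paper:

```latex
% The maximum of n numbers as a tropical polynomial / support function:
\[
  \max(x_1,\dots,x_n) \;=\; x_1 \oplus \cdots \oplus x_n
  \;=\; \max_{y \in \Delta_{n-1}} \langle x, y \rangle,
  \qquad
  \Delta_{n-1} = \operatorname{conv}\{e_1,\dots,e_n\} \subset \mathbb{R}^n,
\]
% so the Newton polytope of the tropical polynomial x_1 \oplus \cdots \oplus x_n
% is the standard simplex, and depth bounds translate into questions about
% building this simplex from points via Minkowski sums and convex hulls of unions.
```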