A Depth Hierarchy for Computing the Maximum in ReLU Networks via Extremal Graph Theory
- URL: http://arxiv.org/abs/2601.01417v1
- Date: Sun, 04 Jan 2026 07:40:42 GMT
- Title: A Depth Hierarchy for Computing the Maximum in ReLU Networks via Extremal Graph Theory
- Authors: Itay Safran,
- Abstract summary: We consider the problem of exact computation of the maximum function over $d$ real inputs using ReLU neural networks. We show that a sufficiently narrow network cannot capture the non-linearities of the maximum. This suggests that despite its simple nature, the maximum function possesses an inherent complexity.
- Score: 3.0120086446979877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of exact computation of the maximum function over $d$ real inputs using ReLU neural networks. We prove a depth hierarchy, wherein width $\Omega\big(d^{1+\frac{1}{2^{k-2}-1}}\big)$ is necessary to represent the maximum for any depth $3\le k\le \log_2(\log_2(d))$. This is the first unconditional super-linear lower bound for this fundamental operator at depths $k\ge3$, and it holds even if the depth scales with $d$. Our proof technique is based on a combinatorial argument and associates the non-differentiable ridges of the maximum with cliques in a graph induced by the first hidden layer of the computing network, utilizing Turán's theorem from extremal graph theory to show that a sufficiently narrow network cannot capture the non-linearities of the maximum. This suggests that despite its simple nature, the maximum function possesses an inherent complexity that stems from the geometric structure of its non-differentiable hyperplanes, and provides a novel approach for proving lower bounds for deep neural networks.
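For context on the upper-bound side of this hierarchy, the maximum of $d$ inputs can be computed exactly by a ReLU network of depth roughly $\log_2 d$ via the well-known pairwise reduction $\max(a,b)=\tfrac{1}{2}\big((a+b)+|a-b|\big)$ with $|t|=\mathrm{ReLU}(t)+\mathrm{ReLU}(-t)$. The sketch below illustrates that standard construction, not the paper's proof; the helper names (`max_pair_relu`, `max_relu`) and the NumPy framing are assumptions made for this example. It also evaluates the exponent $1+\frac{1}{2^{k-2}-1}$ from the stated lower bound at a few small depths.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def max_pair_relu(a, b):
    """Exact max of two numbers with a single hidden ReLU layer (4 units):
    max(a, b) = ((a + b) + |a - b|) / 2, where
    a + b   = ReLU(a + b) - ReLU(-(a + b)) and
    |a - b| = ReLU(a - b) + ReLU(b - a)."""
    h = relu(np.array([a + b, -(a + b), a - b, b - a]))
    return 0.5 * (h[0] - h[1] + h[2] + h[3])

def max_relu(xs):
    """Pairwise-reduction max over d inputs: roughly ceil(log2(d)) ReLU stages."""
    xs = list(xs)
    while len(xs) > 1:
        nxt = [max_pair_relu(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2:  # an odd leftover input passes through unchanged
            nxt.append(xs[-1])
        xs = nxt
    return xs[0]

# Sanity check of the construction against the built-in maximum.
rng = np.random.default_rng(0)
v = rng.normal(size=13)
assert np.isclose(max_relu(v), v.max())

# Width exponent 1 + 1/(2**(k-2) - 1) from the paper's lower bound,
# stated for depths 3 <= k <= log2(log2(d)).
for k in (3, 4, 5):
    print(f"depth k={k}: width = Omega(d^{1 + 1/(2**(k - 2) - 1):.3f})")
```

At depth $k=3$ the exponent is $2$, so the bound reads $\Omega(d^2)$, and it decays toward linear width as the depth grows, which is the hierarchy the abstract describes.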
Related papers
- Neural Networks Learn Generic Multi-Index Models Near Information-Theoretic Limit [66.20349460098275]
We study the gradient descent learning of a general Gaussian multi-index model $f(\boldsymbol{x})=g(\boldsymbol{U}\boldsymbol{x})$ with hidden subspace $\boldsymbol{U}\in\mathbb{R}^{r\times d}$. We prove that under generic non-degenerate assumptions on the link function, a standard two-layer neural network trained via layer-wise gradient descent can agnostically learn the target with $o_d(1)$ test error.
arXiv Detail & Related papers (2025-11-19T04:46:47Z) - Minimum Width of Deep Narrow Networks for Universal Approximation [9.00733527455972]
We study lower and upper bounds on the minimum width required for universal approximation by fully connected neural networks. We present a new proof of the inequality $w_{\min}\ge d_y+\mathbf{1}_{d_x<d_y}\le 2d_x$ by constructing a more intuitive example.
arXiv Detail & Related papers (2025-11-10T08:29:14Z) - Depth-Bounds for Neural Networks via the Braid Arrangement [7.856998585396422]
We prove a non-constant lower bound of $\Omega(\log\log d)$ hidden layers required to represent the maximum of $d$ numbers. We show that a rank-3 maxout layer followed by a rank-2 maxout layer is sufficient to represent the maximum of 7 numbers.
arXiv Detail & Related papers (2025-02-13T13:37:52Z) - Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z) - How Many Neurons Does it Take to Approximate the Maximum? [10.995895410470279]
We study the size of a neural network needed to approximate the maximum function over $d$ inputs.
We provide new lower and upper bounds on the width required for approximation across various depths.
arXiv Detail & Related papers (2023-07-18T12:47:35Z) - Data Topology-Dependent Upper Bounds of Neural Network Widths [52.58441144171022]
We first show that a three-layer neural network can be designed to approximate an indicator function over a compact set.
This is then extended to a simplicial complex, deriving width upper bounds based on its topological structure.
We prove the universal approximation property of three-layer ReLU networks using our topological approach.
arXiv Detail & Related papers (2023-05-25T14:17:15Z) - Understanding Deep Neural Function Approximation in Reinforcement
Learning via $\epsilon$-Greedy Exploration [53.90873926758026]
This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL).
We focus on the value-based algorithm with $\epsilon$-greedy exploration via deep (and two-layer) neural networks endowed by Besov (and Barron) function spaces.
Our analysis reformulates the temporal difference error in an $L^2(\mathrm{d}\mu)$-integrable space over a certain averaged measure $\mu$, and transforms it to a generalization problem under the non-iid setting.
arXiv Detail & Related papers (2022-09-15T15:42:47Z) - Theory of Deep Convolutional Neural Networks III: Approximating Radial
Functions [7.943024117353317]
We consider a family of deep neural networks consisting of two groups of convolutional layers, a down operator, and a fully connected layer.
The network structure depends on two structural parameters which determine the numbers of convolutional layers and the width of the fully connected layer.
arXiv Detail & Related papers (2021-07-02T08:22:12Z) - ReLU Neural Networks of Polynomial Size for Exact Maximum Flow Computation [5.35599092568615]
This paper studies the power of artificial neural networks with rectified linear units.
We show that two fundamental optimization problems can be solved with neural networks of size $\mathcal{O}(m^2n^2)$.
arXiv Detail & Related papers (2021-02-12T17:23:34Z) - Size and Depth Separation in Approximating Natural Functions with Neural
Networks [52.73592689730044]
We show the benefits of size and depth for approximation of natural functions with ReLU networks.
We show a complexity-theoretic barrier to proving such results beyond size $O(d)$.
We also show an explicit natural function, that can be approximated with networks of size $O(d)$.
arXiv Detail & Related papers (2021-01-30T21:30:11Z) - A simple geometric proof for the benefit of depth in ReLU networks [57.815699322370826]
We present a simple proof for the benefit of depth in multi-layer feedforward networks with rectified activation ("depth separation").
We present a concrete neural network with linear depth (in $m$) and small constant width ($\leq 4$) that classifies the problem with zero error.
arXiv Detail & Related papers (2021-01-18T15:40:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.