ReLU Neural Networks of Polynomial Size for Exact Maximum Flow Computation
- URL: http://arxiv.org/abs/2102.06635v5
- Date: Wed, 17 Jul 2024 15:31:15 GMT
- Title: ReLU Neural Networks of Polynomial Size for Exact Maximum Flow Computation
- Authors: Christoph Hertrich, Leon Sering
- Abstract summary: This paper studies the power of artificial neural networks with rectified linear units.
We show that two fundamental combinatorial optimization problems can be solved with neural networks of polynomial size: $\mathcal{O}(n^3)$ for minimum spanning tree and $\mathcal{O}(m^2 n^2)$ for maximum flow.
- Score: 5.35599092568615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the expressive power of artificial neural networks with rectified linear units. In order to study them as a model of real-valued computation, we introduce the concept of Max-Affine Arithmetic Programs and show equivalence between them and neural networks concerning natural complexity measures. We then use this result to show that two fundamental combinatorial optimization problems can be solved with polynomial-size neural networks. First, we show that for any undirected graph with $n$ nodes, there is a neural network (with fixed weights and biases) of size $\mathcal{O}(n^3)$ that takes the edge weights as input and computes the value of a minimum spanning tree of the graph. Second, we show that for any directed graph with $n$ nodes and $m$ arcs, there is a neural network of size $\mathcal{O}(m^2n^2)$ that takes the arc capacities as input and computes a maximum flow. Our results imply that these two problems can be solved with strongly polynomial time algorithms that solely use affine transformations and maxima computations, but no comparison-based branchings.
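The last sentence of the abstract is the key structural claim: the constructed networks compute with affine transformations and maxima only, never with comparison-based branching. As an informal illustration (not the paper's construction), the sketch below shows how exact max and min follow from a single ReLU, and how that already yields the minimum spanning tree value of a weighted triangle without any branching; the function names and the 3-node example are purely illustrative.

```python
import numpy as np

def relu(x):
    # Rectified linear unit: the only nonlinearity used.
    return np.maximum(x, 0.0)

def max2(a, b):
    # max(a, b) = b + ReLU(a - b): one affine map plus one ReLU neuron.
    return b + relu(a - b)

def min2(a, b):
    # min(a, b) = a - ReLU(a - b): again only affine arithmetic and ReLU
    # (included for completeness; not needed in the example below).
    return a - relu(a - b)

def mst_value_triangle(w12, w13, w23):
    # On the complete graph with 3 nodes, a minimum spanning tree keeps the
    # two lightest edges, so its value is the total weight minus the heaviest
    # edge weight. Everything here is affine arithmetic plus maxima, with no
    # comparison-based branching on the inputs.
    total = w12 + w13 + w23
    heaviest = max2(max2(w12, w13), w23)
    return total - heaviest

print(mst_value_triangle(4.0, 1.0, 2.0))  # 3.0: the tree uses the edges of weight 1 and 2
```

The paper's $\mathcal{O}(n^3)$ and $\mathcal{O}(m^2 n^2)$ constructions handle arbitrary graphs and capacities; this sketch is only meant to show that exact max and min, and hence simple combinatorial quantities, are expressible with ReLU units and affine maps alone.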
Related papers
- Neural Networks and (Virtual) Extended Formulations [5.762677915745415]
We take a step towards proving lower bounds on the size of neural networks by linking their representational capabilities to the notion of extension complexity $\mathrm{xc}(P)$.
We show that powerful results on the ordinary extension complexity can be converted into lower bounds for monotone neural networks.
arXiv Detail & Related papers (2024-11-05T11:12:11Z)
- On the Complexity of Neural Computation in Superposition [3.9803704378699103]
Recent advances in the understanding of neural networks suggest that superposition is a key mechanism underlying the computational efficiency of large-scale networks.
We show a nearly tight upper bound: logical operations like pairwise AND can be computed using $O(\sqrt{m'} \log m')$ neurons and $O(m' \log^2 m')$ parameters.
Our results open a path for using complexity theoretic techniques in neural network interpretability research.
arXiv Detail & Related papers (2024-09-05T18:58:59Z)
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle\right)$ under isotropic Gaussian data.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with an arbitrary link function with a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d$.
arXiv Detail & Related papers (2024-06-03T17:56:58Z)
- Matching the Statistical Query Lower Bound for k-sparse Parity Problems with Stochastic Gradient Descent [83.85536329832722]
We show that stochastic gradient descent (SGD) can efficiently solve the $k$-parity problem on a $d$-dimensional hypercube.
We then demonstrate that a neural network trained with SGD can solve the $k$-parity problem with small statistical error.
arXiv Detail & Related papers (2024-04-18T17:57:53Z)
- Moccasin: Efficient Tensor Rematerialization for Neural Networks [21.348126106786914]
We develop a new constraint programming formulation, called Moccasin, with only $O(n)$ integer variables.
We present numerical studies showing that our approach is up to an order of magnitude faster than recent work, especially for large-scale graphs.
arXiv Detail & Related papers (2023-04-27T18:41:37Z)
- Training Fully Connected Neural Networks is $\exists\mathbb{R}$-Complete [4.170994234169557]
We consider the problem of finding weights and biases for a two-layer fully connected neural network to fit a given set of data points as well as possible, a problem also known as Empirical Risk Minimization.
We prove that algebraic numbers of arbitrarily large degree are required as weights to train some instances to optimality, even if all data points are rational.
A consequence of this is that a search algorithm like the one by Basu, Mianjy and Mukherjee [ICLR 2018] is impossible for networks with more than one output dimension, unless $\mathsf{NP} = \exists\mathbb{R}$.
arXiv Detail & Related papers (2022-04-04T10:28:11Z)
- Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper bounds can be extended to two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can separate two classes of data with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Deep Polynomial Neural Networks [77.70761658507507]
$\Pi$-Nets are a new class of function approximators based on polynomial expansions.
$\Pi$-Nets produce state-of-the-art results in three challenging tasks: image generation, face verification, and 3D mesh representation learning.
arXiv Detail & Related papers (2020-06-20T16:23:32Z)
- Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
- A Corrective View of Neural Networks: Representation, Memorization and Learning [26.87238691716307]
We develop a corrective mechanism for neural network approximation.
We show that two-layer neural networks in the random features regime (RF) can memorize arbitrary labels.
We also consider three-layer neural networks and show that the corrective mechanism yields faster representation rates for smooth radial functions.
arXiv Detail & Related papers (2020-02-01T20:51:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.