Training Neural Networks is NP-Hard in Fixed Dimension
- URL: http://arxiv.org/abs/2303.17045v2
- Date: Thu, 18 Jan 2024 12:10:03 GMT
- Title: Training Neural Networks is NP-Hard in Fixed Dimension
- Authors: Vincent Froese, Christoph Hertrich
- Abstract summary: We study the parameterized complexity of training two-layer neural networks with respect to the dimension of the input data and the number of hidden neurons.
Our results settle the complexity status regarding these parameters almost completely.
- Score: 8.774952953654655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the parameterized complexity of training two-layer neural networks
with respect to the dimension of the input data and the number of hidden
neurons, considering ReLU and linear threshold activation functions. Albeit the
computational complexity of these problems has been studied numerous times in
recent years, several questions are still open. We answer questions by Arora et
al. [ICLR '18] and Khalife and Basu [IPCO '22] showing that both problems are
NP-hard for two dimensions, which excludes any polynomial-time algorithm for
constant dimension. We also answer a question by Froese et al. [JAIR '22]
proving W[1]-hardness for four ReLUs (or two linear threshold neurons) with
zero training error. Finally, in the ReLU case, we show fixed-parameter
tractability for the combined parameter number of dimensions and number of
ReLUs if the network is assumed to compute a convex map. Our results settle the
complexity status regarding these parameters almost completely.
Related papers
- Complexity of Deciding Injectivity and Surjectivity of ReLU Neural Networks [5.482420806459269]
We prove coNP-completeness of deciding injectivity of a ReLU layer.
We also characterize surjectivity for two-layer ReLU networks with one-dimensional output.
arXiv Detail & Related papers (2024-05-30T08:14:34Z) - Polynomial-Time Solutions for ReLU Network Training: A Complexity
Classification via Max-Cut and Zonotopes [70.52097560486683]
We prove that the hardness of approximation of ReLU networks not only mirrors the complexity of the Max-Cut problem but also, in certain special cases, exactly corresponds to it.
In particular, when $epsilonleqsqrt84/83-1approx 0.006$, we show that it is NP-hard to find an approximate global dataset of the ReLU network objective with relative error $epsilon$ with respect to the objective value.
arXiv Detail & Related papers (2023-11-18T04:41:07Z) - A Recursively Recurrent Neural Network (R2N2) Architecture for Learning
Iterative Algorithms [64.3064050603721]
We generalize Runge-Kutta neural network to a recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms.
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields similar iterations to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
arXiv Detail & Related papers (2022-11-22T16:30:33Z) - Accelerating the training of single-layer binary neural networks using
the HHL quantum algorithm [58.720142291102135]
We show that useful information can be extracted from the quantum-mechanical implementation of Harrow-Hassidim-Lloyd (HHL)
This paper shows, however, that useful information can be extracted from the quantum-mechanical implementation of HHL, and used to reduce the complexity of finding the solution on the classical side.
arXiv Detail & Related papers (2022-10-23T11:58:05Z) - What Can Be Learnt With Wide Convolutional Neural Networks? [69.55323565255631]
We study infinitely-wide deep CNNs in the kernel regime.
We prove that deep CNNs adapt to the spatial scale of the target function.
We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN.
arXiv Detail & Related papers (2022-08-01T17:19:32Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - On the Expected Complexity of Maxout Networks [0.0]
Recent works have shown that the practical complexity of deep ReLU networks is often far from the theoretical maximum.
In this work we show that this phenomenon also occurs in networks with maxout (multi-argument) activation functions.
We also show that the parameter space has a multitude of full-dimensional regions with widely different complexity, and obtain nontrivial lower bounds on the expected complexity.
arXiv Detail & Related papers (2021-07-01T11:36:32Z) - The Computational Complexity of ReLU Network Training Parameterized by
Data Dimensionality [8.940054309023525]
We analyze the influence of the dimension $d$ of the training data on the computational complexity.
We prove that known brute-force strategies are essentially optimal.
In particular, we extend a known-time algorithm for constant $d$ and convex loss functions to a more general class of loss functions.
arXiv Detail & Related papers (2021-05-18T17:05:26Z) - Achieving Small Test Error in Mildly Overparameterized Neural Networks [30.664282759625948]
We show an algorithm which finds one of these points in time.
In addition, we prove that for a fully connected neural net, with an additional assumption on the data distribution, there is a time algorithm.
arXiv Detail & Related papers (2021-04-24T06:47:20Z) - Vector-output ReLU Neural Network Problems are Copositive Programs:
Convex Analysis of Two Layer Networks and Polynomial-time Algorithms [29.975118690758126]
We describe the semi-output global dual of the two-layer vector-infinite ReLU neural network training problem.
We provide a solution which is guaranteed to be exact for certain classes of problems.
arXiv Detail & Related papers (2020-12-24T17:03:30Z) - Measuring Model Complexity of Neural Networks with Curve Activation
Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation function.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L1$ and $L2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.