Polynomial-Time Solutions for ReLU Network Training: A Complexity
Classification via Max-Cut and Zonotopes
- URL: http://arxiv.org/abs/2311.10972v1
- Date: Sat, 18 Nov 2023 04:41:07 GMT
- Title: Polynomial-Time Solutions for ReLU Network Training: A Complexity
Classification via Max-Cut and Zonotopes
- Authors: Yifei Wang and Mert Pilanci
- Abstract summary: We prove that the hardness of approximation of ReLU networks not only mirrors the complexity of the Max-Cut problem but also, in certain special cases, exactly corresponds to it.
In particular, when $\epsilon\leq\sqrt{84/83}-1\approx 0.006$, we show that it is NP-hard to find an approximate global optimizer of the ReLU network objective with relative error $\epsilon$ with respect to the objective value.
- Score: 70.52097560486683
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the complexity of training a two-layer ReLU neural network
with weight decay regularization. Previous research has shown that the optimal
solution of this problem can be found by solving a standard cone-constrained
convex program. Using this convex formulation, we prove that the hardness of
approximation of ReLU networks not only mirrors the complexity of the Max-Cut
problem but also, in certain special cases, exactly corresponds to it. In
particular, when $\epsilon\leq\sqrt{84/83}-1\approx 0.006$, we show that it is
NP-hard to find an approximate global optimizer of the ReLU network objective
with relative error $\epsilon$ with respect to the objective value. Moreover,
we develop a randomized algorithm which mirrors the Goemans-Williamson rounding
of semidefinite Max-Cut relaxations. To provide polynomial-time approximations,
we classify training datasets into three categories: (i) For orthogonal
separable datasets, a precise solution can be obtained in polynomial-time. (ii)
When there is a negative correlation between samples of different classes, we
give a polynomial-time approximation with relative error $\sqrt{\pi/2}-1\approx
0.253$. (iii) For general datasets, the degree to which the problem can be
approximated in polynomial-time is governed by a geometric factor that controls
the diameter of two zonotopes intrinsic to the dataset. To our knowledge, these
results present the first polynomial-time approximation guarantees along with
first hardness of approximation results for regularized ReLU networks.
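The randomized algorithm described above mirrors Goemans-Williamson rounding of the semidefinite Max-Cut relaxation. As background, here is a minimal sketch of that classical rounding step only, not the authors' ReLU-specific procedure: solve the Max-Cut SDP, factor the solution into unit vectors, and round with a random hyperplane. The function name, the toy 5-cycle graph, and the use of `cvxpy` (with an SDP-capable solver such as its default SCS) are illustrative assumptions.

```python
import numpy as np
import cvxpy as cp

def goemans_williamson_max_cut(W, seed=0):
    """Classical Goemans-Williamson rounding for Max-Cut (background sketch,
    not the paper's algorithm). W is a symmetric nonnegative weight matrix."""
    n = W.shape[0]
    # SDP relaxation: maximize (1/4) * sum_ij W_ij (1 - X_ij), X PSD, diag(X) = 1.
    X = cp.Variable((n, n), symmetric=True)
    constraints = [X >> 0, cp.diag(X) == 1]
    objective = cp.Maximize(0.25 * cp.sum(cp.multiply(W, 1 - X)))
    cp.Problem(objective, constraints).solve()

    # Factor X = V^T V so that each column of V is a unit vector on the sphere.
    eigvals, eigvecs = np.linalg.eigh(X.value)
    V = np.diag(np.sqrt(np.clip(eigvals, 0, None))) @ eigvecs.T

    # Random-hyperplane rounding: the sign of <g, v_i> assigns vertex i to a side.
    rng = np.random.default_rng(seed)
    g = rng.standard_normal(n)
    assignment = np.sign(g @ V)
    assignment[assignment == 0] = 1.0

    cut_value = 0.25 * np.sum(W * (1 - np.outer(assignment, assignment)))
    return assignment, cut_value

# Toy example: unweighted 5-cycle graph.
W = np.zeros((5, 5))
for i in range(5):
    W[i, (i + 1) % 5] = W[(i + 1) % 5, i] = 1.0
labels, value = goemans_williamson_max_cut(W)
print(labels, value)
```

The classical analysis guarantees an expected cut of at least roughly 0.878 times the SDP optimum; the paper adapts this style of rounding to the convex reformulation of the ReLU training objective.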
Related papers
- A Sample Efficient Alternating Minimization-based Algorithm For Robust Phase Retrieval [56.67706781191521]
In this work, we present a robust phase retrieval problem where the task is to recover an unknown signal.
Our proposed method avoids the need for a computationally expensive spectral initialization, relying instead on a simple gradient step while remaining robust to outliers.
arXiv Detail & Related papers (2024-09-07T06:37:23Z)
- Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics [21.55547541297847]
We study the problem of learning multi-index models in high-dimensions using a two-layer neural network trained with the mean-field Langevin algorithm.
Under mild distributional assumptions, we characterize the effective dimension $d_{\mathrm{eff}}$ that controls both sample and computational complexity.
arXiv Detail & Related papers (2024-08-14T02:13:35Z)
- Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time [45.72323731094864]
In this paper, we study the optimality gap between two-layer ReLU networks regularized with weight decay and their convex relaxations.
Our study sheds new light on understanding why local methods work well.
arXiv Detail & Related papers (2024-02-06T01:29:35Z)
- From Monte Carlo to neural networks approximations of boundary value problems [0.0]
We show that the solution to the Poisson equation can be numerically approximated in the sup-norm by Monte Carlo methods.
We also show that the obtained Monte Carlo solver constructively yields ReLU deep neural network (DNN) solutions to the Poisson problem.
arXiv Detail & Related papers (2022-09-03T14:17:58Z)
- Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions [41.337814204665364]
We develop algorithms for convex optimization of two-layer neural networks with ReLU activation functions.
We show that convex gated ReLU models obtain data-dependent approximation bounds for the ReLU training problem; a short sketch of this gated ReLU convex formulation appears after this list.
arXiv Detail & Related papers (2022-02-02T23:50:53Z)
- Global Optimality Beyond Two Layers: Training Deep ReLU Networks via Convex Programs [39.799125462526234]
We develop a novel unified framework to reveal a hidden regularization mechanism through the lens of convex optimization.
We numerically validate our theoretical results via experiments involving both synthetic and real datasets.
arXiv Detail & Related papers (2021-10-11T18:00:30Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Implicit Convex Regularizers of CNN Architectures: Convex Optimization of Two- and Three-Layer Networks in Polynomial Time [70.15611146583068]
We study training of Convolutional Neural Networks (CNNs) with ReLU activations.
We introduce exact convex optimization formulations with polynomial complexity with respect to the number of data samples, the number of neurons, and the data dimension.
arXiv Detail & Related papers (2020-06-26T04:47:20Z)
- Convex Geometry and Duality of Over-parameterized Neural Networks [70.15611146583068]
We develop a convex analytic approach to analyze finite width two-layer ReLU networks.
We show that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set.
In higher dimensions, we show that the training problem can be cast as a finite dimensional convex problem with infinitely many constraints.
arXiv Detail & Related papers (2020-02-25T23:05:33Z)
- SURF: A Simple, Universal, Robust, Fast Distribution Learning Algorithm [64.13217062232874]
SURF is an algorithm for approximating distributions by piecewise polynomials.
It outperforms state-of-the-art algorithms in experiments.
arXiv Detail & Related papers (2020-02-22T01:03:33Z)
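The convex reformulation underlying the main abstract and several of the related papers above (e.g., the gated ReLU models of "Fast Convex Optimization for Two-Layer ReLU Networks") replaces the non-convex two-layer ReLU objective with a convex program over per-activation-pattern linear models. Below is a minimal sketch of the two objectives; the loss scaling, the group-lasso penalty standing in for weight decay, the random-gate construction, and all function names are illustrative assumptions rather than the exact cone-constrained program from the paper.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def noncvx_objective(W1, w2, X, y, beta):
    """Non-convex two-layer ReLU objective with weight decay on both layers.
    Shapes: X (n, d), W1 (d, m), w2 (m,), y (n,)."""
    preds = relu(X @ W1) @ w2
    return 0.5 * np.sum((preds - y) ** 2) + 0.5 * beta * (np.sum(W1 ** 2) + np.sum(w2 ** 2))

def gated_relu_convex_objective(V, gates, X, y, beta):
    """Convex 'gated ReLU' surrogate: one linear model v_i per fixed activation
    pattern D_i = diag(1[X g_i >= 0]), penalized by a group lasso (sum of
    Euclidean norms). Shapes: V (k, d), gates (k, d)."""
    preds = np.zeros(len(y))
    for v, g in zip(V, gates):
        mask = (X @ g >= 0).astype(float)   # fixed gate pattern for this unit
        preds += mask * (X @ v)             # masked linear predictor
    return 0.5 * np.sum((preds - y) ** 2) + beta * np.sum(np.linalg.norm(V, axis=1))
```

For fixed gate vectors, the second objective is convex in V and can be minimized with any standard convex solver; the papers listed above analyze how closely its optimum approximates that of the non-convex objective.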
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.