Polynomial-Time Solutions for ReLU Network Training: A Complexity
Classification via Max-Cut and Zonotopes
- URL: http://arxiv.org/abs/2311.10972v1
- Date: Sat, 18 Nov 2023 04:41:07 GMT
- Title: Polynomial-Time Solutions for ReLU Network Training: A Complexity
Classification via Max-Cut and Zonotopes
- Authors: Yifei Wang and Mert Pilanci
- Abstract summary: We prove that the hardness of approximation of ReLU networks not only mirrors the complexity of the Max-Cut problem but also, in certain special cases, exactly corresponds to it.
In particular, when $\epsilon\leq\sqrt{84/83}-1\approx 0.006$, we show that it is NP-hard to find an approximate global optimizer of the ReLU network objective with relative error $\epsilon$ with respect to the objective value.
- Score: 70.52097560486683
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the complexity of training a two-layer ReLU neural network
with weight decay regularization. Previous research has shown that the optimal
solution of this problem can be found by solving a standard cone-constrained
convex program. Using this convex formulation, we prove that the hardness of
approximation of ReLU networks not only mirrors the complexity of the Max-Cut
problem but also, in certain special cases, exactly corresponds to it. In
particular, when $\epsilon\leq\sqrt{84/83}-1\approx 0.006$, we show that it is
NP-hard to find an approximate global optimizer of the ReLU network objective
with relative error $\epsilon$ with respect to the objective value. Moreover,
we develop a randomized algorithm which mirrors the Goemans-Williamson rounding
of semidefinite Max-Cut relaxations. To provide polynomial-time approximations,
we classify training datasets into three categories: (i) For orthogonal
separable datasets, a precise solution can be obtained in polynomial-time. (ii)
When there is a negative correlation between samples of different classes, we
give a polynomial-time approximation with relative error $\sqrt{\pi/2}-1\approx
0.253$. (iii) For general datasets, the degree to which the problem can be
approximated in polynomial-time is governed by a geometric factor that controls
the diameter of two zonotopes intrinsic to the dataset. To our knowledge, these
results present the first polynomial-time approximation guarantees along with
first hardness of approximation results for regularized ReLU networks.
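The Goemans-Williamson-style randomized rounding mentioned in the abstract can be illustrated on the classical Max-Cut problem it originates from. Below is a minimal sketch of random-hyperplane rounding, assuming unit-vector embeddings from an SDP relaxation are already available; it is illustrative only, not the paper's algorithm for ReLU training:

```python
import numpy as np

def gw_hyperplane_round(V, W, trials=100, seed=0):
    """Random-hyperplane rounding in the style of Goemans-Williamson.

    V: (n, d) array of unit vectors, one per node (an SDP solution).
    W: (n, n) symmetric nonnegative edge-weight matrix.
    Returns the best +/-1 labeling and its cut weight over `trials` draws.
    """
    rng = np.random.default_rng(seed)
    best_x, best_cut = None, -1.0
    for _ in range(trials):
        r = rng.standard_normal(V.shape[1])     # random hyperplane normal
        x = np.where(V @ r >= 0, 1.0, -1.0)     # side of the hyperplane
        cut = 0.25 * np.sum(W * (1.0 - np.outer(x, x)))  # cut weight
        if cut > best_cut:
            best_x, best_cut = x, cut
    return best_x, best_cut
```

For a bipartite graph with the exact SDP embedding, the rounding recovers the optimal cut; in general, Goemans-Williamson guarantees roughly a 0.878 fraction of the SDP value in expectation.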
Related papers
- Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time [45.72323731094864]
In this paper, we study the optimality gap between two-layer ReLU networks regularized with weight decay and their convex relaxations.
Our study sheds new light on understanding why local methods work well.
arXiv Detail & Related papers (2024-02-06T01:29:35Z)
- A Novel Normalized-Cut Solver with Nearest Neighbor Hierarchical Initialization [107.07093621337084]
Normalized-Cut (N-Cut) is a famous model of spectral clustering.
Traditional N-Cut solvers are two-stage: 1) calculating the continuous spectral embedding of the normalized Laplacian matrix; 2) discretization via $K$-means or spectral rotation.
We propose a novel N-Cut solver based on the famous coordinate descent method.
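The two-stage pipeline described above can be sketched with NumPy: a spectral embedding from the normalized Laplacian, then a discretization step. Sign thresholding stands in for $K$-means here (an assumption of this sketch, limited to two clusters); it is not the paper's coordinate-descent solver:

```python
import numpy as np

def two_stage_ncut(W):
    """Classical two-stage N-Cut for two clusters."""
    # Stage 1: embedding from the normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    fiedler = vecs[:, 1]             # skip the trivial eigenvector
    # Stage 2: discretization (sign thresholding in place of K-means).
    return (fiedler > 0).astype(int)
```

On a graph made of two dense groups joined by a weak edge, the sign of the second eigenvector separates the groups.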
arXiv Detail & Related papers (2023-11-26T07:11:58Z)
- From Monte Carlo to neural networks approximations of boundary value problems [0.0]
We show that the solution to the Poisson equation can be numerically approximated in the sup-norm by Monte Carlo methods.
We also show that the obtained Monte Carlo solver yields this approximation in a constructive way.
arXiv Detail & Related papers (2022-09-03T14:17:58Z)
- Optimal Randomized Approximations for Matrix based Renyi's Entropy [16.651155375441796]
We develop random approximations for integer order $\alpha$ cases and series approximations for non-integer $\alpha$ cases.
Large-scale simulations and real-world applications validate the effectiveness of the developed approximations.
arXiv Detail & Related papers (2022-05-16T02:24:52Z)
- Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions [41.337814204665364]
We develop algorithms for convex optimization of two-layer neural networks with ReLU activation functions.
We show that convex gated ReLU models obtain data-dependent approximation bounds for the ReLU training problem.
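The gated ReLU idea can be sketched as follows: once the gate patterns are fixed, the model is linear in its weights, so training reduces to a convex (here, ridge-regularized least-squares) problem. The specific gates and regularizer below are illustrative assumptions, not the paper's cone decomposition:

```python
import numpy as np

def fit_gated_relu(X, y, gates, lam=1e-6):
    """Fit f(x) = sum_j 1[g_j^T x >= 0] * (w_j^T x) by ridge regression.

    With the gate vectors g_j fixed, the masked features are linear in
    the weights w_j, so the regularized training problem is convex.
    """
    feats = np.hstack([((X @ g) >= 0)[:, None] * X for g in gates])
    A = feats.T @ feats + lam * np.eye(feats.shape[1])
    w = np.linalg.solve(A, feats.T @ y)
    return w.reshape(len(gates), X.shape[1])

def predict_gated_relu(X, gates, W):
    # Sum the masked linear responses of each gated unit.
    return sum(((X @ g) >= 0) * (X @ w) for g, w in zip(gates, W))
```

For example, data generated by a single ReLU neuron is fit exactly (up to the tiny ridge bias) when the matching gate is supplied.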
arXiv Detail & Related papers (2022-02-02T23:50:53Z)
- Global Optimality Beyond Two Layers: Training Deep ReLU Networks via Convex Programs [39.799125462526234]
We develop a novel unified framework to reveal a hidden regularization mechanism through the lens of convex optimization.
We numerically validate our theoretical results via experiments involving both synthetic and real datasets.
arXiv Detail & Related papers (2021-10-11T18:00:30Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve the separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Implicit Convex Regularizers of CNN Architectures: Convex Optimization of Two- and Three-Layer Networks in Polynomial Time [70.15611146583068]
We study training of Convolutional Neural Networks (CNNs) with ReLU activations.
We introduce exact convex optimization formulations with polynomial complexity with respect to the number of data samples, the number of neurons, and the data dimension.
arXiv Detail & Related papers (2020-06-26T04:47:20Z)
- Convex Geometry and Duality of Over-parameterized Neural Networks [70.15611146583068]
We develop a convex analytic approach to analyze finite width two-layer ReLU networks.
We show that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set.
In higher dimensions, we show that the training problem can be cast as a finite dimensional convex problem with infinitely many constraints.
arXiv Detail & Related papers (2020-02-25T23:05:33Z)
- SURF: A Simple, Universal, Robust, Fast Distribution Learning Algorithm [64.13217062232874]
SURF is an algorithm for approximating distributions by piecewise polynomials.
It outperforms state-of-the-art algorithms in experiments.
arXiv Detail & Related papers (2020-02-22T01:03:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.