Tight Hardness Results for Training Depth-2 ReLU Networks
- URL: http://arxiv.org/abs/2011.13550v1
- Date: Fri, 27 Nov 2020 04:18:00 GMT
- Title: Tight Hardness Results for Training Depth-2 ReLU Networks
- Authors: Surbhi Goel, Adam Klivans, Pasin Manurangsi, Daniel Reichman
- Abstract summary: We prove several hardness results for training depth-2 neural networks with the ReLU activation function.
Our goal is to output a depth-2 neural network that minimizes the square loss with respect to a given training set.
- Score: 38.60407125604287
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove several hardness results for training depth-2 neural networks with
the ReLU activation function; these networks are simply weighted sums (that may
include negative coefficients) of ReLUs. Our goal is to output a depth-2 neural
network that minimizes the square loss with respect to a given training set. We
prove that this problem is NP-hard already for a network with a single ReLU. We
also prove NP-hardness for outputting a weighted sum of $k$ ReLUs minimizing
the squared error (for $k>1$) even in the realizable setting (i.e., when the
labels are consistent with an unknown depth-2 ReLU network). We are also able
to obtain lower bounds on the running time in terms of the desired additive
error $\epsilon$. To obtain our lower bounds, we use the Gap Exponential Time
Hypothesis (Gap-ETH) as well as a new hypothesis regarding the hardness of
approximating the well known Densest $\kappa$-Subgraph problem in
subexponential time (these hypotheses are used separately in proving different
lower bounds). For example, we prove that under reasonable hardness
assumptions, any proper learning algorithm for finding the best fitting ReLU
must run in time exponential in $1/\epsilon^2$. Together with a previous work
regarding improperly learning a ReLU (Goel et al., COLT'17), this implies the
first separation between proper and improper algorithms for learning a ReLU. We
also study the problem of properly learning a depth-2 network of ReLUs with
bounded weights giving new (worst-case) upper bounds on the running time needed
to learn such networks both in the realizable and agnostic settings. Our upper
bounds on the running time essentially match our lower bounds in terms of the
dependency on $\epsilon$.
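To make the training objective concrete, the following is a minimal NumPy sketch (not from the paper; all function and variable names are illustrative) of the empirical square loss for a depth-2 network given as a weighted sum of $k$ ReLUs with possibly negative output coefficients. The hardness results concern finding parameters that (approximately) minimize this quantity, already for $k = 1$.

```python
import numpy as np

def relu(z):
    # ReLU activation, applied elementwise: max(z, 0)
    return np.maximum(z, 0.0)

def depth2_square_loss(a, W, b, X, y):
    """Empirical square loss of f(x) = sum_i a_i * relu(<w_i, x> + b_i).

    a : (k,)   output weights (may be negative)
    W : (k, d) hidden-unit weight vectors
    b : (k,)   hidden-unit biases (set to 0 for the bias-free variant)
    X : (n, d) training inputs
    y : (n,)   training labels
    """
    preds = relu(X @ W.T + b) @ a        # (n,) network outputs
    return float(np.mean((preds - y) ** 2))

# Toy usage with random data; the paper studies the worst-case complexity of
# (approximately) minimizing this loss over the parameters (a, W, b).
rng = np.random.default_rng(0)
n, d, k = 100, 5, 3
X, y = rng.standard_normal((n, d)), rng.standard_normal(n)
a, W, b = rng.standard_normal(k), rng.standard_normal((k, d)), rng.standard_normal(k)
print(f"square loss: {depth2_square_loss(a, W, b, X, y):.4f}")
```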
Related papers
- Contextual Bandit Optimization with Pre-Trained Neural Networks [0.0]
We investigate how pre-training can help us in the regime of smaller models.
We show sublinear regret of E2TC when the dimension of the last layer and number of actions $K$ are much smaller than the horizon $T$.
In the weak training regime, when only the last layer is learned, the problem reduces to a misspecified linear bandit.
arXiv Detail & Related papers (2025-01-09T10:21:19Z) - Fixing the NTK: From Neural Network Linearizations to Exact Convex
Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z) - The Implicit Bias of Minima Stability in Multivariate Shallow ReLU
Networks [53.95175206863992]
We study the type of solutions to which gradient descent converges when used to train a single hidden-layer multivariate ReLU network with the quadratic loss.
We prove that although shallow ReLU networks are universal approximators, stable shallow networks are not.
arXiv Detail & Related papers (2023-06-30T09:17:39Z) - A Framework for Provably Stable and Consistent Training of Deep
Feedforward Networks [4.21061712600981]
We present a novel algorithm for training deep neural networks in supervised (classification and regression) and unsupervised (reinforcement learning) scenarios.
This algorithm combines standard gradient descent with gradient clipping.
We show, in theory and through experiments, that our algorithm updates have low variance, and the training loss reduces in a smooth manner.
arXiv Detail & Related papers (2023-05-20T07:18:06Z) - Hardness of Noise-Free Learning for Two-Hidden-Layer Neural Networks [21.687992445577226]
We give exponential statistical query (SQ) lower bounds for learning two-hidden-layer ReLU networks with respect to Gaussian inputs in the standard (noise-free) model.
Previous SQ lower bounds held only for adversarial noise models (agnostic learning) or restricted models such as correlational SQ.
We show how to extend their technique to other learning models and, in many well-studied cases, obtain a more efficient reduction.
arXiv Detail & Related papers (2022-02-10T18:59:14Z) - Implicit Regularization Towards Rank Minimization in ReLU Networks [34.41953136999683]
We study the conjectured relationship between the implicit regularization in neural networks and rank minimization.
We focus on nonlinear ReLU networks, providing several new positive and negative results.
arXiv Detail & Related papers (2022-01-30T09:15:44Z) - Efficient Algorithms for Learning Depth-2 Neural Networks with General
ReLU Activations [27.244958998196623]
We present time and sample efficient algorithms for learning an unknown depth-2 feedforward neural network with general ReLU activations.
In particular, we consider learning an unknown network of the form $f(x) = a^{\mathsf{T}}\sigma(W^{\mathsf{T}}x+b)$, where $x$ is drawn from the Gaussian distribution and $\sigma(t) := \max(t,0)$ is the ReLU activation (see the data-generation sketch after this list).
arXiv Detail & Related papers (2021-07-21T17:06:03Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Algorithms and SQ Lower Bounds for PAC Learning One-Hidden-Layer ReLU
Networks [48.32532049640782]
We study the problem of learning one-hidden-layer ReLU networks with $k$ hidden units on $\mathbb{R}^d$ under Gaussian marginals.
For the case of positive coefficients, we give the first polynomial-time algorithm for this learning problem for $k$ up to $\tilde{O}(\sqrt{\log d})$.
arXiv Detail & Related papers (2020-06-22T17:53:54Z) - Lagrangian Decomposition for Neural Network Verification [148.0448557991349]
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take.
We propose a novel approach based on Lagrangian Decomposition.
We show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time.
arXiv Detail & Related papers (2020-02-24T17:55:10Z)
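Complementing the loss sketch above, and following the network form $f(x) = a^{\mathsf{T}}\sigma(W^{\mathsf{T}}x+b)$ with Gaussian inputs described in the "Efficient Algorithms for Learning Depth-2 Neural Networks with General ReLU Activations" entry, here is a minimal sketch of sampling realizable (noise-free) training data from an unknown depth-2 ReLU network. All names are illustrative and do not come from either paper.

```python
import numpy as np

def sample_realizable_data(n, d, k, rng):
    """Draw n inputs x ~ N(0, I_d) and label them with an unknown depth-2
    ReLU network f(x) = a^T relu(W^T x + b); labels are noise-free, i.e.
    the data is realizable by the hidden network."""
    a = rng.standard_normal(k)           # second-layer weights (may be negative)
    W = rng.standard_normal((d, k))      # first-layer weights, one column per hidden unit
    b = rng.standard_normal(k)           # first-layer biases
    X = rng.standard_normal((n, d))      # Gaussian marginals
    y = np.maximum(X @ W + b, 0.0) @ a   # labels consistent with the hidden network
    return X, y, (a, W, b)

rng = np.random.default_rng(1)
X, y, hidden_params = sample_realizable_data(n=200, d=10, k=4, rng=rng)
print(X.shape, y.shape)   # (200, 10) (200,)
```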
This list is automatically generated from the titles and abstracts of the papers on this site.