Tight Hardness Results for Training Depth-2 ReLU Networks
- URL: http://arxiv.org/abs/2011.13550v1
- Date: Fri, 27 Nov 2020 04:18:00 GMT
- Title: Tight Hardness Results for Training Depth-2 ReLU Networks
- Authors: Surbhi Goel, Adam Klivans, Pasin Manurangsi, Daniel Reichman
- Abstract summary: We prove several hardness results for training depth-2 neural networks with the ReLU activation function.
Our goal is to output a depth-2 neural network that minimizes the square loss with respect to a given training set.
- Score: 38.60407125604287
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove several hardness results for training depth-2 neural networks with
the ReLU activation function; these networks are simply weighted sums (that may
include negative coefficients) of ReLUs. Our goal is to output a depth-2 neural
network that minimizes the square loss with respect to a given training set. We
prove that this problem is NP-hard already for a network with a single ReLU. We
also prove NP-hardness for outputting a weighted sum of $k$ ReLUs minimizing
the squared error (for $k>1$) even in the realizable setting (i.e., when the
labels are consistent with an unknown depth-2 ReLU network). We are also able
to obtain lower bounds on the running time in terms of the desired additive
error $\epsilon$. To obtain our lower bounds, we use the Gap Exponential Time
Hypothesis (Gap-ETH) as well as a new hypothesis regarding the hardness of
approximating the well known Densest $\kappa$-Subgraph problem in
subexponential time (these hypotheses are used separately in proving different
lower bounds). For example, we prove that under reasonable hardness
assumptions, any proper learning algorithm for finding the best fitting ReLU
must run in time exponential in $1/\epsilon^2$. Together with a previous work
regarding improperly learning a ReLU (Goel et al., COLT'17), this implies the
first separation between proper and improper algorithms for learning a ReLU. We
also study the problem of properly learning a depth-2 network of ReLUs with
bounded weights giving new (worst-case) upper bounds on the running time needed
to learn such networks both in the realizable and agnostic settings. Our upper
bounds on the running time essentially match our lower bounds in terms of the
dependency on $\epsilon$.
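To make the training objective concrete, the following is a minimal NumPy sketch (not from the paper; all function and variable names are illustrative) of the empirical square loss for a depth-2 network given as a weighted sum of $k$ ReLUs with possibly negative output coefficients. The hardness results concern finding parameters that (approximately) minimize this quantity, already for $k = 1$.

```python
import numpy as np

def relu(z):
    # ReLU activation, applied elementwise: max(z, 0)
    return np.maximum(z, 0.0)

def depth2_square_loss(a, W, b, X, y):
    """Empirical square loss of f(x) = sum_i a_i * relu(<w_i, x> + b_i).

    a : (k,)   output weights (may be negative)
    W : (k, d) hidden-unit weight vectors
    b : (k,)   hidden-unit biases (set to 0 for the bias-free variant)
    X : (n, d) training inputs
    y : (n,)   training labels
    """
    preds = relu(X @ W.T + b) @ a        # (n,) network outputs
    return float(np.mean((preds - y) ** 2))

# Toy usage with random data; the paper studies the worst-case complexity of
# (approximately) minimizing this loss over the parameters (a, W, b).
rng = np.random.default_rng(0)
n, d, k = 100, 5, 3
X, y = rng.standard_normal((n, d)), rng.standard_normal(n)
a, W, b = rng.standard_normal(k), rng.standard_normal((k, d)), rng.standard_normal(k)
print(f"square loss: {depth2_square_loss(a, W, b, X, y):.4f}")
```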
Related papers
- Contextual Bandit Optimization with Pre-Trained Neural Networks [0.0]
We investigate how pre-training can help us in the regime of smaller models.
We show sublinear regret of E2TC when the dimension of the last layer and number of actions $K$ are much smaller than the horizon $T$.
In the weak training regime, when only the last layer is learned, the problem reduces to a misspecified linear bandit.
arXiv Detail & Related papers (2025-01-09T10:21:19Z) - Fixing the NTK: From Neural Network Linearizations to Exact Convex
Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z) - The Implicit Bias of Minima Stability in Multivariate Shallow ReLU
Networks [53.95175206863992]
We study the type of solutions to which gradient descent converges when used to train a single hidden-layer multivariate ReLU network with the quadratic loss.
We prove that although shallow ReLU networks are universal approximators, stable shallow networks are not.
arXiv Detail & Related papers (2023-06-30T09:17:39Z) - A Framework for Provably Stable and Consistent Training of Deep
Feedforward Networks [4.21061712600981]
We present a novel algorithm for training deep neural networks in supervised (classification and regression) and unsupervised (reinforcement learning) scenarios.
This algorithm combines standard gradient descent with gradient clipping.
We show, in theory and through experiments, that our algorithm updates have low variance, and the training loss reduces in a smooth manner.
arXiv Detail & Related papers (2023-05-20T07:18:06Z) - Hardness of Noise-Free Learning for Two-Hidden-Layer Neural Networks [21.687992445577226]
We give exponential statistical query (SQ) lower bounds for learning two-hidden-layer ReLU networks with respect to Gaussian inputs in the standard (noise-free) model.
Previous SQ lower bounds held only for adversarial noise models (agnostic learning) or restricted models such as correlational SQ.
We show how to extend their technique to other learning models and, in many well-studied cases, obtain a more efficient reduction.
arXiv Detail & Related papers (2022-02-10T18:59:14Z) - Implicit Regularization Towards Rank Minimization in ReLU Networks [34.41953136999683]
We study the conjectured relationship between the implicit regularization in neural networks and rank minimization.
We focus on nonlinear ReLU networks, providing several new positive and negative results.
arXiv Detail & Related papers (2022-01-30T09:15:44Z) - Efficient Algorithms for Learning Depth-2 Neural Networks with General
ReLU Activations [27.244958998196623]
We present time and sample efficient algorithms for learning an unknown depth-2 feedforward neural network with general ReLU activations.
In particular, we consider learning an unknown network of the form $f(x) = a^{\mathsf{T}}\sigma(W^{\mathsf{T}}x+b)$, where $x$ is drawn from the Gaussian distribution and $\sigma(t) := \max(t,0)$ is the ReLU activation (see the data-generation sketch after this list).
arXiv Detail & Related papers (2021-07-21T17:06:03Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Algorithms and SQ Lower Bounds for PAC Learning One-Hidden-Layer ReLU
Networks [48.32532049640782]
We study the problem of learning one-hidden-layer ReLU networks with $k$ hidden units on $\mathbb{R}^d$ under Gaussian marginals.
For the case of positive coefficients, we give the first polynomial-time algorithm for this learning problem for $k$ up to $\tilde{O}(\sqrt{\log d})$.
arXiv Detail & Related papers (2020-06-22T17:53:54Z) - Lagrangian Decomposition for Neural Network Verification [148.0448557991349]
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take.
We propose a novel approach based on Lagrangian Decomposition.
We show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time.
arXiv Detail & Related papers (2020-02-24T17:55:10Z)
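Complementing the loss sketch above, and following the network form $f(x) = a^{\mathsf{T}}\sigma(W^{\mathsf{T}}x+b)$ with Gaussian inputs described in the "Efficient Algorithms for Learning Depth-2 Neural Networks with General ReLU Activations" entry, here is a minimal sketch of sampling realizable (noise-free) training data from an unknown depth-2 ReLU network. All names are illustrative and do not come from either paper.

```python
import numpy as np

def sample_realizable_data(n, d, k, rng):
    """Draw n inputs x ~ N(0, I_d) and label them with an unknown depth-2
    ReLU network f(x) = a^T relu(W^T x + b); labels are noise-free, i.e.
    the data is realizable by the hidden network."""
    a = rng.standard_normal(k)           # second-layer weights (may be negative)
    W = rng.standard_normal((d, k))      # first-layer weights, one column per hidden unit
    b = rng.standard_normal(k)           # first-layer biases
    X = rng.standard_normal((n, d))      # Gaussian marginals
    y = np.maximum(X @ W + b, 0.0) @ a   # labels consistent with the hidden network
    return X, y, (a, W, b)

rng = np.random.default_rng(1)
X, y, hidden_params = sample_realizable_data(n=200, d=10, k=4, rng=rng)
print(X.shape, y.shape)   # (200, 10) (200,)
```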
This list is automatically generated from the titles and abstracts of the papers on this site.