Hardness of Noise-Free Learning for Two-Hidden-Layer Neural Networks
- URL: http://arxiv.org/abs/2202.05258v1
- Date: Thu, 10 Feb 2022 18:59:14 GMT
- Title: Hardness of Noise-Free Learning for Two-Hidden-Layer Neural Networks
- Authors: Sitan Chen, Aravind Gollakota, Adam R. Klivans, Raghu Meka
- Abstract summary: We give exponential statistical query (SQ) lower bounds for learning two-hidden-layer ReLU networks with respect to Gaussian inputs in the standard (noise-free) model.
Previous SQ lower bounds held only for adversarial noise models (agnostic learning) or restricted models such as correlational SQ.
We refine a lifting procedure due to Daniely and Vardi, show how to extend their technique to other learning models, and, in many well-studied cases, obtain a more efficient reduction.
- Score: 21.687992445577226
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We give exponential statistical query (SQ) lower bounds for learning
two-hidden-layer ReLU networks with respect to Gaussian inputs in the standard
(noise-free) model. No general SQ lower bounds were known for learning ReLU
networks of any depth in this setting: previous SQ lower bounds held only for
adversarial noise models (agnostic learning) or restricted models such as
correlational SQ.
Prior work hinted at the impossibility of our result: Vempala and Wilmes
showed that general SQ lower bounds cannot apply to any real-valued family of
functions that satisfies a simple non-degeneracy condition.
To circumvent their result, we refine a lifting procedure due to Daniely and
Vardi that reduces Boolean PAC learning problems to Gaussian ones. We show how
to extend their technique to other learning models and, in many well-studied
cases, obtain a more efficient reduction. As such, we also prove new
cryptographic hardness results for PAC learning two-hidden-layer ReLU networks,
as well as new lower bounds for learning constant-depth ReLU networks from
membership queries.
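For readers unfamiliar with the statistical query model, the sketch below illustrates the interaction that the lower bounds constrain: an SQ learner never sees labeled examples directly, but only asks for expectations E[phi(x, f(x))] over Gaussian inputs, each answered up to an adversarial tolerance tau. This is a minimal illustration, not code from the paper; the class and function names (SQOracle, two_hidden_layer_relu) and all parameters are assumptions made for the example.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def two_hidden_layer_relu(x, W1, W2, w3):
    """An illustrative two-hidden-layer ReLU network f(x) = w3 . relu(W2 relu(W1 x))."""
    return w3 @ relu(W2 @ relu(W1 @ x))

class SQOracle:
    """Statistical query oracle: answers E[phi(x, f(x))] over x ~ N(0, I_d)
    up to an adversarially chosen error of magnitude at most tau.
    Here the expectation is approximated by Monte Carlo sampling."""
    def __init__(self, f, dim, tau, n_samples=20_000, seed=0):
        self.f, self.dim, self.tau = f, dim, tau
        self.n_samples = n_samples
        self.rng = np.random.default_rng(seed)

    def query(self, phi):
        # phi is typically assumed bounded; the learner only ever sees this noisy answer
        xs = self.rng.standard_normal((self.n_samples, self.dim))
        est = np.mean([phi(x, self.f(x)) for x in xs])
        return est + self.tau * (2 * self.rng.random() - 1)  # perturb by at most tau

# usage: a correlational-style query against a random two-hidden-layer target
d, k = 10, 5
rng = np.random.default_rng(1)
W1, W2, w3 = rng.standard_normal((k, d)), rng.standard_normal((k, k)), rng.standard_normal(k)
oracle = SQOracle(lambda x: two_hidden_layer_relu(x, W1, W2, w3), dim=d, tau=1e-3)
print(oracle.query(lambda x, y: y * x[0]))
```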
Related papers
- LinSATNet: The Positive Linear Satisfiability Neural Networks [116.65291739666303]
This paper studies how to introduce positive linear satisfiability constraints into neural networks.
We propose the first differentiable satisfiability layer based on an extension of the classic Sinkhorn algorithm for jointly encoding multiple sets of marginal distributions.
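As a hedged illustration only, the sketch below shows the classic Sinkhorn iteration that the layer extends, not LinSATNet's layer itself: rows and columns of a positive matrix are alternately rescaled so that its marginals approach prescribed targets. The function and variable names are assumptions made for the example.

```python
import numpy as np

def sinkhorn(scores, row_marginals, col_marginals, n_iters=50, eps=1e-9):
    """Textbook Sinkhorn normalization: alternately rescale rows and columns
    of a strictly positive matrix so its row/column sums approach the targets."""
    P = np.exp(scores)  # strictly positive starting matrix
    for _ in range(n_iters):
        P *= (row_marginals / (P.sum(axis=1) + eps))[:, None]  # match row sums
        P *= (col_marginals / (P.sum(axis=0) + eps))[None, :]  # match column sums
    return P

# usage: push a 3x3 score matrix toward uniform marginals (doubly stochastic)
scores = np.random.default_rng(0).standard_normal((3, 3))
P = sinkhorn(scores, np.ones(3), np.ones(3))
print(P.sum(axis=0), P.sum(axis=1))  # both close to all-ones
```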
arXiv Detail & Related papers (2024-07-18T22:05:21Z)
- A Framework for Provably Stable and Consistent Training of Deep Feedforward Networks [4.21061712600981]
We present a novel algorithm for training deep neural networks in supervised (classification and regression) and unsupervised (reinforcement learning) scenarios.
This algorithm combines standard gradient descent with gradient clipping.
We show, in theory and through experiments, that our algorithm updates have low variance, and the training loss reduces in a smooth manner.
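To make the combination concrete, here is a minimal, generic sketch of a gradient-descent update with global-norm gradient clipping; this is a standard formulation and not necessarily the cited paper's exact algorithm, and all names and hyperparameters are illustrative.

```python
import numpy as np

def clipped_gd_step(params, grads, lr=1e-2, clip_norm=1.0):
    """Generic gradient-descent update with global-norm gradient clipping
    (a standard stabilization trick; not the cited paper's exact scheme)."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, clip_norm / (total_norm + 1e-12))  # shrink only if the gradient is too large
    return [p - lr * scale * g for p, g in zip(params, grads)]

# usage: a single update on two parameter tensors
params = [np.ones((2, 2)), np.zeros(3)]
grads = [10.0 * np.ones((2, 2)), np.array([5.0, -5.0, 0.0])]
params = clipped_gd_step(params, grads)
```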
arXiv Detail & Related papers (2023-05-20T07:18:06Z)
- Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z)
- Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with a quadratic loss function, a fully connected feedforward architecture, ReLU activations, Gaussian data instances, and adversarial labels.
Our results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
arXiv Detail & Related papers (2022-12-05T14:47:52Z)
- Nonparametric Regression with Shallow Overparameterized Neural Networks Trained by GD with Early Stopping [11.24426822697648]
We show that neural networks trained by gradient descent (GD) are smooth with respect to their inputs.
In the noise-free case the proof does not rely on any kernelization and can be regarded as a finite-width result.
arXiv Detail & Related papers (2021-07-12T11:56:53Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption.
However, they can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
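For intuition about where the instability comes from, here is a generic sketch of an implicit (equilibrium) layer solved by naive fixed-point iteration; this is not the paper's construction, and the names and parameters are assumptions made for the example.

```python
import numpy as np

def implicit_layer(x, A, B, c, n_iters=100, tol=1e-8):
    """Generic implicit layer: the output z solves z = relu(A z + B x + c),
    found here by naive fixed-point iteration. If the map is not a contraction
    (e.g. the spectral norm of A is too large), the iteration may fail to
    converge, which is the kind of ill-posedness the paper addresses."""
    z = np.zeros(A.shape[0])
    for _ in range(n_iters):
        z_new = np.maximum(A @ z + B @ x + c, 0.0)
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z

# usage: scaling A to have small norm keeps the fixed-point map well behaved
rng = np.random.default_rng(0)
A = 0.1 * rng.standard_normal((4, 4))  # try a larger scale to see non-convergence
B, c, x = rng.standard_normal((4, 3)), rng.standard_normal(4), rng.standard_normal(3)
print(implicit_layer(x, A, B, c))
```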
arXiv Detail & Related papers (2021-06-06T18:05:02Z)
- Tight Hardness Results for Training Depth-2 ReLU Networks [38.60407125604287]
We prove several hardness results for training depth-2 neural networks with the ReLU activation function.
Our goal is to output a depth-2 neural network that minimizes the square loss with respect to a given training set.
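As a rough illustration of the objective whose minimization is shown hard, the sketch below writes out a depth-2 (one-hidden-layer) ReLU network and its square loss on a training set; it follows the usual conventions rather than the paper's exact notation, and all names are illustrative.

```python
import numpy as np

def depth2_relu(X, W, a):
    """Depth-2 ReLU network: x -> sum_j a_j * relu(w_j . x)."""
    return np.maximum(X @ W.T, 0.0) @ a

def square_loss(W, a, X, y):
    """Average squared error of the depth-2 network on the training set (X, y)."""
    return np.mean((depth2_relu(X, W, a) - y) ** 2)

# usage: evaluate the loss for random parameters on a toy training set
rng = np.random.default_rng(0)
X, y = rng.standard_normal((20, 4)), rng.standard_normal(20)
W, a = rng.standard_normal((3, 4)), rng.standard_normal(3)
print(square_loss(W, a, X, y))
```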
arXiv Detail & Related papers (2020-11-27T04:18:00Z)
- The Polynomial Method is Universal for Distribution-Free Correlational SQ Learning [9.036124518750732]
This generalizes the result of Malach and Shalev-Shwartz (2022), which gave tight correlational SQ (CSQ) lower bounds for learning DNF formulas.
Lower bounds on the threshold or approximate degree of any function class directly imply CSQ lower bounds for PAC or agnostic learning, respectively.
arXiv Detail & Related papers (2020-10-22T17:55:26Z)
- Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model the data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z)
- Superpolynomial Lower Bounds for Learning One-Layer Neural Networks using Gradient Descent [25.589302381660453]
We show that any classifier trained using gradient descent with respect to the square loss will fail to achieve small test error in polynomial time.
For classification, we give a stronger result, namely that any statistical query (SQ) algorithm will fail to achieve small test error in polynomial time.
arXiv Detail & Related papers (2020-06-22T05:15:06Z)