Hardness of Noise-Free Learning for Two-Hidden-Layer Neural Networks
- URL: http://arxiv.org/abs/2202.05258v1
- Date: Thu, 10 Feb 2022 18:59:14 GMT
- Title: Hardness of Noise-Free Learning for Two-Hidden-Layer Neural Networks
- Authors: Sitan Chen, Aravind Gollakota, Adam R. Klivans, Raghu Meka
- Abstract summary: We give exponential statistical query (SQ) lower bounds for learning two-hidden-layer ReLU networks with respect to Gaussian inputs in the standard (noise-free) model.
Previous SQ lower bounds held only for adversarial noise models (agnostic learning) or restricted models such as correlational SQ.
We refine a lifting procedure due to Daniely and Vardi, show how to extend their technique to other learning models, and, in many well-studied cases, obtain a more efficient reduction.
- Score: 21.687992445577226
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We give exponential statistical query (SQ) lower bounds for learning
two-hidden-layer ReLU networks with respect to Gaussian inputs in the standard
(noise-free) model. No general SQ lower bounds were known for learning ReLU
networks of any depth in this setting: previous SQ lower bounds held only for
adversarial noise models (agnostic learning) or restricted models such as
correlational SQ.
Prior work hinted at the impossibility of our result: Vempala and Wilmes
showed that general SQ lower bounds cannot apply to any real-valued family of
functions that satisfies a simple non-degeneracy condition.
To circumvent their result, we refine a lifting procedure due to Daniely and
Vardi that reduces Boolean PAC learning problems to Gaussian ones. We show how
to extend their technique to other learning models and, in many well-studied
cases, obtain a more efficient reduction. As such, we also prove new
cryptographic hardness results for PAC learning two-hidden-layer ReLU networks,
as well as new lower bounds for learning constant-depth ReLU networks from
membership queries.
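For readers unfamiliar with the statistical query model, the sketch below illustrates the interaction that the lower bounds constrain: an SQ learner never sees labeled examples directly, but only asks for expectations E[phi(x, f(x))] over Gaussian inputs, each answered up to an adversarial tolerance tau. This is a minimal illustration, not code from the paper; the class and function names (SQOracle, two_hidden_layer_relu) and all parameters are assumptions made for the example.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def two_hidden_layer_relu(x, W1, W2, w3):
    """An illustrative two-hidden-layer ReLU network f(x) = w3 . relu(W2 relu(W1 x))."""
    return w3 @ relu(W2 @ relu(W1 @ x))

class SQOracle:
    """Statistical query oracle: answers E[phi(x, f(x))] over x ~ N(0, I_d)
    up to an adversarially chosen error of magnitude at most tau.
    Here the expectation is approximated by Monte Carlo sampling."""
    def __init__(self, f, dim, tau, n_samples=20_000, seed=0):
        self.f, self.dim, self.tau = f, dim, tau
        self.n_samples = n_samples
        self.rng = np.random.default_rng(seed)

    def query(self, phi):
        # phi is typically assumed bounded; the learner only ever sees this noisy answer
        xs = self.rng.standard_normal((self.n_samples, self.dim))
        est = np.mean([phi(x, self.f(x)) for x in xs])
        return est + self.tau * (2 * self.rng.random() - 1)  # perturb by at most tau

# usage: a correlational-style query against a random two-hidden-layer target
d, k = 10, 5
rng = np.random.default_rng(1)
W1, W2, w3 = rng.standard_normal((k, d)), rng.standard_normal((k, k)), rng.standard_normal(k)
oracle = SQOracle(lambda x: two_hidden_layer_relu(x, W1, W2, w3), dim=d, tau=1e-3)
print(oracle.query(lambda x, y: y * x[0]))
```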
Related papers
- LinSATNet: The Positive Linear Satisfiability Neural Networks [116.65291739666303]
This paper studies how to introduce positive linear satisfiability constraints into neural networks.
We propose the first differentiable satisfiability layer based on an extension of the classic Sinkhorn algorithm for jointly encoding multiple sets of marginal distributions.
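As a hedged illustration only, the sketch below shows the classic Sinkhorn iteration that the layer extends, not LinSATNet's layer itself: rows and columns of a positive matrix are alternately rescaled so that its marginals approach prescribed targets. The function and variable names are assumptions made for the example.

```python
import numpy as np

def sinkhorn(scores, row_marginals, col_marginals, n_iters=50, eps=1e-9):
    """Textbook Sinkhorn normalization: alternately rescale rows and columns
    of a strictly positive matrix so its row/column sums approach the targets."""
    P = np.exp(scores)  # strictly positive starting matrix
    for _ in range(n_iters):
        P *= (row_marginals / (P.sum(axis=1) + eps))[:, None]  # match row sums
        P *= (col_marginals / (P.sum(axis=0) + eps))[None, :]  # match column sums
    return P

# usage: push a 3x3 score matrix toward uniform marginals (doubly stochastic)
scores = np.random.default_rng(0).standard_normal((3, 3))
P = sinkhorn(scores, np.ones(3), np.ones(3))
print(P.sum(axis=0), P.sum(axis=1))  # both close to all-ones
```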
arXiv Detail & Related papers (2024-07-18T22:05:21Z)
- A Framework for Provably Stable and Consistent Training of Deep Feedforward Networks [4.21061712600981]
We present a novel algorithm for training deep neural networks in supervised (classification and regression) and unsupervised (reinforcement learning) scenarios.
This algorithm combines standard gradient descent with gradient clipping.
We show, in theory and through experiments, that our algorithm updates have low variance, and the training loss reduces in a smooth manner.
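To make the combination concrete, here is a minimal, generic sketch of a gradient-descent update with global-norm gradient clipping; this is a standard formulation and not necessarily the cited paper's exact algorithm, and all names and hyperparameters are illustrative.

```python
import numpy as np

def clipped_gd_step(params, grads, lr=1e-2, clip_norm=1.0):
    """Generic gradient-descent update with global-norm gradient clipping
    (a standard stabilization trick; not the cited paper's exact scheme)."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, clip_norm / (total_norm + 1e-12))  # shrink only if the gradient is too large
    return [p - lr * scale * g for p, g in zip(params, grads)]

# usage: a single update on two parameter tensors
params = [np.ones((2, 2)), np.zeros(3)]
grads = [10.0 * np.ones((2, 2)), np.array([5.0, -5.0, 0.0])]
params = clipped_gd_step(params, grads)
```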
arXiv Detail & Related papers (2023-05-20T07:18:06Z)
- Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z)
- Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with a quadratic loss function, a fully connected feedforward architecture, ReLU activations, Gaussian data instances, and adversarial labels.
Our results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
arXiv Detail & Related papers (2022-12-05T14:47:52Z)
- Nonparametric Regression with Shallow Overparameterized Neural Networks Trained by GD with Early Stopping [11.24426822697648]
We show that neural networks trained by gradient descent (GD) are smooth with respect to their inputs.
In the noise-free case the proof does not rely on any kernelization and can be regarded as a finite-width result.
arXiv Detail & Related papers (2021-07-12T11:56:53Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption.
However, they can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
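For intuition about where the instability comes from, here is a generic sketch of an implicit (equilibrium) layer solved by naive fixed-point iteration; this is not the paper's construction, and the names and parameters are assumptions made for the example.

```python
import numpy as np

def implicit_layer(x, A, B, c, n_iters=100, tol=1e-8):
    """Generic implicit layer: the output z solves z = relu(A z + B x + c),
    found here by naive fixed-point iteration. If the map is not a contraction
    (e.g. the spectral norm of A is too large), the iteration may fail to
    converge, which is the kind of ill-posedness the paper addresses."""
    z = np.zeros(A.shape[0])
    for _ in range(n_iters):
        z_new = np.maximum(A @ z + B @ x + c, 0.0)
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z

# usage: scaling A to have small norm keeps the fixed-point map well behaved
rng = np.random.default_rng(0)
A = 0.1 * rng.standard_normal((4, 4))  # try a larger scale to see non-convergence
B, c, x = rng.standard_normal((4, 3)), rng.standard_normal(4), rng.standard_normal(3)
print(implicit_layer(x, A, B, c))
```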
arXiv Detail & Related papers (2021-06-06T18:05:02Z)
- Tight Hardness Results for Training Depth-2 ReLU Networks [38.60407125604287]
We prove several hardness results for training depth-2 neural networks with the ReLU activation function.
Our goal is to output a depth-2 neural network that minimizes the square loss with respect to a given training set.
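As a rough illustration of the objective whose minimization is shown hard, the sketch below writes out a depth-2 (one-hidden-layer) ReLU network and its square loss on a training set; it follows the usual conventions rather than the paper's exact notation, and all names are illustrative.

```python
import numpy as np

def depth2_relu(X, W, a):
    """Depth-2 ReLU network: x -> sum_j a_j * relu(w_j . x)."""
    return np.maximum(X @ W.T, 0.0) @ a

def square_loss(W, a, X, y):
    """Average squared error of the depth-2 network on the training set (X, y)."""
    return np.mean((depth2_relu(X, W, a) - y) ** 2)

# usage: evaluate the loss for random parameters on a toy training set
rng = np.random.default_rng(0)
X, y = rng.standard_normal((20, 4)), rng.standard_normal(20)
W, a = rng.standard_normal((3, 4)), rng.standard_normal(3)
print(square_loss(W, a, X, y))
```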
arXiv Detail & Related papers (2020-11-27T04:18:00Z)
- The Polynomial Method is Universal for Distribution-Free Correlational SQ Learning [9.036124518750732]
This generalizes the result of Malach and Shalev-Shwartz (2022), which gave tight correlational SQ (CSQ) lower bounds for learning DNF formulas.
Lower bounds on the threshold or approximate degree of any function class directly imply CSQ lower bounds for PAC or agnostic learning, respectively.
arXiv Detail & Related papers (2020-10-22T17:55:26Z)
- Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model the data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z)
- Superpolynomial Lower Bounds for Learning One-Layer Neural Networks using Gradient Descent [25.589302381660453]
We show that any classifier trained using gradient descent with respect to the square loss will fail to achieve small test error in polynomial time.
For classification, we give a stronger result, namely that any statistical query (SQ) algorithm will fail to achieve small test error in polynomial time.
arXiv Detail & Related papers (2020-06-22T05:15:06Z)