Achieving Small Test Error in Mildly Overparameterized Neural Networks
- URL: http://arxiv.org/abs/2104.11895v1
- Date: Sat, 24 Apr 2021 06:47:20 GMT
- Title: Achieving Small Test Error in Mildly Overparameterized Neural Networks
- Authors: Shiyu Liang, Ruoyu Sun and R. Srikant
- Abstract summary: We show an algorithm which finds one of these points (local minima or certain directionally stationary points) in polynomial time.
In addition, we prove that for a fully connected neural net, with an additional assumption on the data distribution, there is a polynomial time algorithm.
- Score: 30.664282759625948
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent theoretical works on over-parameterized neural nets have focused on
two aspects: optimization and generalization. Many existing works that study
optimization and generalization together are based on neural tangent kernel and
require a very large width. In this work, we are interested in the following
question: for a binary classification problem with a two-layer mildly
over-parameterized ReLU network, can we find a point with small test error in
polynomial time? We first show that the landscape of loss functions with
explicit regularization has the following property: all local minima and
certain other points which are only stationary in certain directions achieve
small test error. We then prove that for convolutional neural nets, there is an
algorithm which finds one of these points in polynomial time (in the input
dimension and the number of data points). In addition, we prove that for a
fully connected neural net, with an additional assumption on the data
distribution, there is a polynomial time algorithm.
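
To make the setting concrete, here is a minimal NumPy sketch of a two-layer ReLU classifier and an explicitly regularized loss of the kind the abstract describes. It is illustrative only: the precise loss, regularizer, and the polynomial-time procedure analyzed in the paper may differ, and all names and constants below are chosen for the example.

```python
# Illustrative sketch only (not the paper's algorithm): a two-layer ReLU
# network for binary classification trained on a loss with an explicit L2
# regularizer. The logistic loss and the regularization constant are
# assumptions made for this example.
import numpy as np

def two_layer_relu(X, W, a):
    # f(x) = a^T ReLU(W x): hidden weights W (m x d), output weights a (m,)
    return np.maximum(X @ W.T, 0.0) @ a

def regularized_loss(X, y, W, a, lam):
    # Logistic loss on labels y in {-1, +1} plus an explicit L2 penalty.
    margins = y * two_layer_relu(X, W, a)
    data_term = np.mean(np.logaddexp(0.0, -margins))  # log(1 + exp(-margin))
    reg_term = lam * (np.sum(W ** 2) + np.sum(a ** 2))
    return data_term + reg_term

# Mild overparameterization: the width m is only polynomial in n and d.
n, d, m, lam = 200, 10, 64, 1e-3
rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))
y = np.sign(rng.standard_normal(n))
W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.standard_normal(m) / np.sqrt(m)
print(regularized_loss(X, y, W, a, lam))
```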
Related papers
- Verified Neural Compressed Sensing [58.98637799432153]
We develop the first (to the best of our knowledge) provably correct neural networks for a precise computational task.
We show that for modest problem dimensions (up to 50), we can train neural networks that provably recover a sparse vector from linear and binarized linear measurements.
We show that the complexity of the network can be adapted to the problem difficulty and solve problems where traditional compressed sensing methods are not known to provably work.
arXiv Detail & Related papers (2024-05-07T12:20:12Z)
- Understanding the Difficulty of Solving Cauchy Problems with PINNs [31.98081858215356]
PINNs often fail to achieve the same level of accuracy as classical methods in solving differential equations.
We show that minimizing the sum of the $L^2$ residual and the initial condition error is not sufficient to guarantee recovery of the true solution (a generic form of this objective is sketched after this list).
We demonstrate that when the global minimum does not exist, machine precision becomes the predominant source of error.
arXiv Detail & Related papers (2024-05-04T04:22:52Z)
- Neural networks with linear threshold activations: structure and algorithms [1.795561427808824]
We show that 2 hidden layers are necessary and sufficient to represent any function representable in the class.
We also give precise bounds on the sizes of the neural networks required to represent any function in the class.
We propose a new class of neural networks that we call shortcut linear threshold networks.
arXiv Detail & Related papers (2021-11-15T22:33:52Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks [19.216784367141972]
We study the problem of estimating an unknown function from noisy data using shallow (single-hidden layer) ReLU neural networks.
We quantify the performance of these neural network estimators when the data-generating function belongs to the space of functions of second-order bounded variation in the Radon domain.
arXiv Detail & Related papers (2021-09-18T05:56:06Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can make the two classes linearly separable with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
- Vector-output ReLU Neural Network Problems are Copositive Programs: Convex Analysis of Two Layer Networks and Polynomial-time Algorithms [29.975118690758126]
We describe the semi-infinite global dual of the two-layer vector-output ReLU neural network training problem.
We provide a solution which is guaranteed to be exact for certain classes of problems.
arXiv Detail & Related papers (2020-12-24T17:03:30Z)
- Random Vector Functional Link Networks for Function Approximation on Manifolds [8.535815777849786]
Single-layer neural networks with random input-to-hidden layer weights and biases have seen success in practice (a minimal sketch of this recipe appears after this list).
We further adapt this randomized neural network architecture to approximate functions on smooth, compact submanifolds of Euclidean space.
arXiv Detail & Related papers (2020-07-30T23:50:44Z)
- Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data in a structure suitable for neural networks.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
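
For the Cauchy-problem entry above, "the sum of the $L^2$ residual and the initial condition error" refers to a physics-informed training objective of the following generic form; this is the standard PINN loss, written here only to fix notation, and it need not match that paper's exact objective:
$$
\mathcal{L}(\theta) \;=\; \big\lVert \partial_t u_\theta - \mathcal{N}[u_\theta] \big\rVert_{L^2(\Omega\times[0,T])}^2 \;+\; \big\lVert u_\theta(\cdot,0) - u_0 \big\rVert_{L^2(\Omega)}^2 ,
$$
where $u_\theta$ is the network, $\mathcal{N}$ is the spatial operator of the Cauchy problem, and $u_0$ is the prescribed initial data. The entry's point is that driving this sum toward its infimum need not recover the true solution.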
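
For the random vector functional link entry above, the defining recipe is a single hidden layer whose input-to-hidden weights and biases are drawn at random and then frozen, with only the linear output layer fit to data. The sketch below uses ridge-regularized least squares for that output fit; the width, ridge parameter, and bias distribution are illustrative assumptions, not that paper's choices.

```python
# RVFL-style sketch: random frozen hidden layer, output weights fit by
# ridge-regularized least squares. Width, ridge value, and distributions
# are illustrative assumptions.
import numpy as np

def fit_rvfl(X, y, width=256, ridge=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((width, X.shape[1]))  # random hidden weights, never trained
    b = rng.uniform(-1.0, 1.0, size=width)        # random hidden biases
    H = np.maximum(X @ W.T + b, 0.0)              # ReLU hidden features
    # Solve (H^T H + ridge * I) beta = H^T y for the output layer.
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(width), H.T @ y)
    return W, b, beta

def predict_rvfl(X, W, b, beta):
    return np.maximum(X @ W.T + b, 0.0) @ beta

# Toy usage: fit a smooth target from noisy samples in the plane.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(500, 2))
y = np.sin(3.0 * X[:, 0]) * np.cos(2.0 * X[:, 1]) + 0.05 * rng.standard_normal(500)
W, b, beta = fit_rvfl(X, y)
print("train MSE:", np.mean((predict_rvfl(X, W, b, beta) - y) ** 2))
```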
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including any of the summaries above) and is not responsible for any consequences of its use.