Approximation bounds for norm constrained neural networks with
applications to regression and GANs
- URL: http://arxiv.org/abs/2201.09418v3
- Date: Thu, 30 Mar 2023 03:26:24 GMT
- Title: Approximation bounds for norm constrained neural networks with
applications to regression and GANs
- Authors: Yuling Jiao, Yang Wang, Yunfei Yang
- Abstract summary: We prove upper and lower bounds on the approximation error of ReLU neural networks with norm constraint on the weights.
We apply these approximation bounds to analyze the convergence of regression using norm constrained neural networks and of distribution estimation by GANs.
- Score: 9.645327615996914
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the approximation capacity of ReLU neural networks with
norm constraint on the weights. We prove upper and lower bounds on the
approximation error of these networks for smooth function classes. The lower
bound is derived through the Rademacher complexity of neural networks, which
may be of independent interest. We apply these approximation bounds to analyze
the convergence of regression using norm constrained neural networks and of
distribution estimation by GANs. In particular, we obtain convergence rates for
over-parameterized neural networks. It is also shown that GANs can achieve the
optimal rate of learning probability distributions when the discriminator is a
properly chosen norm constrained neural network.
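The central object of the paper is a ReLU network whose weights obey an explicit norm budget. As a purely illustrative sketch (not the paper's construction), the NumPy snippet below builds a two-layer ReLU network and rescales its output weights so that a path-norm-style quantity stays under a fixed budget; the budget value, the specific norm, and all names here are assumptions made for the example.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class NormConstrainedTwoLayerNet:
    """Two-layer ReLU network f(x) = sum_j a_j * relu(w_j . x + b_j).

    After initialization (and, in training, after every update) the outer
    weights are rescaled so that
        sum_j |a_j| * (||w_j||_1 + |b_j|) <= budget,
    a path-norm-style constraint used here purely as an illustration of
    "norm constrained"; the paper's exact constraint may differ.
    """

    def __init__(self, dim, width, budget, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(width, dim)) / np.sqrt(dim)
        self.b = np.zeros(width)
        self.a = rng.normal(size=width) / np.sqrt(width)
        self.budget = budget
        self.project()

    def path_norm(self):
        return np.sum(np.abs(self.a) * (np.abs(self.W).sum(axis=1) + np.abs(self.b)))

    def project(self):
        """Rescale the outer weights so the norm constraint holds."""
        norm = self.path_norm()
        if norm > self.budget:
            self.a *= self.budget / norm

    def __call__(self, X):
        return relu(X @ self.W.T + self.b) @ self.a

if __name__ == "__main__":
    net = NormConstrainedTwoLayerNet(dim=3, width=64, budget=5.0)
    X = np.random.default_rng(1).normal(size=(10, 3))
    print(net(X).shape, net.path_norm() <= 5.0 + 1e-9)
```

Rescaling only the output layer is one simple way to enforce such a budget; projection or regularization on all layers would serve the same illustrative purpose.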
Related papers
- Stochastic Gradient Descent for Two-layer Neural Networks [2.0349026069285423]
This paper presents a study on the convergence rates of the stochastic gradient descent (SGD) algorithm when applied to overparameterized two-layer neural networks.
Our approach combines the Neural Tangent Kernel (NTK) approximation with convergence analysis in the Reproducing Kernel Hilbert Space (RKHS) generated by the NTK (a finite-width empirical NTK computation is sketched after this list).
Our research framework enables us to explore the intricate interplay between kernel methods and optimization processes, shedding light on the dynamics and convergence properties of neural networks.
arXiv Detail & Related papers (2024-07-10T13:58:57Z) - Optimal rates of approximation by shallow ReLU$^k$ neural networks and
applications to nonparametric regression [12.21422686958087]
We study the approximation capacity of some variation spaces corresponding to shallow ReLU$^k$ neural networks.
For functions with less smoothness, the approximation rates in terms of the variation norm are established.
We show that shallow neural networks can achieve the minimax optimal rates for learning Hölder functions.
arXiv Detail & Related papers (2023-04-04T06:35:02Z) - Semantic Strengthening of Neuro-Symbolic Learning [85.6195120593625]
Neuro-symbolic approaches typically resort to fuzzy approximations of a probabilistic objective.
We show how to compute this efficiently for tractable circuits.
We test our approach on three tasks: predicting a minimum-cost path in Warcraft, predicting a minimum-cost perfect matching, and solving Sudoku puzzles.
arXiv Detail & Related papers (2023-02-28T00:04:22Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a
Polynomial Net Study [55.12108376616355]
The study of the NTK has been devoted to typical neural network architectures, but is incomplete for neural networks with Hadamard products (NNs-Hp).
In this work, we derive the finite-width NTK formulation for a special class of NNs-Hp, i.e., polynomial neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z) - Learning Distributions by Generative Adversarial Networks: Approximation
and Generalization [0.6768558752130311]
We study how well generative adversarial networks learn from finite samples by analyzing the convergence rates of these models.
Our analysis is based on a new oracle inequality that decomposes the estimation error of GANs into the discriminator and generator approximation errors.
For the generator approximation error, we show that neural networks can approximately transform a low-dimensional source distribution to a high-dimensional target distribution.
arXiv Detail & Related papers (2022-05-25T09:26:17Z) - Mean-Field Analysis of Two-Layer Neural Networks: Global Optimality with
Linear Convergence Rates [7.094295642076582]
Mean-field regime is a theoretically attractive alternative to the NTK (lazy training) regime.
We establish a new linear convergence result for two-layer neural networks trained by continuous-time noisy gradient descent in the mean-field regime.
arXiv Detail & Related papers (2022-05-19T21:05:40Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z) - On the approximation of functions by tanh neural networks [0.0]
We derive bounds on the error, in high-order Sobolev norms, incurred in the approximation of Sobolev-regular functions.
We show that tanh neural networks with only two hidden layers suffice to approximate functions at comparable or better rates than much deeper ReLU neural networks.
arXiv Detail & Related papers (2021-04-18T19:30:45Z) - Optimal Rates for Averaged Stochastic Gradient Descent under Neural
Tangent Kernel Regime [50.510421854168065]
We show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z)
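Several of the related papers above rely on the Neural Tangent Kernel. As referenced in the entry on SGD for two-layer neural networks, the following NumPy sketch computes a finite-width empirical NTK for a bias-free two-layer ReLU network by taking gradient inner products analytically; the 1/sqrt(m) parameterization and the absence of biases are simplifying assumptions made for this example, not the exact setup of any listed paper.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def empirical_ntk(X, W, a):
    """Empirical NTK of f(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j . x).

    K(x, x') = <grad_theta f(x), grad_theta f(x')>, with gradients taken
    with respect to both layers. Shapes: X (n, d), W (m, d), a (m,).
    """
    m = W.shape[0]
    Z = X @ W.T                      # pre-activations, shape (n, m)
    H = relu(Z)                      # hidden activations
    D = (Z > 0).astype(float)        # relu'(Z)
    # contribution from gradients w.r.t. the outer weights a
    K_a = H @ H.T / m
    # contribution from gradients w.r.t. the inner weights W
    K_w = ((D * a) @ (D * a).T) * (X @ X.T) / m
    return K_a + K_w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, m = 8, 3, 512
    X = rng.normal(size=(n, d))
    W = rng.normal(size=(m, d))
    a = rng.normal(size=m)
    K = empirical_ntk(X, W, a)
    # the empirical NTK is a symmetric kernel matrix over the n inputs
    print(K.shape, np.allclose(K, K.T))
```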