Optimal Deep Neural Network Approximation for Korobov Functions with
respect to Sobolev Norms
- URL: http://arxiv.org/abs/2311.04779v1
- Date: Wed, 8 Nov 2023 15:59:10 GMT
- Title: Optimal Deep Neural Network Approximation for Korobov Functions with
respect to Sobolev Norms
- Authors: Yahong Yang and Yulong Lu
- Abstract summary: This paper establishes the nearly optimal rate of approximation for deep neural networks (DNNs) when applied to Korobov functions.
Our achieved approximation rate demonstrates remarkable "super-convergence", outperforming traditional methods and any continuous function approximator.
- Score: 8.249180979158819
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper establishes the nearly optimal rate of approximation for deep
neural networks (DNNs) when applied to Korobov functions, effectively
overcoming the curse of dimensionality. The approximation results presented in
this paper are measured with respect to $L_p$ norms and $H^1$ norms. Our
achieved approximation rate demonstrates remarkable "super-convergence",
outperforming traditional methods and any continuous function approximator.
These results are non-asymptotic, providing error bounds that consider both the
width and depth of the networks simultaneously.
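For orientation, non-asymptotic bounds of this type take the schematic form below; the exact exponents, logarithmic factors, and constants are those derived in the paper, while $\alpha$, $\beta(d)$, and $C(d)$ here are illustrative placeholders:

\inf_{\phi \in \mathcal{N}(W,L)} \| f - \phi \|_{L_p([0,1]^d)}
  \;\le\; C(d)\,(WL)^{-\alpha}\,\bigl(\log (WL)\bigr)^{\beta(d)},
\qquad f \in X^{2,p}\bigl([0,1]^d\bigr),

where $\mathcal{N}(W,L)$ denotes the class of networks of width $W$ and depth $L$, and $X^{2,p}$ is the Korobov space; an analogous bound holds in the $H^1$ norm.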
Related papers
- Wide Deep Neural Networks with Gaussian Weights are Very Close to Gaussian Processes [1.0878040851638]
We show that the distance between the network output and the corresponding Gaussian approximation scales inversely with the width of the network, exhibiting faster convergence than the naive rate suggested by the central limit theorem.
We also apply our bounds to obtain theoretical approximations for the exact posterior distribution of the network, when the likelihood is a bounded Lipschitz function of the network output evaluated on a (finite) training set.
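As a quick empirical companion to this kind of statement (an illustration, not the paper's method), one can sample many independent one-hidden-layer networks with i.i.d. Gaussian weights at a fixed input and track the Kolmogorov distance between the empirical output distribution and a centered Gaussian as the width grows; the architecture and $1/\sqrt{\text{width}}$ scalings below are standard assumptions, not taken from the paper:

import numpy as np
from math import erf, sqrt

def kolmogorov_distance_to_normal(samples, sigma):
    # max |empirical CDF - N(0, sigma^2) CDF| evaluated at the sample points
    s = np.sort(samples)
    ecdf = np.arange(1, s.size + 1) / s.size
    ncdf = np.array([0.5 * (1.0 + erf(v / (sigma * sqrt(2.0)))) for v in s])
    return float(np.max(np.abs(ecdf - ncdf)))

rng = np.random.default_rng(0)
x = rng.standard_normal(10)              # one fixed input in R^10
n_nets = 5000                            # independent random networks per width

for width in [50, 200, 800, 3200]:
    outs = np.empty(n_nets)
    for i in range(n_nets):
        W1 = rng.standard_normal((width, x.size)) / np.sqrt(x.size)
        w2 = rng.standard_normal(width) / np.sqrt(width)
        outs[i] = w2 @ np.tanh(W1 @ x)
    # Compare against a centered Gaussian with the sample's own std; the
    # distance shrinks with width until Monte Carlo noise (~1/sqrt(n_nets))
    # becomes the bottleneck.
    print(width, kolmogorov_distance_to_normal(outs, outs.std()))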
arXiv Detail & Related papers (2023-12-18T22:29:40Z) - Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth
Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss as the number of learning epochs increases.
We show that the threshold on the number of training samples increases with the increase in the network width.
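For context, a single unfolded ISTA layer is a gradient step on the least-squares data fit followed by soft-thresholding; the smooth variant below uses a softplus-based surrogate, which is one standard choice and not necessarily the exact smoothing analyzed in the paper:

import numpy as np

def soft_threshold(z, lam):
    # proximal operator of lam * ||.||_1 (the classic ISTA nonlinearity)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def smooth_soft_threshold(z, lam, beta=10.0):
    # differentiable surrogate built from a numerically stable softplus
    def softplus(t):
        return np.maximum(t, 0.0) + np.log1p(np.exp(-np.abs(beta * t))) / beta
    return softplus(z - lam) - softplus(-z - lam)

def ista_layer(x, A, y, step, lam, smooth=True):
    # one unfolded iteration for min_x 0.5*||Ax - y||^2 + lam*||x||_1
    z = x - step * (A.T @ (A @ x - y))   # gradient step on the data-fit term
    thresh = smooth_soft_threshold if smooth else soft_threshold
    return thresh(z, step * lam)

# usage: stacking K such layers (with learned step/lam per layer) unfolds ISTA
rng = np.random.default_rng(1)
A, y = rng.standard_normal((20, 50)), rng.standard_normal(20)
x = np.zeros(50)
step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L, L = Lipschitz constant of the gradient
for _ in range(16):
    x = ista_layer(x, A, y, step, lam=0.1)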
arXiv Detail & Related papers (2023-09-12T13:03:47Z) - Approximation Results for Gradient Descent trained Neural Networks [0.0]
The networks are fully connected, of constant depth and increasing width.
The error is measured in a continuous kernel norm, which yields approximation guarantees under the natural smoothness assumptions required for smooth functions.
arXiv Detail & Related papers (2023-09-09T18:47:55Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Efficient Bound of Lipschitz Constant for Convolutional Layers by Gram
Iteration [122.51142131506639]
We introduce a precise, fast, and differentiable upper bound for the spectral norm of convolutional layers using circulant matrix theory.
We show through a comprehensive set of experiments that our approach outperforms other state-of-the-art methods in terms of precision, computational cost, and scalability.
It proves highly effective for the Lipschitz regularization of convolutional neural networks, with competitive results against concurrent approaches.
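The idea behind the Gram iteration is easy to state for a dense matrix: repeatedly forming the Gram product squares the singular values, so the crude bound $\|W\|_2 \le \|W\|_F$ becomes a certified upper bound whose relative gap shrinks like $\mathrm{rank}(W)^{1/2^{t+1}}$. The paper's contribution is performing this efficiently for convolutional layers via circulant/Fourier structure, which the dense sketch below does not capture:

import numpy as np

def gram_spectral_upper_bound(W, iters=8):
    # Certified upper bound on ||W||_2 via repeated Gram squaring.
    # With H_0 = W and H_{t+1} = H_t^T H_t, one has
    #   ||W||_2 <= ||H_t||_F ** (1 / 2**t)  for every t,
    # and the bound tightens as t grows. The iterate is renormalized each
    # step to avoid overflow; log_c tracks the accumulated scale.
    G = np.asarray(W, dtype=np.float64)
    log_c = 0.0
    for _ in range(iters):
        G = G.T @ G                      # Gram step: squares singular values
        f = np.linalg.norm(G)            # Frobenius norm of the iterate
        G /= f                           # renormalize: ||G||_F == 1
        log_c = 2.0 * log_c + np.log(f)  # scale of the unnormalized iterate
    return float(np.exp(log_c / 2.0 ** iters))

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))
print(gram_spectral_upper_bound(W))      # certified upper bound
print(np.linalg.norm(W, 2))              # exact spectral norm, for comparison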
arXiv Detail & Related papers (2023-05-25T15:32:21Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Optimal Approximation Complexity of High-Dimensional Functions with
Neural Networks [3.222802562733787]
We investigate properties of neural networks that use both ReLU and $x^2$ as activation functions.
We show how to leverage low local dimensionality in some contexts to overcome the curse of dimensionality, obtaining approximation rates that are optimal for unknown lower-dimensional subspaces.
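A small example of why the $x^2$ activation helps: three squaring units reproduce multiplication exactly through the polarization identity, so polynomials can be assembled without the depth that plain ReLU networks spend on approximate multiplication (a standard observation; the paper's constructions go further):

def square(u):
    # a single "x^2 activation" unit
    return u * u

def product_block(x, y):
    # exact multiplication from three squaring units:
    # x*y = ((x + y)^2 - x^2 - y^2) / 2
    return 0.5 * (square(x + y) - square(x) - square(y))

assert product_block(3.0, -4.0) == -12.0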
arXiv Detail & Related papers (2023-01-30T17:29:19Z) - Simultaneous approximation of a smooth function and its derivatives by
deep neural networks with piecewise-polynomial activations [2.15145758970292]
We derive the required depth, width, and sparsity of a deep neural network to approximate any Hölder smooth function up to a given approximation error in Hölder norms.
The latter feature is essential to control generalization errors in many statistical and machine learning applications.
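For reference, the Hölder norm that controls a function together with its derivatives reads as follows (a standard definition, stated for orientation rather than quoted from the paper): for smoothness $s = k + \gamma$ with integer $k \ge 0$ and $\gamma \in (0, 1]$,

\| f \|_{C^{s}} \;=\; \max_{|\alpha| \le k} \, \sup_{x} \bigl| \partial^{\alpha} f(x) \bigr|
\;+\; \max_{|\alpha| = k} \, \sup_{x \ne y} \frac{\bigl| \partial^{\alpha} f(x) - \partial^{\alpha} f(y) \bigr|}{|x - y|^{\gamma}},

so a bound in this norm controls the approximation error of all derivatives up to order $k$ simultaneously.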
arXiv Detail & Related papers (2022-06-20T01:18:29Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Sharp Lower Bounds on the Approximation Rate of Shallow Neural Networks [0.0]
We prove sharp lower bounds on the approximation rates for shallow neural networks.
These lower bounds apply to both sigmoidal activation functions with bounded variation and to activation functions which are a power of the ReLU.
arXiv Detail & Related papers (2021-06-28T22:01:42Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural
Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in the depth, within a time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.