Excess risk bound for deep learning under weak dependence
- URL: http://arxiv.org/abs/2302.07503v1
- Date: Wed, 15 Feb 2023 07:23:48 GMT
- Title: Excess risk bound for deep learning under weak dependence
- Authors: William Kengne
- Abstract summary: This paper considers deep neural networks for learning weakly dependent processes.
We derive the required depth, width and sparsity of a deep neural network to approximate any Hölder smooth function.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper considers deep neural networks for learning weakly dependent
processes in a general framework that includes, for instance, regression
estimation, time series prediction, and time series classification. The $\psi$-weak
dependence structure considered is quite general and covers other dependence
conditions such as mixing and association. Firstly, the approximation of smooth functions
by deep neural networks with a broad class of activation functions is
considered. We derive the required depth, width and sparsity of a deep neural
network to approximate any H\"{o}lder smooth function, defined on any compact
set $\mathcal{X}$. Secondly, we establish a bound on the excess risk for the learning
of weakly dependent observations by deep neural networks. When the target
function is sufficiently smooth, this bound is close to the usual
$\mathcal{O}(n^{-1/2})$ rate.
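For orientation, the following is a minimal sketch, in generic notation rather than the paper's exact statement, of the quantities the abstract refers to: a Hölder ball on a compact set $\mathcal{X}$, the empirical risk minimizer over a class $\mathcal{F}(L, W, S)$ of networks with depth $L$, width $W$ and sparsity $S$, and the standard decomposition of the excess risk into approximation and estimation parts (the symbols $\mathcal{F}$, $R$, $R_n$ and $K$ are generic placeholders, not notation taken from the paper).

```latex
% Generic sketch (not the paper's exact statement). Hölder ball of smoothness
% s > 0 and radius K on a compact set X:
\mathcal{H}^{s}(\mathcal{X}, K) \;=\; \Big\{ f:\mathcal{X}\to\mathbb{R} \;:\;
  \sum_{|\alpha| \le \lfloor s \rfloor} \|\partial^{\alpha} f\|_{\infty}
  + \sum_{|\alpha| = \lfloor s \rfloor} \sup_{x \ne y}
    \frac{|\partial^{\alpha} f(x) - \partial^{\alpha} f(y)|}{\|x - y\|^{\, s - \lfloor s \rfloor}}
  \;\le\; K \Big\}

% Empirical risk minimizer over networks of depth L, width W and sparsity S,
% where R_n denotes the empirical risk and R the population risk:
\widehat{h}_{n} \;\in\; \operatorname*{arg\,min}_{h \in \mathcal{F}(L, W, S)}
  R_{n}(h), \qquad R_{n}(h) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(h(X_{i}), Y_{i}\big)

% Standard excess risk decomposition:
R(\widehat{h}_{n}) - R(h^{*}) \;\le\;
  \underbrace{\inf_{h \in \mathcal{F}(L, W, S)} \big( R(h) - R(h^{*}) \big)}_{\text{approximation error}}
  \;+\;
  \underbrace{2 \sup_{h \in \mathcal{F}(L, W, S)} \big| R_{n}(h) - R(h) \big|}_{\text{estimation error}}
```

The paper's contribution is, roughly, to control both terms for $\psi$-weakly dependent observations, which yields a rate close to $\mathcal{O}(n^{-1/2})$ when the target function is sufficiently smooth.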
Related papers
- Convergence of Gradient Descent for Recurrent Neural Networks: A Nonasymptotic Analysis [16.893624100273108]
We analyze recurrent neural networks with diagonal hidden-to-hidden weight matrices trained with gradient descent in the supervised learning setting.
We prove that gradient descent can achieve optimality without massive over-parameterization.
Our results are based on an explicit characterization of the class of dynamical systems that can be approximated and learned by recurrent neural networks.
arXiv Detail & Related papers (2024-02-19T15:56:43Z) - Penalized deep neural networks estimator with general loss functions
under weak dependence [0.0]
This paper develops sparse-penalized deep neural network predictors for learning weakly dependent processes.
Some simulation results are provided, and an application to forecasting particulate matter in the Vitória metropolitan area is also considered (a generic sketch of this kind of sparse-penalized predictor appears after this list).
arXiv Detail & Related papers (2023-05-10T15:06:53Z) - Deep learning for $\psi$-weakly dependent processes [0.0]
We consider deep neural networks for learning $\psi$-weakly dependent processes.
The consistency of the empirical risk minimization algorithm in the class of deep neural network predictors is established.
Some simulation results are provided, as well as an application to the US recession data.
arXiv Detail & Related papers (2023-02-01T09:31:15Z) - Understanding Deep Neural Function Approximation in Reinforcement
Learning via $\epsilon$-Greedy Exploration [53.90873926758026]
This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL)
We focus on the value-based algorithm with $\epsilon$-greedy exploration via deep (and two-layer) neural networks endowed with Besov (and Barron) function spaces.
Our analysis reformulates the temporal difference error in an $L^2(\mathrm{d}\mu)$-integrable space over a certain averaged measure $\mu$, and transforms it into a generalization problem under the non-i.i.d. setting.
arXiv Detail & Related papers (2022-09-15T15:42:47Z) - Robust Training and Verification of Implicit Neural Networks: A
Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z) - Simultaneous approximation of a smooth function and its derivatives by
deep neural networks with piecewise-polynomial activations [2.15145758970292]
We derive the required depth, width, and sparsity of a deep neural network to approximate any Hölder smooth function up to a given approximation error in Hölder norms.
The latter feature is essential to control generalization errors in many statistical and machine learning applications.
arXiv Detail & Related papers (2022-06-20T01:18:29Z) - On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Deep Networks and the Multiple Manifold Problem [15.144495799445824]
We study the multiple manifold problem, a binary classification task modeled on applications in machine vision, in which a deep fully-connected neural network is trained to separate two low-dimensional submanifolds of the unit sphere.
We prove for a simple manifold configuration that when the network depth $L$ is large relative to certain geometric and statistical properties of the data, the required network width grows as a sufficiently large polynomial in $L$.
Our analysis demonstrates concrete benefits of depth and width in the context of a practically-motivated model problem.
arXiv Detail & Related papers (2020-08-25T19:20:00Z) - Measuring Model Complexity of Neural Networks with Curve Activation
Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation function.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L^1$ and $L^2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z)
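As a purely illustrative sketch (not code from any of the papers above) of the kind of predictor discussed in the abstract and in the weak-dependence entries, the snippet below performs empirical risk minimization with a sparse-penalized feedforward network applied to lagged observations of a time series; the lag order `p`, the network sizes, and the penalty weight `lam` are arbitrary, hypothetical choices, and the $L^1$ penalty stands in for whatever sparsity penalty a given paper actually uses.

```python
# Illustrative sketch only (assumes PyTorch): sparse-penalized empirical risk
# minimization with a feedforward predictor on lagged observations of a time
# series. The lag order p, network sizes and the L1 weight lam are hypothetical.
import torch
import torch.nn as nn

p, width, depth = 5, 64, 3            # hypothetical lag order and network size

layers, in_dim = [], p
for _ in range(depth):
    layers += [nn.Linear(in_dim, width), nn.ReLU()]
    in_dim = width
layers.append(nn.Linear(in_dim, 1))
net = nn.Sequential(*layers)

def lagged_pairs(series, p):
    """Build (X_t, Y_t) pairs with X_t = (Z_{t-p}, ..., Z_{t-1}) and Y_t = Z_t."""
    X = torch.stack([series[t - p:t] for t in range(p, len(series))])
    Y = series[p:].unsqueeze(1)
    return X, Y

series = torch.randn(500)             # stand-in for an observed dependent series
X, Y = lagged_pairs(series, p)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
lam = 1e-4                            # sparsity (L1) penalty weight, arbitrary
for epoch in range(200):
    opt.zero_grad()
    mse = nn.functional.mse_loss(net(X), Y)
    penalty = sum(w.abs().sum() for w in net.parameters())
    (mse + lam * penalty).backward()
    opt.step()
```

In the papers' framework it is the penalty and the dependence structure of the observations that drive the excess risk bound; the i.i.d. Gaussian `series` here is only a placeholder so the snippet runs end to end.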