Approximation Bounds for Recurrent Neural Networks with Application to Regression
- URL: http://arxiv.org/abs/2409.05577v1
- Date: Mon, 9 Sep 2024 13:02:50 GMT
- Title: Approximation Bounds for Recurrent Neural Networks with Application to Regression
- Authors: Yuling Jiao, Yang Wang, Bokai Yan
- Abstract summary: We study the approximation capacity of deep ReLU recurrent neural networks (RNNs) and explore the convergence properties of nonparametric least squares regression using RNNs.
We derive upper bounds on the approximation error of RNNs for Hölder smooth functions.
Our results provide statistical guarantees on the performance of RNNs.
- Score: 7.723218675113336
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the approximation capacity of deep ReLU recurrent neural networks (RNNs) and explore the convergence properties of nonparametric least squares regression using RNNs. We derive upper bounds on the approximation error of RNNs for Hölder smooth functions, in the sense that the output at each time step of an RNN can approximate a Hölder function that depends only on past and current information, termed a past-dependent function. This allows a carefully constructed RNN to simultaneously approximate a sequence of past-dependent Hölder functions. We apply these approximation results to derive non-asymptotic upper bounds for the prediction error of the empirical risk minimizer in the regression problem. Our error bounds achieve the minimax optimal rate under both exponentially $\beta$-mixing and i.i.d. data assumptions, improving upon existing ones. Our results provide statistical guarantees on the performance of RNNs.
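To make the regression setting concrete, here is a minimal sketch, not the authors' construction, of nonparametric least squares regression with a deep ReLU RNN on synthetic sequences whose target at each time step is a past-dependent function of the inputs. The network sizes, the specific target function, and the training loop are illustrative assumptions only.

```python
# A minimal sketch (not the paper's construction) of least squares regression
# with a deep ReLU RNN on a synthetic past-dependent target.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic sequences: x_t ~ Uniform[0, 1]; target y_t = f(x_1, ..., x_t),
# where f is a simple, smooth "past-dependent" choice (an exponentially
# weighted sum of past inputs passed through a sine).
n_seq, seq_len, d_in = 256, 20, 1
X = torch.rand(n_seq, seq_len, d_in)

def past_dependent_target(x):
    # y_t depends only on x_1..x_t; hypothetical example, not from the paper.
    weights = 0.7 ** torch.arange(x.shape[1] - 1, -1, -1, dtype=torch.float32)
    y = []
    for t in range(x.shape[1]):
        w = weights[-(t + 1):]
        y.append(torch.sin((x[:, : t + 1, 0] * w).sum(dim=1)))
    return torch.stack(y, dim=1)            # shape (n_seq, seq_len)

Y = past_dependent_target(X)

# Deep ReLU RNN with a linear read-out at every time step.
class ReLURNNRegressor(nn.Module):
    def __init__(self, d_in, d_hidden=32, n_layers=2):
        super().__init__()
        self.rnn = nn.RNN(d_in, d_hidden, num_layers=n_layers,
                          nonlinearity="relu", batch_first=True)
        self.readout = nn.Linear(d_hidden, 1)

    def forward(self, x):
        h, _ = self.rnn(x)                   # (batch, seq_len, d_hidden)
        return self.readout(h).squeeze(-1)   # (batch, seq_len)

model = ReLURNNRegressor(d_in)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()  # empirical least squares risk, averaged over time steps

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), Y)
    loss.backward()
    opt.step()

print(f"final training MSE: {loss.item():.4f}")
```

The empirical risk here is the mean squared error averaged over all time steps, mirroring the least squares objective studied in the paper; the synthetic target is merely one smooth past-dependent choice used for illustration.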
Related papers
- Automatic debiasing of neural networks via moment-constrained learning [0.0]
Naively learning the regression function and taking a sample mean of the target functional results in biased estimators.
We propose moment-constrained learning as a new RR learning approach that addresses some shortcomings in automatic debiasing.
arXiv Detail & Related papers (2024-09-29T20:56:54Z) - Deep learning from strongly mixing observations: Sparse-penalized regularization and minimax optimality [0.0]
We consider sparse-penalized regularization for deep neural network predictors.
We deal with the squared loss and a broad class of other loss functions.
arXiv Detail & Related papers (2024-06-12T15:21:51Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametricized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Kernel Limit of Recurrent Neural Networks Trained on Ergodic Data Sequences [0.0]
We characterize the asymptotics of recurrent neural networks (RNNs) as the number of hidden units, data samples in the sequence, hidden state updates, and training steps simultaneously grow to infinity.
These methods give rise to the neural tangent kernel (NTK) limits for RNNs trained on data sequences as the number of data samples and size of the neural network grow to infinity.
arXiv Detail & Related papers (2023-08-28T13:17:39Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - DNNR: Differential Nearest Neighbors Regression [8.667550264279166]
K-nearest neighbors (KNN) is one of the earliest and most established algorithms in machine learning.
For regression tasks, KNN averages the targets within a neighborhood, which poses a number of challenges.
We propose Differential Nearest Neighbors Regression (DNNR) that addresses both issues simultaneously (a minimal illustrative sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-05-17T15:22:53Z) - Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z) - Spectral Pruning for Recurrent Neural Networks [0.0]
Pruning techniques for neural networks with a recurrent architecture, such as the recurrent neural network (RNN), are strongly desired for their application to edge-computing devices.
In this paper, we propose an appropriate pruning algorithm for RNNs inspired by "spectral pruning", and provide generalization error bounds for the compressed RNNs.
arXiv Detail & Related papers (2021-05-23T00:30:59Z) - An Infinite-Feature Extension for Bayesian ReLU Nets That Fixes Their Asymptotic Overconfidence [65.24701908364383]
A Bayesian treatment can mitigate overconfidence in ReLU nets around the training data.
But far away from the training data, ReLU Bayesian neural networks (BNNs) can still underestimate uncertainty and thus be overconfident.
We show that it can be applied post-hoc to any pre-trained ReLU BNN at a low cost.
arXiv Detail & Related papers (2020-10-06T13:32:18Z) - Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime [50.510421854168065]
We show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z)
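The DNNR entry above notes that KNN regression averages the targets within a neighborhood; the sketch referenced there follows. It is a hypothetical illustration under simplifying assumptions, not the paper's reference implementation: it contrasts plain KNN averaging with a first-order, gradient-corrected variant in the spirit of DNNR, and the helper names and neighborhood sizes are my own.

```python
# Hypothetical sketch: plain KNN regression versus a DNNR-style variant in
# which each neighbor's contribution gets a first-order Taylor correction
# based on a locally estimated gradient.
import numpy as np

def knn_regression(X_train, y_train, x_query, k=5):
    """Plain KNN: average the targets of the k nearest neighbors."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    idx = np.argsort(dists)[:k]
    return y_train[idx].mean()

def dnnr_style_regression(X_train, y_train, x_query, k=5, k_grad=10):
    """DNNR-style sketch (an assumption about the method, kept simple):
    estimate the gradient at each neighbor from that neighbor's own
    neighborhood, then average the first-order corrected predictions."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    idx = np.argsort(dists)[:k]
    preds = []
    for i in idx:
        d_i = np.linalg.norm(X_train - X_train[i], axis=1)
        nbrs = np.argsort(d_i)[1:k_grad + 1]      # exclude the point itself
        A = X_train[nbrs] - X_train[i]            # local displacements
        b = y_train[nbrs] - y_train[i]            # local target differences
        grad, *_ = np.linalg.lstsq(A, b, rcond=None)
        preds.append(y_train[i] + grad @ (x_query - X_train[i]))
    return np.mean(preds)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(500, 2))
    y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2        # smooth synthetic target
    x0 = np.array([0.3, -0.2])
    print("KNN:      ", knn_regression(X, y, x0))
    print("DNNR-like:", dnnr_style_regression(X, y, x0))
    print("truth:    ", np.sin(2 * x0[0]) + x0[1] ** 2)
```

The design choice here is deliberately simple: gradients are estimated by least squares on each neighbor's own neighborhood, which is one of several possible ways to realize the first-order correction idea.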
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.