Error Bounds of the Invariant Statistics in Machine Learning of Ergodic Itô Diffusions
- URL: http://arxiv.org/abs/2105.10102v2
- Date: Mon, 24 May 2021 04:38:56 GMT
- Title: Error Bounds of the Invariant Statistics in Machine Learning of Ergodic Itô Diffusions
- Authors: He Zhang, John Harlim, Xiantao Li
- Abstract summary: We study the theoretical underpinnings of machine learning of ergodic Itô diffusions.
We deduce a linear dependence of the errors of one-point and two-point invariant statistics on the error in the learning of the drift and diffusion coefficients.
- Score: 8.627408356707525
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the theoretical underpinnings of machine learning of
ergodic Itô diffusions. The objective is to understand the convergence
properties of the invariant statistics when the underlying system of stochastic
differential equations (SDEs) is empirically estimated with a supervised
regression framework. Using the perturbation theory of ergodic Markov chains
and the linear response theory, we deduce a linear dependence of the errors of
one-point and two-point invariant statistics on the error in the learning of
the drift and diffusion coefficients. More importantly, our study shows that
the usual $L^2$-norm characterization of the learning generalization error is
insufficient for achieving this linear dependence result. We find that
sufficient conditions for such a linear dependence result are through learning
algorithms that produce a uniformly Lipschitz and consistent estimator in the
hypothesis space that retains certain characteristics of the drift
coefficients, such as the usual linear growth condition that guarantees the
existence of solutions of the underlying SDEs. We examine these conditions on
two well-understood learning algorithms: the kernel-based spectral regression
method and the shallow random neural networks with the ReLU activation
function.
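The learning setup in the abstract can be illustrated with a minimal sketch (my own illustration with arbitrary constants, not the authors' code): estimate the drift of a simple ergodic Itô diffusion, here an Ornstein–Uhlenbeck process with the diffusion coefficient taken as known, by least-squares regression over random ReLU features, then compare a one-point invariant statistic under the true and learned dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ergodic Itô diffusion: Ornstein–Uhlenbeck, dX = -theta*X dt + sigma dW.
# (Illustrative constants; sigma is assumed known, only the drift is learned.)
theta, sigma, dt = 1.0, 0.8, 1e-2

def simulate(drift, n_steps, x0=0.0):
    """Euler–Maruyama discretization of dX = drift(X) dt + sigma dW."""
    x = np.empty(n_steps)
    x[0] = x0
    noise = sigma * np.sqrt(dt) * rng.standard_normal(n_steps - 1)
    for k in range(n_steps - 1):
        x[k + 1] = x[k] + drift(x[k]) * dt + noise[k]
    return x

true_drift = lambda x: -theta * x
path = simulate(true_drift, 50_000)

# Supervised regression data: finite-difference estimates of the drift.
X, Y = path[:-1], (path[1:] - path[:-1]) / dt

# Shallow random ReLU network: fixed random hidden layer, least-squares output layer.
m = 128
W = rng.standard_normal(m)
b = rng.uniform(-2.0, 2.0, m)
coef, *_ = np.linalg.lstsq(np.maximum(np.outer(X, W) + b, 0.0), Y, rcond=None)

def learned_drift(x):
    # Clipping to the observed range keeps the estimator uniformly Lipschitz
    # with at most linear growth, echoing the paper's sufficient conditions.
    return np.maximum(np.clip(x, -3.0, 3.0) * W + b, 0.0) @ coef

# One-point invariant statistic E[X^2]; the exact OU value is sigma^2/(2*theta) = 0.32.
burn = 20_000
est_true = np.mean(simulate(true_drift, 200_000)[burn:] ** 2)
est_learned = np.mean(simulate(learned_drift, 200_000)[burn:] ** 2)
print(est_true, est_learned)  # both should land near 0.32
```

The clipping step is where the abstract's point shows up in practice: an unconstrained random-feature fit can extrapolate with the wrong sign outside the data range, so the learned SDE may lose the linear growth condition that guarantees well-posedness.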
Related papers
- Modify Training Directions in Function Space to Reduce Generalization Error [9.821059922409091]
We propose a modified natural gradient descent method in the neural network function space based on the eigendecompositions of neural tangent kernel and Fisher information matrix.
We explicitly derive the generalization error of the learned neural network function using theoretical methods from eigendecomposition and statistics theory.
arXiv Detail & Related papers (2023-07-25T07:11:30Z)
- Learning Discretized Neural Networks under Ricci Flow [51.36292559262042]
We study Discretized Neural Networks (DNNs) composed of low-precision weights and activations.
DNNs suffer from either infinite or zero gradients due to the non-differentiable discrete function during training.
arXiv Detail & Related papers (2023-02-07T10:51:53Z)
- Identifiability and Asymptotics in Learning Homogeneous Linear ODE Systems from Discrete Observations [114.17826109037048]
Ordinary Differential Equations (ODEs) have recently gained a lot of attention in machine learning.
However, theoretical aspects such as identifiability and the properties of statistical estimation remain obscure.
This paper derives a sufficient condition for the identifiability of homogeneous linear ODE systems from a sequence of equally-spaced error-free observations sampled from a single trajectory.
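A hand-rolled sketch of this identification problem (my own illustration, not the paper's procedure): with error-free, equally spaced samples of a single trajectory of dx/dt = Ax, the one-step map e^{A dt} can be fit by least squares, and a principal matrix logarithm then recovers A when the eigenvalues avoid the branch ambiguity that identifiability conditions must rule out.

```python
import numpy as np

# Hypothetical homogeneous linear ODE dx/dt = A x, observed error-free
# on an equally spaced grid along a single trajectory.
A = np.array([[-1.0, 0.5],
              [0.0, -2.0]])
dt = 0.1

def expm(M):
    """Matrix exponential via eigendecomposition (assumes M diagonalizable)."""
    vals, vecs = np.linalg.eig(M)
    return (vecs @ np.diag(np.exp(vals)) @ np.linalg.inv(vecs)).real

def logm(M):
    """Principal matrix logarithm via eigendecomposition."""
    vals, vecs = np.linalg.eig(M)
    return (vecs @ np.diag(np.log(vals.astype(complex))) @ np.linalg.inv(vecs)).real

# One trajectory on the grid t_k = k*dt: x_{k+1} = e^{A dt} x_k.
T = expm(A * dt)
x = np.empty((2, 51))
x[:, 0] = [1.0, -1.0]          # generic initial condition (not an eigenvector)
for k in range(50):
    x[:, k + 1] = T @ x[:, k]

# Least-squares fit of the one-step transition; the logarithm recovers A.
# Identifiable here: the eigenvalues of e^{A dt} are distinct positive reals,
# so the principal branch of the logarithm is the correct one.
X, Y = x[:, :-1], x[:, 1:]
T_hat = Y @ X.T @ np.linalg.inv(X @ X.T)
A_hat = logm(T_hat) / dt
print(np.max(np.abs(A_hat - A)))  # ~0: exact recovery up to round-off
```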
arXiv Detail & Related papers (2022-10-12T06:46:38Z)
- Amortized backward variational inference in nonlinear state-space models [0.0]
We consider the problem of state estimation in general state-space models using variational inference.
We establish for the first time that, under mixing assumptions, the variational approximation of expectations of additive state functionals induces an error which grows at most linearly in the number of observations.
arXiv Detail & Related papers (2022-06-01T08:35:54Z)
- Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension [25.711297863946193]
We develop a theory for the study of fluctuations in an ensemble of generalised linear models trained on different, but correlated, features.
We provide a complete description of the joint distribution of the empirical risk minimiser for generic convex loss and regularisation in the high-dimensional limit.
arXiv Detail & Related papers (2022-01-31T17:44:58Z)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded by the complexity of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting in practice.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
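A minimal sketch of that basic setting (my own illustration with arbitrary constants, not the paper's analysis): single-pass constant-stepsize SGD with tail averaging on a synthetic linear regression, compared against the ordinary least squares solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear regression stream: y = <w*, x> + noise.
d, n = 5, 20_000
w_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ w_star + 0.1 * rng.standard_normal(n)

# Constant-stepsize SGD, one pass over the stream, averaging the last
# half of the iterates (tail averaging damps the stationary fluctuations
# that a constant stepsize never lets decay).
step = 0.01
w = np.zeros(d)
avg, count = np.zeros(d), 0
for i in range(n):
    w -= step * (X[i] @ w - y[i]) * X[i]
    if i >= n // 2:
        avg += w
        count += 1
w_sgd = avg / count

# Ordinary least squares on the same data, for comparison.
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.linalg.norm(w_sgd - w_star), np.linalg.norm(w_ols - w_star))
```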
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
- Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
- Asymptotic Errors for Teacher-Student Convex Generalized Linear Models (or: How to Prove Kabashima's Replica Formula) [23.15629681360836]
We prove an analytical formula for the reconstruction performance of convex generalized linear models.
We show that an analytical continuation may be carried out to extend the result to (non-strongly) convex problems.
We illustrate our claim with numerical examples on mainstream learning methods.
arXiv Detail & Related papers (2020-06-11T16:26:35Z)
- On Learning Rates and Schrödinger Operators [105.32118775014015]
We present a general theoretical analysis of the effect of the learning rate.
We find that the learning rate tends to zero for a broad class of non-neural functions.
arXiv Detail & Related papers (2020-04-15T09:52:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.