Error Bounds of the Invariant Statistics in Machine Learning of Ergodic Itô Diffusions
- URL: http://arxiv.org/abs/2105.10102v2
- Date: Mon, 24 May 2021 04:38:56 GMT
- Title: Error Bounds of the Invariant Statistics in Machine Learning of Ergodic Itô Diffusions
- Authors: He Zhang, John Harlim, Xiantao Li
- Abstract summary: We study the theoretical underpinnings of machine learning of ergodic Itô diffusions.
We deduce a linear dependence of the errors of one-point and two-point invariant statistics on the error in the learning of the drift and diffusion coefficients.
- Score: 8.627408356707525
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the theoretical underpinnings of machine learning of
ergodic Itô diffusions. The objective is to understand the convergence
properties of the invariant statistics when the underlying system of stochastic
differential equations (SDEs) is empirically estimated with a supervised
regression framework. Using the perturbation theory of ergodic Markov chains
and the linear response theory, we deduce a linear dependence of the errors of
one-point and two-point invariant statistics on the error in the learning of
the drift and diffusion coefficients. More importantly, our study shows that
the usual $L^2$-norm characterization of the learning generalization error is
insufficient for achieving this linear dependence result. We find that
sufficient conditions for such a linear dependence result are through learning
algorithms that produce a uniformly Lipschitz and consistent estimator in the
hypothesis space that retains certain characteristics of the drift
coefficients, such as the usual linear growth condition that guarantees the
existence of solutions of the underlying SDEs. We examine these conditions on
two well-understood learning algorithms: the kernel-based spectral regression
method and the shallow random neural networks with the ReLU activation
function.
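The learning setup in the abstract can be illustrated with a minimal sketch (my own illustration with arbitrary constants, not the authors' code): estimate the drift of a simple ergodic Itô diffusion, here an Ornstein–Uhlenbeck process with the diffusion coefficient taken as known, by least-squares regression over random ReLU features, then compare a one-point invariant statistic under the true and learned dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ergodic Itô diffusion: Ornstein–Uhlenbeck, dX = -theta*X dt + sigma dW.
# (Illustrative constants; sigma is assumed known, only the drift is learned.)
theta, sigma, dt = 1.0, 0.8, 1e-2

def simulate(drift, n_steps, x0=0.0):
    """Euler–Maruyama discretization of dX = drift(X) dt + sigma dW."""
    x = np.empty(n_steps)
    x[0] = x0
    noise = sigma * np.sqrt(dt) * rng.standard_normal(n_steps - 1)
    for k in range(n_steps - 1):
        x[k + 1] = x[k] + drift(x[k]) * dt + noise[k]
    return x

true_drift = lambda x: -theta * x
path = simulate(true_drift, 50_000)

# Supervised regression data: finite-difference estimates of the drift.
X, Y = path[:-1], (path[1:] - path[:-1]) / dt

# Shallow random ReLU network: fixed random hidden layer, least-squares output layer.
m = 128
W = rng.standard_normal(m)
b = rng.uniform(-2.0, 2.0, m)
coef, *_ = np.linalg.lstsq(np.maximum(np.outer(X, W) + b, 0.0), Y, rcond=None)

def learned_drift(x):
    # Clipping to the observed range keeps the estimator uniformly Lipschitz
    # with at most linear growth, echoing the paper's sufficient conditions.
    return np.maximum(np.clip(x, -3.0, 3.0) * W + b, 0.0) @ coef

# One-point invariant statistic E[X^2]; the exact OU value is sigma^2/(2*theta) = 0.32.
burn = 20_000
est_true = np.mean(simulate(true_drift, 200_000)[burn:] ** 2)
est_learned = np.mean(simulate(learned_drift, 200_000)[burn:] ** 2)
print(est_true, est_learned)  # both should land near 0.32
```

The clipping step is where the abstract's point shows up in practice: an unconstrained random-feature fit can extrapolate with the wrong sign outside the data range, so the learned SDE may lose the linear growth condition that guarantees well-posedness.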
Related papers
- Modify Training Directions in Function Space to Reduce Generalization Error [9.821059922409091]
We propose a modified natural gradient descent method in the neural network function space based on the eigendecompositions of neural tangent kernel and Fisher information matrix.
We explicitly derive the generalization error of the learned neural network function using theoretical methods from eigendecomposition and statistics theory.
arXiv Detail & Related papers (2023-07-25T07:11:30Z)
- Learning Discretized Neural Networks under Ricci Flow [51.36292559262042]
We study Discretized Neural Networks (DNNs) composed of low-precision weights and activations.
DNNs suffer from either infinite or zero gradients due to the non-differentiable discrete function during training.
arXiv Detail & Related papers (2023-02-07T10:51:53Z)
- Identifiability and Asymptotics in Learning Homogeneous Linear ODE Systems from Discrete Observations [114.17826109037048]
Ordinary Differential Equations (ODEs) have recently gained a lot of attention in machine learning.
However, theoretical aspects such as identifiability and the properties of statistical estimation remain obscure.
This paper derives a sufficient condition for the identifiability of homogeneous linear ODE systems from a sequence of equally-spaced error-free observations sampled from a single trajectory.
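A hand-rolled sketch of this identification problem (my own illustration, not the paper's procedure): with error-free, equally spaced samples of a single trajectory of dx/dt = Ax, the one-step map e^{A dt} can be fit by least squares, and a principal matrix logarithm then recovers A when the eigenvalues avoid the branch ambiguity that identifiability conditions must rule out.

```python
import numpy as np

# Hypothetical homogeneous linear ODE dx/dt = A x, observed error-free
# on an equally spaced grid along a single trajectory.
A = np.array([[-1.0, 0.5],
              [0.0, -2.0]])
dt = 0.1

def expm(M):
    """Matrix exponential via eigendecomposition (assumes M diagonalizable)."""
    vals, vecs = np.linalg.eig(M)
    return (vecs @ np.diag(np.exp(vals)) @ np.linalg.inv(vecs)).real

def logm(M):
    """Principal matrix logarithm via eigendecomposition."""
    vals, vecs = np.linalg.eig(M)
    return (vecs @ np.diag(np.log(vals.astype(complex))) @ np.linalg.inv(vecs)).real

# One trajectory on the grid t_k = k*dt: x_{k+1} = e^{A dt} x_k.
T = expm(A * dt)
x = np.empty((2, 51))
x[:, 0] = [1.0, -1.0]          # generic initial condition (not an eigenvector)
for k in range(50):
    x[:, k + 1] = T @ x[:, k]

# Least-squares fit of the one-step transition; the logarithm recovers A.
# Identifiable here: the eigenvalues of e^{A dt} are distinct positive reals,
# so the principal branch of the logarithm is the correct one.
X, Y = x[:, :-1], x[:, 1:]
T_hat = Y @ X.T @ np.linalg.inv(X @ X.T)
A_hat = logm(T_hat) / dt
print(np.max(np.abs(A_hat - A)))  # ~0: exact recovery up to round-off
```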
arXiv Detail & Related papers (2022-10-12T06:46:38Z)
- Amortized backward variational inference in nonlinear state-space models [0.0]
We consider the problem of state estimation in general state-space models using variational inference.
We establish for the first time that, under mixing assumptions, the variational approximation of expectations of additive state functionals induces an error which grows at most linearly in the number of observations.
arXiv Detail & Related papers (2022-06-01T08:35:54Z)
- Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension [25.711297863946193]
We develop a theory for the study of fluctuations in an ensemble of generalised linear models trained on different, but correlated, features.
We provide a complete description of the joint distribution of the empirical risk minimiser for generic convex loss and regularisation in the high-dimensional limit.
arXiv Detail & Related papers (2022-01-31T17:44:58Z)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded by the complexity of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting in practice.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
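A minimal sketch of that basic setting (my own illustration with arbitrary constants, not the paper's analysis): single-pass constant-stepsize SGD with tail averaging on a synthetic linear regression, compared against the ordinary least squares solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear regression stream: y = <w*, x> + noise.
d, n = 5, 20_000
w_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ w_star + 0.1 * rng.standard_normal(n)

# Constant-stepsize SGD, one pass over the stream, averaging the last
# half of the iterates (tail averaging damps the stationary fluctuations
# that a constant stepsize never lets decay).
step = 0.01
w = np.zeros(d)
avg, count = np.zeros(d), 0
for i in range(n):
    w -= step * (X[i] @ w - y[i]) * X[i]
    if i >= n // 2:
        avg += w
        count += 1
w_sgd = avg / count

# Ordinary least squares on the same data, for comparison.
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.linalg.norm(w_sgd - w_star), np.linalg.norm(w_ols - w_star))
```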
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
- Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
- Asymptotic Errors for Teacher-Student Convex Generalized Linear Models (or: How to Prove Kabashima's Replica Formula) [23.15629681360836]
We prove an analytical formula for the reconstruction performance of convex generalized linear models.
We show that an analytical continuation may be carried out to extend the result to (non-strongly) convex problems.
We illustrate our claim with numerical examples on mainstream learning methods.
arXiv Detail & Related papers (2020-06-11T16:26:35Z)
- On Learning Rates and Schrödinger Operators [105.32118775014015]
We present a general theoretical analysis of the effect of the learning rate.
We find that the learning rate tends to zero for a broad class of non-neural functions.
arXiv Detail & Related papers (2020-04-15T09:52:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.