Classes of ODE solutions: smoothness, covering numbers, implications for
noisy function fitting, and the curse of smoothness phenomenon
- URL: http://arxiv.org/abs/2011.11371v3
- Date: Wed, 17 Mar 2021 23:48:37 GMT
- Title: Classes of ODE solutions: smoothness, covering numbers, implications for
noisy function fitting, and the curse of smoothness phenomenon
- Authors: Ying Zhu, Mozhgan Mirzaei
- Abstract summary: We show how the degree of smoothness and the "size" of a class of ODEs affect the "size" of the associated class of solutions.
- Score: 0.8376091455761261
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many numerical methods for recovering ODE solutions from data rely on
approximating the solutions using basis functions or kernel functions under a
least squares criterion. The accuracy of this approach hinges on the smoothness
of the solutions. This paper provides a theoretical foundation for these
methods by establishing novel results on the smoothness and covering numbers of
ODE solution classes (as a measure of their "size"). Our results answer the
question of how the degree of smoothness and the "size" of a class of ODEs
affect the "size" of the associated class of solutions. We show that: (1) for
$y^{'}=f\left(y\right)$ and $y^{'}=f\left(x,\,y\right)$, if the absolute values
of all the $k$th order ($k\leq\beta+1$) derivatives of $f$ are bounded by $1$, then
the $(k+1)$th derivative of the solution can have a magnitude that grows
factorially fast in $k$ -- "a curse of smoothness"; (2) our upper bounds for
the covering numbers of the $(\beta+2)$-degree smooth solution classes are
greater than those of the "standard" $(\beta+2)$-degree smooth class of
univariate functions; (3) the mean squared error of least squares fitting for
noisy recovery has a convergence rate no larger than
$\left(\frac{1}{n}\right)^{\frac{2\left(\beta+2\right)}{2\left(\beta+2\right)+1}}$
if
$n=\Omega\left(\left(\beta\sqrt{\log\left(\beta\vee1\right)}\right)^{4\beta+10}\right)$,
and under this condition, the rate
$\left(\frac{1}{n}\right)^{\frac{2\left(\beta+2\right)}{2\left(\beta+2\right)+1}}$
is minimax optimal in the case of $y^{'}=f\left(x,\,y\right)$; (4) more
generally, for the higher order Picard type ODEs,
$y^{\left(m\right)}=f\left(x,\,y,\,y^{'},\,\ldots,\,y^{\left(m-1\right)}\right)$,
the covering number of the solution class is bounded from above by the product
of the covering number of the class $\mathcal{F}$ that $f$ ranges over and the
covering number of the set where initial values lie.
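
To see where the factorial growth in result (1) comes from, it helps to differentiate the autonomous equation $y^{'}=f\left(y\right)$ a few times. The display below is an illustrative calculation under the assumption $|f^{(j)}|\leq 1$ for the relevant orders $j$; it is not taken from the paper.

```latex
% Illustrative calculation (not from the paper): repeatedly differentiate
% y' = f(y), assuming |f^{(j)}| <= 1 for every order j that appears.
\begin{align*}
y'    &= f(y),\\
y''   &= f'(y)\,f(y),\\
y'''  &= f''(y)\,f(y)^{2} + f'(y)^{2}f(y),\\
y'''' &= f'''(y)\,f(y)^{3} + 4f''(y)f'(y)\,f(y)^{2} + f'(y)^{3}f(y).
\end{align*}
% Each differentiation turns a product of m bounded factors into m products of
% m+1 bounded factors, so the crude bound on |y^{(k+1)}| obtained this way is
% 1, 1, 2, 6, ... = k!, consistent with the factorial growth in result (1).
```

The least-squares recovery analyzed in result (3) is also easy to sketch numerically. The snippet below is a minimal illustration, not the authors' implementation: the ODE $y^{'}=\cos(y)$, the noise level, the sample size, and the polynomial basis and degree are all assumptions made for the example.

```python
# Minimal sketch (not the authors' code): recover an ODE solution from noisy
# samples by least-squares fitting over a polynomial basis.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)

# Reference solution of y' = f(y) with f = cos (all derivatives of f are
# bounded by 1, matching the setting of result (1)).
sol = solve_ivp(lambda t, y: np.cos(y), (0.0, 1.0), [0.0], dense_output=True)

n = 200                                                  # sample size
x = rng.uniform(0.0, 1.0, size=n)                        # design points
y_obs = sol.sol(x)[0] + 0.05 * rng.standard_normal(n)    # noisy observations

# Least-squares fit over the monomial basis 1, x, ..., x^degree (the degree
# is an arbitrary illustrative choice, not a tuned quantity).
degree = 5
basis = np.vander(x, degree + 1, increasing=True)
coef, *_ = np.linalg.lstsq(basis, y_obs, rcond=None)

# Mean squared error of the fitted curve against the reference solution.
grid = np.linspace(0.0, 1.0, 1000)
fit = np.vander(grid, degree + 1, increasing=True) @ coef
mse = np.mean((fit - sol.sol(grid)[0]) ** 2)
print(f"grid MSE of the least-squares fit: {mse:.2e}")
```

In practice the basis (splines, kernels, etc.) and its dimension would be chosen to match the smoothness level $\beta+2$ of the solution class, which is exactly the regime that the covering-number bounds above are designed to control.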
Related papers
- Efficient Continual Finite-Sum Minimization [52.5238287567572]
We propose a key twist on finite-sum minimization, dubbed continual finite-sum minimization.
Our approach significantly improves upon the $\mathcal{O}(n/\epsilon)$ FOs (first-order oracle calls) that $\mathrm{StochasticGradientDescent}$ requires.
We also prove that there is no natural first-order method with $\mathcal{O}\left(n/\epsilon^{\alpha}\right)$ gradient complexity for $\alpha < 1/4$, establishing that the first-order complexity of our method is nearly tight.
arXiv Detail & Related papers (2024-06-07T08:26:31Z) - Algorithms for mean-field variational inference via polyhedral optimization in the Wasserstein space [10.292118864147097]
We develop a theory of finite-dimensional polyhedral subsets over the Wasserstein space and optimization of functionals over them via first-order methods.
Our main application is to the problem of mean-field variational inference, which seeks to approximate a distribution $\pi$ over $\mathbb{R}^{d}$ by a product measure $\pi^{\star}$.
arXiv Detail & Related papers (2023-12-05T16:02:04Z) - Sharper Rates for Separable Minimax and Finite Sum Optimization via
Primal-Dual Extragradient Methods [39.87865197769047]
We study separable minimax optimization problems $\min_{x}\max_{y} f(x) - g(y) + h(x, y)$, where $f$ and $g$ have smoothness and strong convexity parameters $(L^{x}, \mu^{x})$, $(L^{y}, \mu^{y})$, and $h$ is convex-concave with a $(\Lambda^{xx}, \Lambda^{xy}, \Lambda^{yy})$-blockwise operator norm bounded Hessian.
arXiv Detail & Related papers (2022-02-09T18:57:47Z) - Nearly Horizon-Free Offline Reinforcement Learning [97.36751930393245]
We revisit offline reinforcement learning on episodic time-homogeneous Markov Decision Processes with $S$ states, $A$ actions and planning horizon $H$.
We obtain the first set of nearly $H$-free sample complexity bounds for evaluation and planning using the empirical MDPs.
arXiv Detail & Related papers (2021-03-25T18:52:17Z) - Infinite-Horizon Offline Reinforcement Learning with Linear Function
Approximation: Curse of Dimensionality and Algorithm [46.36534144138337]
In this paper, we investigate the sample complexity of policy evaluation in offline reinforcement learning.
Under the low distribution shift assumption, we show that there is an algorithm that needs at most $O\left(\max\left\{ \frac{\left\Vert \theta^{\pi}\right\Vert _{2}^{4}}{\varepsilon^{4}}\log\frac{d}{\delta},\,\frac{1}{\varepsilon^{2}}\left(d+\log\frac{1}{\delta}\right)\right\} \right)$ samples to approximate the value of the target policy.
arXiv Detail & Related papers (2021-03-17T18:18:57Z) - Deep Neural Networks with ReLU-Sine-Exponential Activations Break Curse
of Dimensionality on H\"older Class [6.476766717110237]
We construct neural networks with ReLU, sine and $2^{x}$ as activation functions.
In addition to their super expressive power, functions implemented by ReLU-sine-$2^{x}$ networks are (generalized) differentiable.
arXiv Detail & Related papers (2021-02-28T15:57:42Z) - Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e. learner has access to only zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z) - $Q$-learning with Logarithmic Regret [60.24952657636464]
We prove that an optimistic $Q$-learning algorithm enjoys a $\mathcal{O}\left(\frac{SA\cdot \mathrm{poly}\left(H\right)}{\Delta_{\min}}\log\left(SAT\right)\right)$ cumulative regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $T$ is the total number of steps, and $\Delta_{\min}$ is the minimum sub-optimality gap.
arXiv Detail & Related papers (2020-06-16T13:01:33Z) - Improved Algorithms for Convex-Concave Minimax Optimization [10.28639483137346]
This paper studies minimax optimization problems $\min_{x}\max_{y} f(x,y)$, where $f(x,y)$ is $m_x$-strongly convex with respect to $x$, $m_y$-strongly concave with respect to $y$ and $(L_x, L_{xy}, L_y)$-smooth.
arXiv Detail & Related papers (2020-06-11T12:21:13Z) - Revisiting EXTRA for Smooth Distributed Optimization [70.65867695317633]
We give a sharp complexity analysis for EXTRA with the improved $O\left(\left(\frac{L}{\mu}+\frac{1}{1-\sigma_{2}(W)}\right)\log\frac{1}{\epsilon(1-\sigma_{2}(W))}\right)$ complexity.
The communication complexities of our accelerated EXTRA are only worse by factors of $\left(\log\frac{L}{\mu(1-\sigma_{2}(W))}\right)$ and $\left(\log\frac{1}{\epsilon(1-\sigma_{2}(W))}\right)$ than the corresponding lower bounds.
arXiv Detail & Related papers (2020-02-24T08:07:08Z) - Agnostic Q-learning with Function Approximation in Deterministic
Systems: Tight Bounds on Approximation Error and Sample Complexity [94.37110094442136]
We study the problem of agnostic $Q$-learning with function approximation in deterministic systems.
We show that if $\delta = O\left(\rho/\sqrt{\dim_{E}}\right)$, then one can find the optimal policy using $O\left(\dim_{E}\right)$ trajectories.
arXiv Detail & Related papers (2020-02-17T18:41:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.