Classes of ODE solutions: smoothness, covering numbers, implications for
noisy function fitting, and the curse of smoothness phenomenon
- URL: http://arxiv.org/abs/2011.11371v3
- Date: Wed, 17 Mar 2021 23:48:37 GMT
- Title: Classes of ODE solutions: smoothness, covering numbers, implications for
noisy function fitting, and the curse of smoothness phenomenon
- Authors: Ying Zhu, Mozhgan Mirzaei
- Abstract summary: We show how the degree of smoothness and the "size" of a class of ODEs affect the "size" of the associated class of solutions.
- Score: 0.8376091455761261
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many numerical methods for recovering ODE solutions from data rely on
approximating the solutions using basis functions or kernel functions under a
least squares criterion. The accuracy of this approach hinges on the smoothness
of the solutions. This paper provides a theoretical foundation for these
methods by establishing novel results on the smoothness and covering numbers of
ODE solution classes (as a measure of their "size"). Our results answer the
question of how the degree of smoothness and the "size" of a class of ODEs
affect the "size" of the associated class of solutions. We show that: (1) for
$y^{'}=f\left(y\right)$ and $y^{'}=f\left(x,\,y\right)$, if the absolute values
of all the $k$th order ($k\leq\beta+1$) derivatives of $f$ are bounded by $1$, then
the $(k+1)$th derivative of the solution can have a magnitude that grows
factorially fast in $k$ -- "a curse of smoothness"; (2) our upper bounds for
the covering numbers of the $(\beta+2)$-degree smooth solution classes are
greater than those of the "standard" $(\beta+2)$-degree smooth class of
univariate functions; (3) the mean squared error of least squares fitting for
noisy recovery has a convergence rate no larger than
$\left(\frac{1}{n}\right)^{\frac{2\left(\beta+2\right)}{2\left(\beta+2\right)+1}}$
if
$n=\Omega\left(\left(\beta\sqrt{\log\left(\beta\vee1\right)}\right)^{4\beta+10}\right)$,
and under this condition, the rate
$\left(\frac{1}{n}\right)^{\frac{2\left(\beta+2\right)}{2\left(\beta+2\right)+1}}$
is minimax optimal in the case of $y^{'}=f\left(x,\,y\right)$; (4) more
generally, for the higher order Picard type ODEs,
$y^{\left(m\right)}=f\left(x,\,y,\,y^{'},\,\ldots,\,y^{\left(m-1\right)}\right)$,
the covering number of the solution class is bounded from above by the product
of the covering number of the class $\mathcal{F}$ that $f$ ranges over and the
covering number of the set where initial values lie.
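
To see where the factorial growth in result (1) comes from, it helps to differentiate the autonomous equation $y^{'}=f\left(y\right)$ a few times. The display below is an illustrative calculation under the assumption $|f^{(j)}|\leq 1$ for the relevant orders $j$; it is not taken from the paper.

```latex
% Illustrative calculation (not from the paper): repeatedly differentiate
% y' = f(y), assuming |f^{(j)}| <= 1 for every order j that appears.
\begin{align*}
y'    &= f(y),\\
y''   &= f'(y)\,f(y),\\
y'''  &= f''(y)\,f(y)^{2} + f'(y)^{2}f(y),\\
y'''' &= f'''(y)\,f(y)^{3} + 4f''(y)f'(y)\,f(y)^{2} + f'(y)^{3}f(y).
\end{align*}
% Each differentiation turns a product of m bounded factors into m products of
% m+1 bounded factors, so the crude bound on |y^{(k+1)}| obtained this way is
% 1, 1, 2, 6, ... = k!, consistent with the factorial growth in result (1).
```

The least-squares recovery analyzed in result (3) is also easy to sketch numerically. The snippet below is a minimal illustration, not the authors' implementation: the ODE $y^{'}=\cos(y)$, the noise level, the sample size, and the polynomial basis and degree are all assumptions made for the example.

```python
# Minimal sketch (not the authors' code): recover an ODE solution from noisy
# samples by least-squares fitting over a polynomial basis.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)

# Reference solution of y' = f(y) with f = cos (all derivatives of f are
# bounded by 1, matching the setting of result (1)).
sol = solve_ivp(lambda t, y: np.cos(y), (0.0, 1.0), [0.0], dense_output=True)

n = 200                                                  # sample size
x = rng.uniform(0.0, 1.0, size=n)                        # design points
y_obs = sol.sol(x)[0] + 0.05 * rng.standard_normal(n)    # noisy observations

# Least-squares fit over the monomial basis 1, x, ..., x^degree (the degree
# is an arbitrary illustrative choice, not a tuned quantity).
degree = 5
basis = np.vander(x, degree + 1, increasing=True)
coef, *_ = np.linalg.lstsq(basis, y_obs, rcond=None)

# Mean squared error of the fitted curve against the reference solution.
grid = np.linspace(0.0, 1.0, 1000)
fit = np.vander(grid, degree + 1, increasing=True) @ coef
mse = np.mean((fit - sol.sol(grid)[0]) ** 2)
print(f"grid MSE of the least-squares fit: {mse:.2e}")
```

In practice the basis (splines, kernels, etc.) and its dimension would be chosen to match the smoothness level $\beta+2$ of the solution class, which is exactly the regime that the covering-number bounds above are designed to control.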
Related papers
- Efficient Continual Finite-Sum Minimization [52.5238287567572]
We propose a key twist on finite-sum minimization, dubbed continual finite-sum minimization.
Our approach significantly improves upon the $\mathcal{O}(n/\epsilon)$ FOs (first-order oracle calls) that $\mathrm{StochasticGradientDescent}$ requires.
We also prove that there is no natural first-order method with $\mathcal{O}\left(n/\epsilon^{\alpha}\right)$ gradient complexity for $\alpha < 1/4$, establishing that the first-order complexity of our method is nearly tight.
arXiv Detail & Related papers (2024-06-07T08:26:31Z) - Algorithms for mean-field variational inference via polyhedral optimization in the Wasserstein space [10.292118864147097]
We develop a theory of finite-dimensional polyhedral subsets over the Wasserstein space and optimization of functionals over them via first-order methods.
Our main application is to the problem of mean-field variational inference, which seeks to approximate a distribution $\pi$ over $\mathbb{R}^{d}$ by a product measure $\pi^{\star}$.
arXiv Detail & Related papers (2023-12-05T16:02:04Z) - Sharper Rates for Separable Minimax and Finite Sum Optimization via
Primal-Dual Extragradient Methods [39.87865197769047]
We study separable minimax optimization problems $\min_{x}\max_{y} f(x) - g(y) + h(x, y)$, where $f$ and $g$ have smoothness and strong convexity parameters $(L^{x}, \mu^{x})$, $(L^{y}, \mu^{y})$, and $h$ is convex-concave with a $(\Lambda^{xx}, \Lambda^{xy}, \Lambda^{yy})$-blockwise operator norm bounded Hessian.
arXiv Detail & Related papers (2022-02-09T18:57:47Z) - Nearly Horizon-Free Offline Reinforcement Learning [97.36751930393245]
We revisit offline reinforcement learning on episodic time-homogeneous Markov Decision Processes with $S$ states, $A$ actions and planning horizon $H$.
We obtain the first set of nearly $H$-free sample complexity bounds for evaluation and planning using the empirical MDPs.
arXiv Detail & Related papers (2021-03-25T18:52:17Z) - Infinite-Horizon Offline Reinforcement Learning with Linear Function
Approximation: Curse of Dimensionality and Algorithm [46.36534144138337]
In this paper, we investigate the sample complexity of policy evaluation in offline reinforcement learning.
Under the low distribution shift assumption, we show that there is an algorithm that needs at most $O\left(\max\left\{ \frac{\left\Vert \theta^{\pi}\right\Vert _{2}^{4}}{\varepsilon^{4}}\log\frac{d}{\delta},\,\frac{1}{\varepsilon^{2}}\left(d+\log\frac{1}{\delta}\right)\right\} \right)$ samples to approximate the value of the target policy.
arXiv Detail & Related papers (2021-03-17T18:18:57Z) - Deep Neural Networks with ReLU-Sine-Exponential Activations Break Curse
of Dimensionality on H\"older Class [6.476766717110237]
We construct neural networks with ReLU, sine and $2^{x}$ as activation functions.
In addition to their super expressive power, functions implemented by ReLU-sine-$2^{x}$ networks are (generalized) differentiable.
arXiv Detail & Related papers (2021-02-28T15:57:42Z) - Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e. learner has access to only zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z) - $Q$-learning with Logarithmic Regret [60.24952657636464]
We prove that an optimistic $Q$-learning algorithm enjoys a $\mathcal{O}\left(\frac{SA\cdot \mathrm{poly}\left(H\right)}{\Delta_{\min}}\log\left(SAT\right)\right)$ cumulative regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $T$ is the total number of steps, and $\Delta_{\min}$ is the minimum sub-optimality gap.
arXiv Detail & Related papers (2020-06-16T13:01:33Z) - Improved Algorithms for Convex-Concave Minimax Optimization [10.28639483137346]
This paper studies minimax optimization problems $\min_{x}\max_{y} f(x,y)$, where $f(x,y)$ is $m_x$-strongly convex with respect to $x$, $m_y$-strongly concave with respect to $y$ and $(L_x, L_{xy}, L_y)$-smooth.
arXiv Detail & Related papers (2020-06-11T12:21:13Z) - Revisiting EXTRA for Smooth Distributed Optimization [70.65867695317633]
We give a sharp complexity analysis for EXTRA with the improved $O\left(\left(\frac{L}{\mu}+\frac{1}{1-\sigma_{2}(W)}\right)\log\frac{1}{\epsilon(1-\sigma_{2}(W))}\right)$ complexity.
The communication complexities of our accelerated EXTRA are only worse by factors of $\left(\log\frac{L}{\mu(1-\sigma_{2}(W))}\right)$ and $\left(\log\frac{1}{\epsilon(1-\sigma_{2}(W))}\right)$ than the corresponding lower bounds.
arXiv Detail & Related papers (2020-02-24T08:07:08Z) - Agnostic Q-learning with Function Approximation in Deterministic
Systems: Tight Bounds on Approximation Error and Sample Complexity [94.37110094442136]
We study the problem of agnostic $Q$-learning with function approximation in deterministic systems.
We show that if $\delta = O\left(\rho/\sqrt{\dim_{E}}\right)$, then one can find the optimal policy using $O\left(\dim_{E}\right)$ trajectories.
arXiv Detail & Related papers (2020-02-17T18:41:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.