Error Estimates for the Variational Training of Neural Networks with
Boundary Penalty
- URL: http://arxiv.org/abs/2103.01007v1
- Date: Mon, 1 Mar 2021 13:55:59 GMT
- Title: Error Estimates for the Variational Training of Neural Networks with
Boundary Penalty
- Authors: Johannes Müller, Marius Zeinhofer
- Abstract summary: We establish estimates on the error made by the Ritz method for quadratic energies on the space $H^1(\Omega)$.
Special attention is paid to the case of Dirichlet boundary values which are treated with the boundary penalty method.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We establish estimates on the error made by the Ritz method for quadratic
energies on the space $H^1(\Omega)$ in the approximation of the solution of
variational problems with different boundary conditions. Special attention is
paid to the case of Dirichlet boundary values which are treated with the
boundary penalty method. We consider arbitrary and in general nonlinear
classes $V\subseteq H^1(\Omega)$ of ansatz functions and estimate the error in
terms of the optimisation accuracy, the approximation capabilities of the
ansatz class and, in the case of Dirichlet boundary values, the penalisation
strength $\lambda$. For non-essential boundary conditions the error of the Ritz
method decays with the same rate as the approximation rate of the ansatz
classes. For the boundary penalty method we obtain that given an approximation
rate of $r$ in $H^1(\Omega)$ and an approximation rate of $s$ in
$L^2(\partial\Omega)$ of the ansatz classes, the optimal decay rate of the
estimated error is $\min(s/2, r) \in [r/2, r]$ and achieved by choosing
$\lambda_n\sim n^{s}$. We discuss how this rate can be improved, the relation
to existing estimates for finite element functions, as well as the implications
for ansatz classes given by ReLU networks. Finally, we use the
notion of $\Gamma$-convergence to show that the Ritz method converges for a
wide class of energies, including nonlinear stationary PDEs such as the
$p$-Laplace equation.
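To make the boundary penalty approach concrete, the following is a minimal, illustrative PyTorch sketch (not taken from the paper) of variational "Ritz-type" training with a boundary penalty for the Poisson problem $-\Delta u = f$ on $\Omega=(0,1)^2$ with Dirichlet data $g$: the penalised quadratic energy $E_\lambda(u)=\int_\Omega\big(\tfrac12|\nabla u|^2 - fu\big)\,dx + \lambda\int_{\partial\Omega}(u-g)^2\,ds$ is estimated by Monte Carlo quadrature and minimised over a small neural network ansatz class. The architecture, the choices of $f$, $g$ and $\lambda$, and all hyperparameters are assumptions for illustration only.

```python
# Minimal illustrative sketch (not the authors' code): variational training
# with a boundary penalty for -Delta u = f on Omega = (0,1)^2, u = g on the
# boundary.  The penalised energy
#   E_lambda(u) = int_Omega ( 0.5*|grad u|^2 - f*u ) dx
#                 + lambda * int_{boundary} (u - g)^2 ds
# is estimated by Monte Carlo quadrature and minimised over a small MLP.
# Architecture, f, g, lambda and all hyperparameters below are assumptions.
import torch

torch.manual_seed(0)

f = lambda x: torch.ones(x.shape[0])    # assumed source term
g = lambda x: torch.zeros(x.shape[0])   # assumed Dirichlet boundary data

model = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

def interior_points(n):
    # Uniform samples in the unit square Omega = (0,1)^2.
    return torch.rand(n, 2)

def boundary_points(n):
    # Uniform samples on the four edges of the unit square.
    t = torch.rand(n)
    edge = torch.randint(0, 4, (n,))
    x = torch.stack([t, torch.zeros(n)], dim=1)                                        # bottom
    x[edge == 1] = torch.stack([t[edge == 1], torch.ones_like(t[edge == 1])], dim=1)   # top
    x[edge == 2] = torch.stack([torch.zeros_like(t[edge == 2]), t[edge == 2]], dim=1)  # left
    x[edge == 3] = torch.stack([torch.ones_like(t[edge == 3]), t[edge == 3]], dim=1)   # right
    return x

def penalised_energy(lam, n_int=1024, n_bdry=256):
    xi = interior_points(n_int).requires_grad_(True)
    u = model(xi).squeeze(-1)
    grad_u = torch.autograd.grad(u.sum(), xi, create_graph=True)[0]
    energy = (0.5 * (grad_u ** 2).sum(dim=1) - f(xi) * u).mean()        # |Omega| = 1
    xb = boundary_points(n_bdry)
    penalty = ((model(xb).squeeze(-1) - g(xb)) ** 2).mean() * 4.0       # |boundary| = 4
    return energy + lam * penalty

optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 100.0   # penalisation strength; the paper studies how it should be scaled
for step in range(2000):
    optimiser.zero_grad()
    loss = penalised_energy(lam)
    loss.backward()
    optimiser.step()
```

A larger $\lambda$ enforces the Dirichlet data more strictly, but, as the abstract indicates, the error estimate only decays at the optimal rate when the penalisation strength is coupled to the approximation capabilities of the ansatz class (the choice $\lambda_n\sim n^{s}$ discussed above).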
Related papers
- Variance-reduced first-order methods for deterministically constrained stochastic nonconvex optimization with strong convergence guarantees [1.2562458634975162]
Existing methods typically aim to find an $\epsilon$-stochastic stationary point, where the expected violations of both constraints and first-order stationarity are within a prescribed accuracy.
In many practical applications, it is crucial that the constraints be nearly satisfied with certainty, making such an $\epsilon$-stochastic stationary point potentially undesirable.
arXiv Detail & Related papers (2024-09-16T00:26:42Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Rates of Convergence in Certain Native Spaces of Approximations used in
Reinforcement Learning [0.0]
This paper studies convergence rates for some value function approximations that arise in a collection of reproducing kernel Hilbert spaces (RKHS) $H(\Omega)$.
Explicit upper bounds on the error in value function and controller approximations are derived in terms of the power function $\mathcal{P}_{H,N}$ for the space of finite-dimensional approximants $H_N$ in the native space $H(\Omega)$.
arXiv Detail & Related papers (2023-09-14T02:02:08Z) - Optimal Approximation of Zonoids and Uniform Approximation by Shallow
Neural Networks [2.7195102129095003]
We study the following two related problems.
The first is to determine how accurately an arbitrary zonoid in $\mathbb{R}^{d+1}$ can be approximated in the Hausdorff distance by a sum of $n$ line segments.
The second is to determine optimal approximation rates in the uniform norm for shallow ReLU$^k$ neural networks on their variation spaces.
arXiv Detail & Related papers (2023-07-28T03:43:17Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Gradient-Free Methods for Deterministic and Stochastic Nonsmooth
Nonconvex Optimization [94.19177623349947]
Nonsmooth nonconvex optimization problems emerge in machine learning and business decision making.
Two core challenges impede the development of efficient methods with finite-time convergence guarantees.
Two-phase versions of GFM and SGFM are also proposed and proven to achieve improved large-deviation results.
arXiv Detail & Related papers (2022-09-12T06:53:24Z) - A Law of Robustness beyond Isoperimetry [84.33752026418045]
We prove a Lipschitzness lower bound $\Omega(\sqrt{n/p})$ of robustness of interpolating neural network parameters on arbitrary distributions.
We then show the potential benefit of overparametrization for smooth data when $n=\mathrm{poly}(d)$.
We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=\exp(\omega(d))$.
arXiv Detail & Related papers (2022-02-23T16:10:23Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process (MRP).
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Finding Global Minima via Kernel Approximations [90.42048080064849]
We consider the global minimization of smooth functions based solely on function evaluations.
In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum.
arXiv Detail & Related papers (2020-12-22T12:59:30Z) - Convergence of Langevin Monte Carlo in Chi-Squared and Renyi Divergence [8.873449722727026]
For convex and first-order smooth potentials, we show that the LMC algorithm achieves the rate estimate $\widetilde{\mathcal{O}}(d\epsilon^{-1})$, which improves the previously known rates in both metrics (chi-squared and Rényi divergence).
arXiv Detail & Related papers (2020-07-22T18:18:28Z)