Sobolev Acceleration and Statistical Optimality for Learning Elliptic
Equations via Gradient Descent
- URL: http://arxiv.org/abs/2205.07331v2
- Date: Tue, 17 May 2022 04:21:52 GMT
- Title: Sobolev Acceleration and Statistical Optimality for Learning Elliptic
Equations via Gradient Descent
- Authors: Yiping Lu, Jose Blanchet, Lexing Ying
- Abstract summary: We study the statistical limits, in terms of Sobolev norms, of gradient descent for solving inverse problems from randomly sampled noisy observations.
Our class of objective functions includes Sobolev training for kernel regression, Deep Ritz Methods (DRM), and Physics Informed Neural Networks (PINN).
- Score: 11.483919798541393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the statistical limits in terms of Sobolev norms of
gradient descent for solving inverse problems from randomly sampled noisy
observations using a general class of objective functions. Our class of
objective functions includes Sobolev training for kernel regression, Deep Ritz
Methods (DRM), and Physics Informed Neural Networks (PINN) for solving elliptic
partial differential equations (PDEs) as special cases. We consider a
potentially infinite-dimensional parameterization of our model using a suitable
Reproducing Kernel Hilbert Space and a continuous parameterization of problem
hardness through the definition of kernel integral operators. We prove that
gradient descent over this objective function can also achieve statistical
optimality and the optimal number of passes over the data increases with sample
size. Based on our theory, we explain an implicit acceleration of using a
Sobolev norm as the objective function for training, inferring that the optimal
number of epochs of DRM becomes larger than that of PINN when both the
data size and the hardness of tasks increase, although both DRM and PINN can
achieve statistical optimality.
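The DRM versus PINN comparison in the abstract can be illustrated with a toy spectral sketch. Everything below is an assumption made for illustration and is not the paper's setting: the operator is $-d^2/dx^2$ with eigenvalues $\lambda_k = k^2$, the kernel spectrum is taken to be $\mu_k = k^{-2s}$ with $s = 3$, and the data are noiseless. Writing $u = \sum_k \theta_k \sqrt{\mu_k} e_k$, the DRM energy $\frac{1}{2}\langle u, Au \rangle - \langle f, u \rangle$ has Hessian eigenvalues $\lambda_k \mu_k$, while the PINN residual loss $\frac{1}{2}\|Au - f\|^2$ has Hessian eigenvalues $\lambda_k^2 \mu_k$. The hypothetical helper `gd_steps_to_tolerance` counts gradient descent steps until every mode reaches a relative error of `tol`.

```python
import numpy as np


def gd_steps_to_tolerance(hessian_eigs, tol=1e-2, max_steps=200_000):
    """Gradient descent on 0.5 * sum(h_k * (theta_k - 1)**2), starting at 0.

    Returns the number of steps until every coordinate is within `tol`
    of the minimizer theta_k = 1. The step size is 1 / max(h_k).
    """
    h = np.asarray(hessian_eigs, dtype=float)
    step = 1.0 / h.max()
    theta = np.zeros_like(h)
    for t in range(1, max_steps + 1):
        grad = h * (theta - 1.0)      # gradient of the diagonal quadratic
        theta -= step * grad
        if np.max(np.abs(theta - 1.0)) <= tol:
            return t
    return max_steps


k = np.arange(1, 9, dtype=float)      # frequencies 1..8 (assumed truncation)
s = 3.0                               # assumed kernel smoothness
kernel_eigs = k ** (-2.0 * s)         # assumed kernel spectrum mu_k
operator_eigs = k ** 2.0              # spectrum of -d^2/dx^2

drm_hessian = operator_eigs * kernel_eigs          # decays like k^(2 - 2s)
pinn_hessian = operator_eigs ** 2 * kernel_eigs    # decays like k^(4 - 2s)

drm_steps = gd_steps_to_tolerance(drm_hessian)
pinn_steps = gd_steps_to_tolerance(pinn_hessian)
print(f"DRM steps: {drm_steps}, PINN steps: {pinn_steps}")
```

In this toy model the PINN Hessian spectrum decays more slowly, so high-frequency modes converge in fewer steps. That matches the direction of the abstract's claim, but it says nothing about the statistical rates or the noise-dependent stopping time analyzed in the paper.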
Related papers
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
- Semi-Supervised Deep Sobolev Regression: Estimation, Variable Selection and Beyond [3.782392436834913]
We propose SDORE, a semi-supervised deep Sobolev regressor, for the nonparametric estimation of the underlying regression function and its gradient.
We conduct a comprehensive analysis of the convergence rates of SDORE and establish a minimax optimal rate for the regression function.
We also derive a convergence rate for the associated plug-in gradient estimator, even in the presence of significant domain shift.
arXiv Detail & Related papers (2024-01-09T13:10:30Z)
- Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow trading off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z)
- Interacting Particle Langevin Algorithm for Maximum Marginal Likelihood Estimation [2.53740603524637]
We develop a class of interacting particle systems for implementing a maximum marginal likelihood estimation procedure.
In particular, we prove that the parameter marginal of the stationary measure of this diffusion has the form of a Gibbs measure.
Using a particular rescaling, we then prove geometric ergodicity of this system and bound the discretisation error in a manner that is uniform in time and does not increase with the number of particles.
arXiv Detail & Related papers (2023-03-23T16:50:08Z)
- Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Rigorous dynamical mean field theory for stochastic gradient descent methods [17.90683687731009]
We prove closed-form equations for the exact high-dimensional asymptotics of a family of first-order gradient-based methods.
This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration.
arXiv Detail & Related papers (2022-10-12T21:10:55Z)
- FaDIn: Fast Discretized Inference for Hawkes Processes with General Parametric Kernels [82.53569355337586]
This work offers an efficient solution to temporal point process inference using general parametric kernels with finite support.
The method's effectiveness is evaluated by modeling the occurrence of stimuli-induced patterns from brain signals recorded with magnetoencephalography (MEG).
Results show that the proposed approach estimates pattern latency better than the state of the art.
arXiv Detail & Related papers (2022-10-10T12:35:02Z)
- Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
- SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic, and exponentially fast decaying error bounds that apply to both the approximated kernel and the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.