A Continuous-time Stochastic Gradient Descent Method for Continuous Data
- URL: http://arxiv.org/abs/2112.03754v1
- Date: Tue, 7 Dec 2021 15:09:24 GMT
- Title: A Continuous-time Stochastic Gradient Descent Method for Continuous Data
- Authors: Kexin Jin, Jonas Latz, Chenguang Liu, Carola-Bibiane Schönlieb
- Abstract summary: We study a continuous-time variant of the stochastic gradient descent algorithm for optimization problems with continuous data.
We study multiple sampling patterns for the continuous data space and allow for data simulated or streamed at runtime.
We end with illustrating the applicability of the stochastic gradient process in a polynomial regression problem with noisy functional data, as well as in a physics-informed neural network.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Optimization problems with continuous data appear in, e.g., robust machine
learning, functional data analysis, and variational inference. Here, the target
function is given as an integral over a family of (continuously) indexed target
functions - integrated with respect to a probability measure. Such problems can
often be solved by stochastic optimization methods: performing optimization
steps with respect to the indexed target function with randomly switched
indices. In this work, we study a continuous-time variant of the stochastic
gradient descent algorithm for optimization problems with continuous data. This
so-called stochastic gradient process consists in a gradient flow minimizing an
indexed target function that is coupled with a continuous-time index process
determining the index. Index processes are, e.g., reflected diffusions, pure
jump processes, or other Lévy processes on compact spaces. Thus, we study
multiple sampling patterns for the continuous data space and allow for data
simulated or streamed at runtime of the algorithm. We analyze the approximation
properties of the stochastic gradient process and study its longtime behavior
and ergodicity under constant and decreasing learning rates. We end with
illustrating the applicability of the stochastic gradient process in a
polynomial regression problem with noisy functional data, as well as in a
physics-informed neural network.
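As a rough illustration of the construction described in the abstract (not the authors' implementation), the sketch below simulates a stochastic gradient process: a forward-Euler discretization of the gradient flow dθ/dt = -∇_θ f(θ, V(t)), where the index process V(t) is a pure jump process that holds the current data index for an exponentially distributed waiting time whose mean plays the role of the learning rate. The degree-3 polynomial regression against a noisy function on [0, 1], and all function names, are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def y_obs(s):
    # Noisy functional datum observed at index s in [0, 1]; fresh noise on each
    # evaluation mimics data simulated or streamed at runtime (illustrative choice).
    return np.sin(2 * np.pi * s) + 0.1 * rng.standard_normal()

def grad_f(theta, s):
    # Gradient in theta of the indexed target f(theta, s) = (p_theta(s) - y(s))^2,
    # where p_theta is the polynomial with coefficient vector theta.
    phi = np.array([s ** k for k in range(len(theta))])  # monomial features at index s
    residual = phi @ theta - y_obs(s)
    return 2.0 * residual * phi

def stochastic_gradient_process(theta0, T=50.0, dt=1e-3, eta=lambda t: 0.1):
    # Forward-Euler discretization of dtheta/dt = -grad_f(theta, V(t)).
    # V(t) is a pure jump process: it keeps the current index for an exponential
    # waiting time with mean eta(t), then resamples it uniformly on [0, 1].
    theta = np.array(theta0, dtype=float)
    t = 0.0
    s = rng.uniform()                        # initial index V(0)
    next_jump = t + rng.exponential(eta(t))  # time of the first index jump
    while t < T:
        theta -= dt * grad_f(theta, s)       # gradient-flow step on the current index
        t += dt
        if t >= next_jump:                   # index process jumps: switch data index
            s = rng.uniform()
            next_jump = t + rng.exponential(eta(t))
    return theta

# Constant learning rate (constant mean waiting time).
theta_const = stochastic_gradient_process(np.zeros(4))
# Decaying mean waiting time: index jumps become more and more frequent over time.
theta_decay = stochastic_gradient_process(np.zeros(4), eta=lambda t: 0.1 / (1.0 + t))
print(theta_const)
print(theta_decay)
```

With a constant mean waiting time the coupled process keeps fluctuating around the minimizer, while letting the waiting time decay over time loosely mimics the decreasing-learning-rate regime analyzed in the paper.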
Related papers
- Information Geometry and Beta Link for Optimizing Sparse Variational Student-t Processes [6.37512592611305]
A sparse variational version of Student-t Processes has been proposed to enhance computational efficiency and flexibility for real-world datasets using gradient descent.
Traditional gradient descent methods like Adam may not fully exploit the parameter space geometry, potentially leading to slower convergence and suboptimal performance.
We adopt natural gradient methods from information geometry for variational parameter optimization of Student-t Processes.
arXiv Detail & Related papers (2024-08-13T07:53:39Z) - Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization [0.6906005491572401]
We show that the noise in stochastic gradient descent (SGD) has the effect of smoothing the objective function.
We analyze a new graduated optimization algorithm that varies the degree of smoothing through the learning rate and batch size.
arXiv Detail & Related papers (2023-11-15T07:27:40Z) - Score-based Diffusion Models in Function Space [140.792362459734]
Diffusion models have recently emerged as a powerful framework for generative modeling.
We introduce a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space.
We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
arXiv Detail & Related papers (2023-02-14T23:50:53Z) - Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic gradient Langevin Dynamics is one of the most fundamental algorithms for solving non-convex optimization problems.
In this paper, we show two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - Continuous-Time Meta-Learning with Forward Mode Differentiation [65.26189016950343]
We introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.
Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous, as opposed to a fixed number of discrete gradient steps.
We show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.
arXiv Detail & Related papers (2022-03-02T22:35:58Z) - Stochastic Optimization under Distributional Drift [3.0229888038442922]
We provide non-asymptotic convergence guarantees for algorithms with iterate averaging, focusing on bounds valid both in expectation and with high probability.
We identify a low drift-to-noise regime in which the tracking efficiency of the gradient method benefits significantly from a step decay schedule.
arXiv Detail & Related papers (2021-08-16T21:57:39Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Stochastic Gradient Langevin with Delayed Gradients [29.6870062491741]
We show that the rate of convergence in measure is not significantly affected by the error caused by the delayed gradient information used for computation, suggesting significant potential for speedup in wall clock time.
arXiv Detail & Related papers (2020-06-12T17:51:30Z) - Analysis of Stochastic Gradient Descent in Continuous Time [0.0]
We introduce the stochastic gradient process as a continuous-time representation of stochastic gradient descent.
We show that it converges weakly to the gradient flow as the learning rate approaches zero.
With a suitably decreasing learning rate, the process converges weakly to the point mass concentrated in the global minimum of the full target function.
arXiv Detail & Related papers (2020-04-15T16:04:41Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for
Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)