Learning from time-dependent streaming data with online stochastic algorithms
- URL: http://arxiv.org/abs/2205.12549v2
- Date: Tue, 18 Jul 2023 20:01:10 GMT
- Title: Learning from time-dependent streaming data with online stochastic algorithms
- Authors: Antoine Godichon-Baggioni, Nicklas Werge, Olivier Wintenberger
- Abstract summary: This paper addresses optimization in a streaming setting with time-dependent and biased estimates.
We analyze several first-order methods, including Stochastic Gradient Descent (SGD), mini-batch SGD, and time-varying mini-batch SGD, along with their Polyak-Ruppert averages.
- Score: 7.283533791778357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses stochastic optimization in a streaming setting with
time-dependent and biased gradient estimates. We analyze several first-order
methods, including Stochastic Gradient Descent (SGD), mini-batch SGD, and
time-varying mini-batch SGD, along with their Polyak-Ruppert averages. Our
non-asymptotic analysis establishes novel heuristics that link dependence,
biases, and convexity levels, enabling accelerated convergence. Specifically,
our findings demonstrate that (i) time-varying mini-batch SGD methods have the
capability to break long- and short-range dependence structures, (ii) biased
SGD methods can achieve comparable performance to their unbiased counterparts,
and (iii) incorporating Polyak-Ruppert averaging can accelerate the convergence
of the stochastic optimization algorithms. To validate our theoretical
findings, we conduct a series of experiments using both simulated and real-life
time-dependent data.
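To make the setting concrete, the following is a minimal sketch of time-varying mini-batch SGD with Polyak-Ruppert averaging on a streaming least-squares problem. The `stream` callback, the quadratic loss, and the batch-size and learning-rate schedules are illustrative assumptions for this sketch, not the paper's exact configuration.

```python
import numpy as np

def streaming_sgd_polyak(stream, dim, alpha=2/3, beta=1/3, c_gamma=1.0, steps=1000):
    """Time-varying mini-batch SGD with Polyak-Ruppert averaging (illustrative sketch).

    `stream(n)` is assumed to return a mini-batch (X, y) of n observations from a
    possibly time-dependent data stream; the schedules below are hypothetical.
    """
    theta = np.zeros(dim)       # SGD iterate
    theta_bar = np.zeros(dim)   # Polyak-Ruppert average of the iterates
    for t in range(1, steps + 1):
        n_t = int(np.ceil(t ** beta))                  # growing mini-batch size (assumed schedule)
        gamma_t = c_gamma * n_t ** beta / t ** alpha   # learning rate tied to the batch size (assumed)
        X, y = stream(n_t)                             # next mini-batch from the stream
        grad = X.T @ (X @ theta - y) / n_t             # gradient of the least-squares loss
        theta = theta - gamma_t * grad                 # SGD step
        theta_bar += (theta - theta_bar) / t           # running (Polyak-Ruppert) average
    return theta, theta_bar
```

Returning the running average `theta_bar` instead of the last iterate is the Polyak-Ruppert averaging referred to in the abstract; it costs only one extra vector update per step.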
Related papers
- Adaptive Multi-Scale Decomposition Framework for Time Series Forecasting [26.141054975797868]
We propose a novel Adaptive Multi-Scale Decomposition (AMD) framework for time series forecasting (TSF)
Our framework decomposes time series into distinct temporal patterns at multiple scales, leveraging the Multi-Scale Decomposable Mixing (MDM) block.
Our approach effectively models both temporal and channel dependencies and utilizes autocorrelation to refine multi-scale data integration.
arXiv Detail & Related papers (2024-06-06T05:27:33Z)
- Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC can perform both parameter estimation and particle proposal adaptation efficiently and entirely on-the-fly.
arXiv Detail & Related papers (2023-12-19T21:45:38Z)
- Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
- The Capacity and Robustness Trade-off: Revisiting the Channel Independent Strategy for Multivariate Time Series Forecasting [50.48888534815361]
We show that models trained with the Channel Independent (CI) strategy outperform those trained with the Channel Dependent (CD) strategy.
Our results indicate that the CD approach has higher capacity but often lacks the robustness to accurately predict distributionally drifted time series.
We propose a modified CD method called Predict Residuals with Regularization (PRReg) that can surpass the CI strategy.
arXiv Detail & Related papers (2023-04-11T13:15:33Z)
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement; a minimal sketch of this sampling scheme appears after this list.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z)
- Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Streaming Data [0.0]
The streaming framework is analogous to solving optimization problems using time-varying mini-batches that arrive sequentially.
We provide non-asymptotic convergence rates of various gradient-based algorithms.
We show how to accelerate convergence by choosing the learning rate according to the time-varying mini-batches.
arXiv Detail & Related papers (2021-09-15T06:58:23Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Fast and Robust Online Inference with Stochastic Gradient Descent via Random Scaling [0.9806910643086042]
We develop a new method of online inference for a vector of parameters estimated by the Polyak-Ruppert averaging procedure of stochastic gradient descent (SGD) algorithms.
Our approach is fully operational with online data and is rigorously underpinned by a functional central limit theorem.
arXiv Detail & Related papers (2021-06-06T15:38:37Z)
- Fast Distributionally Robust Learning with Variance Reduced Min-Max Optimization [85.84019017587477]
Distributionally robust supervised learning is emerging as a key paradigm for building reliable machine learning systems for real-world applications.
Existing algorithms for solving Wasserstein DRSL involve solving complex subproblems or fail to make use of gradients.
We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable extra-gradient algorithms.
arXiv Detail & Related papers (2021-04-27T16:56:09Z)
- Longitudinal Deep Kernel Gaussian Process Regression [16.618767289437905]
We introduce Longitudinal Deep Kernel Gaussian Process Regression (L-DKGPR).
L-DKGPR automates the discovery of complex multilevel correlation structure from longitudinal data.
We derive an efficient algorithm to train L-DKGPR using latent space inducing points and variational inference.
arXiv Detail & Related papers (2020-05-24T15:10:48Z)
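As a companion to the shuffling-based variants cited above, here is a minimal sketch of mini-batch SGD with Random Reshuffling, i.e., drawing gradients without replacement by permuting the data once per epoch; the least-squares loss and constant step size are illustrative assumptions.

```python
import numpy as np

def random_reshuffling_sgd(X, y, lr=0.01, batch_size=32, epochs=10, seed=0):
    """Mini-batch SGD with Random Reshuffling: every epoch visits each example
    exactly once, in a fresh random order (sampling without replacement)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)                       # new shuffle at the start of each epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]        # without-replacement mini-batch
            Xb, yb = X[idx], y[idx]
            grad = Xb.T @ (Xb @ theta - yb) / len(idx)  # mini-batch least-squares gradient
            theta -= lr * grad
    return theta
```

The with-replacement counterpart would instead draw `idx = rng.integers(0, n, batch_size)` at every step, so individual examples can be repeated or skipped within an epoch.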