Learning from time-dependent streaming data with online stochastic algorithms
- URL: http://arxiv.org/abs/2205.12549v2
- Date: Tue, 18 Jul 2023 20:01:10 GMT
- Title: Learning from time-dependent streaming data with online stochastic algorithms
- Authors: Antoine Godichon-Baggioni, Nicklas Werge, Olivier Wintenberger
- Abstract summary: This paper addresses optimization in a streaming setting with time-dependent and biased estimates.
We analyze several first-order methods, including Stochastic Gradient Descent (SGD), mini-batch SGD, and time-varying mini-batch SGD, along with their Polyak-Ruppert averages.
- Score: 7.283533791778357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses stochastic optimization in a streaming setting with
time-dependent and biased gradient estimates. We analyze several first-order
methods, including Stochastic Gradient Descent (SGD), mini-batch SGD, and
time-varying mini-batch SGD, along with their Polyak-Ruppert averages. Our
non-asymptotic analysis establishes novel heuristics that link dependence,
biases, and convexity levels, enabling accelerated convergence. Specifically,
our findings demonstrate that (i) time-varying mini-batch SGD methods have the
capability to break long- and short-range dependence structures, (ii) biased
SGD methods can achieve comparable performance to their unbiased counterparts,
and (iii) incorporating Polyak-Ruppert averaging can accelerate the convergence
of the stochastic optimization algorithms. To validate our theoretical
findings, we conduct a series of experiments using both simulated and real-life
time-dependent data.
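To make the setting concrete, the following is a minimal sketch of time-varying mini-batch SGD with Polyak-Ruppert averaging on a streaming least-squares problem. The `stream` callback, the quadratic loss, and the batch-size and learning-rate schedules are illustrative assumptions for this sketch, not the paper's exact configuration.

```python
import numpy as np

def streaming_sgd_polyak(stream, dim, alpha=2/3, beta=1/3, c_gamma=1.0, steps=1000):
    """Time-varying mini-batch SGD with Polyak-Ruppert averaging (illustrative sketch).

    `stream(n)` is assumed to return a mini-batch (X, y) of n observations from a
    possibly time-dependent data stream; the schedules below are hypothetical.
    """
    theta = np.zeros(dim)       # SGD iterate
    theta_bar = np.zeros(dim)   # Polyak-Ruppert average of the iterates
    for t in range(1, steps + 1):
        n_t = int(np.ceil(t ** beta))                  # growing mini-batch size (assumed schedule)
        gamma_t = c_gamma * n_t ** beta / t ** alpha   # learning rate tied to the batch size (assumed)
        X, y = stream(n_t)                             # next mini-batch from the stream
        grad = X.T @ (X @ theta - y) / n_t             # gradient of the least-squares loss
        theta = theta - gamma_t * grad                 # SGD step
        theta_bar += (theta - theta_bar) / t           # running (Polyak-Ruppert) average
    return theta, theta_bar
```

Returning the running average `theta_bar` instead of the last iterate is the Polyak-Ruppert averaging referred to in the abstract; it costs only one extra vector update per step.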
Related papers
- Adaptive Multi-Scale Decomposition Framework for Time Series Forecasting [26.141054975797868]
We propose a novel Adaptive Multi-Scale Decomposition (AMD) framework for time series forecasting (TSF)
Our framework decomposes time series into distinct temporal patterns at multiple scales, leveraging the Multi-Scale Decomposable Mixing (MDM) block.
Our approach effectively models both temporal and channel dependencies and utilizes autocorrelation to refine multi-scale data integration.
arXiv Detail & Related papers (2024-06-06T05:27:33Z)
- Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC can perform both parameter estimation and particle proposal adaptation efficiently and entirely on-the-fly.
arXiv Detail & Related papers (2023-12-19T21:45:38Z)
- Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
- The Capacity and Robustness Trade-off: Revisiting the Channel Independent Strategy for Multivariate Time Series Forecasting [50.48888534815361]
We show that models trained with the Channel Independent (CI) strategy outperform those trained with the Channel Dependent (CD) strategy.
Our results indicate that the CD approach has higher capacity but often lacks the robustness to accurately predict distributionally drifted time series.
We propose a modified CD method called Predict Residuals with Regularization (PRReg) that can surpass the CI strategy.
arXiv Detail & Related papers (2023-04-11T13:15:33Z)
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement; a minimal sketch of this sampling scheme appears after this list.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z)
- Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Streaming Data [0.0]
The streaming framework is analogous to solving optimization problems using time-varying mini-batches that arrive sequentially.
We provide non-asymptotic convergence rates of various gradient-based algorithms.
We show how to accelerate convergence by choosing the learning rate according to the time-varying mini-batches.
arXiv Detail & Related papers (2021-09-15T06:58:23Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Fast and Robust Online Inference with Stochastic Gradient Descent via Random Scaling [0.9806910643086042]
We develop a new method of online inference for a vector of parameters estimated by the Polyak-Ruppert averaging procedure of stochastic gradient descent (SGD) algorithms.
Our approach is fully operational with online data and is rigorously underpinned by a functional central limit theorem.
arXiv Detail & Related papers (2021-06-06T15:38:37Z)
- Fast Distributionally Robust Learning with Variance Reduced Min-Max Optimization [85.84019017587477]
Distributionally robust supervised learning is emerging as a key paradigm for building reliable machine learning systems for real-world applications.
Existing algorithms for solving Wasserstein DRSL involve solving complex subproblems or fail to make use of gradients.
We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable extra-gradient algorithms.
arXiv Detail & Related papers (2021-04-27T16:56:09Z)
- Longitudinal Deep Kernel Gaussian Process Regression [16.618767289437905]
We introduce Longitudinal Deep Kernel Gaussian Process Regression (L-DKGPR).
L-DKGPR automates the discovery of complex multilevel correlation structure from longitudinal data.
We derive an efficient algorithm to train L-DKGPR using latent space inducing points and variational inference.
arXiv Detail & Related papers (2020-05-24T15:10:48Z)
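As a companion to the shuffling-based variants cited above, here is a minimal sketch of mini-batch SGD with Random Reshuffling, i.e., drawing gradients without replacement by permuting the data once per epoch; the least-squares loss and constant step size are illustrative assumptions.

```python
import numpy as np

def random_reshuffling_sgd(X, y, lr=0.01, batch_size=32, epochs=10, seed=0):
    """Mini-batch SGD with Random Reshuffling: every epoch visits each example
    exactly once, in a fresh random order (sampling without replacement)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)                       # new shuffle at the start of each epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]        # without-replacement mini-batch
            Xb, yb = X[idx], y[idx]
            grad = Xb.T @ (Xb @ theta - yb) / len(idx)  # mini-batch least-squares gradient
            theta -= lr * grad
    return theta
```

The with-replacement counterpart would instead draw `idx = rng.integers(0, n, batch_size)` at every step, so individual examples can be repeated or skipped within an epoch.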