Online Prediction in Sub-linear Space
- URL: http://arxiv.org/abs/2207.07974v1
- Date: Sat, 16 Jul 2022 16:15:39 GMT
- Title: Online Prediction in Sub-linear Space
- Authors: Binghui Peng and Fred Zhang
- Score: 15.773280101995676
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We provide the first sub-linear space and sub-linear regret algorithm for
online learning with expert advice (against an oblivious adversary), addressing
an open question raised recently by Srinivas, Woodruff, Xu and Zhou (STOC
2022). We also demonstrate a separation between oblivious and (strong) adaptive
adversaries by proving a linear memory lower bound for any sub-linear regret
algorithm against an adaptive adversary. Our algorithm is based on a novel pool
selection procedure that bypasses the traditional wisdom of leader selection
for online learning, and a generic reduction that transforms any weakly
sub-linear regret $o(T)$ algorithm into a $T^{1-\alpha}$-regret algorithm, which
may be of independent interest. Our lower bound utilizes the connection between
no-regret learning and equilibrium computation in zero-sum games, leading to a
proof of a strong lower bound against an adaptive adversary.
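To make the prediction-with-expert-advice setting concrete, the sketch below runs exponential weights (Hedge) over a small random pool of experts, so memory scales with the pool size rather than the total number of experts. This is an illustrative baseline only: the paper's contribution is an adaptive pool selection procedure with provable sub-linear regret, whereas a fixed random pool (the function `mw_on_pool` and its parameters are hypothetical names introduced here) only competes with the best expert inside the pool.

```python
import math
import random

def mw_on_pool(n_experts, T, loss_fn, pool_size, eta=0.1, seed=0):
    """Exponential weights over a random pool of experts.

    Memory is O(pool_size) instead of O(n_experts). Against the best
    expert *in the pool*, the standard Hedge guarantee applies:
    regret <= ln(pool_size)/eta + eta*T/8 for losses in [0, 1].
    NOT the paper's algorithm, which selects the pool adaptively.
    """
    rng = random.Random(seed)
    pool = rng.sample(range(n_experts), pool_size)  # fixed random pool
    weights = [1.0] * pool_size
    total_loss = 0.0
    for t in range(T):
        s = sum(weights)
        probs = [w / s for w in weights]
        losses = [loss_fn(t, e) for e in pool]
        # expected loss of the randomized prediction over the pool
        total_loss += sum(p * l for p, l in zip(probs, losses))
        # multiplicative-weights update
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    return total_loss
```

The pool is the source of the space savings: only `pool_size` weights are ever stored, which is what makes sub-linear space possible in principle; the paper's harder problem is choosing the pool so that regret against *all* experts stays sub-linear.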
Related papers
- Smoothed Online Learning is as Easy as Statistical Learning [77.00766067963195]
We provide the first oracle-efficient, no-regret algorithms in this setting.
We show that if a function class is learnable in the classical setting, then there is an oracle-efficient, no-regret algorithm for contextual bandits.
arXiv Detail & Related papers (2022-02-09T19:22:34Z) - Dynamic Regret for Strongly Adaptive Methods and Optimality of Online KRR [13.165557713537389]
We show that Strongly Adaptive (SA) algorithms can be viewed as a principled way of controlling dynamic regret.
We derive a new lower bound on a certain penalized regret which establishes the near minimax optimality of online Kernel Ridge Regression (KRR)
arXiv Detail & Related papers (2021-11-22T21:52:47Z) - Stochastic Online Linear Regression: the Forward Algorithm to Replace
Ridge [24.880035784304834]
We derive high probability regret bounds for online ridge regression and the forward algorithm.
This enables us to compare online regression algorithms more accurately and eliminate assumptions of bounded observations and predictions.
arXiv Detail & Related papers (2021-11-02T13:57:53Z) - Online estimation and control with optimal pathlength regret [52.28457815067461]
A natural goal when designing online learning algorithms is to bound the regret of the algorithm in terms of the temporal variation of the input sequence.
Data-dependent "pathlength" regret bounds have recently been obtained for a wide variety of online learning problems, including OCO and bandits.
arXiv Detail & Related papers (2021-10-24T22:43:15Z) - Strongly Adaptive OCO with Memory [49.319621885036035]
We propose the first strongly adaptive algorithm for online learning with memory.
Our algorithm results in a strongly adaptive regret bound for the control of linear time-varying systems.
arXiv Detail & Related papers (2021-02-02T17:26:08Z) - Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z) - Logarithmic Regret for Adversarial Online Control [56.12283443161479]
We give the first algorithm with logarithmic regret for arbitrary adversarial disturbance sequences.
Our algorithm and analysis use a characterization for the offline control law to reduce the online control problem to (delayed) online learning.
arXiv Detail & Related papers (2020-02-29T06:29:19Z) - Online Learning with Imperfect Hints [72.4277628722419]
We develop algorithms and nearly matching lower bounds for online learning with imperfect directional hints.
Our algorithms are oblivious to the quality of the hints, and the regret bounds interpolate between the always-correlated hints case and the no-hints case.
arXiv Detail & Related papers (2020-02-11T23:06:09Z) - A Modern Introduction to Online Learning [15.974402990630402]
Online learning refers to the framework of minimization of regret under worst-case assumptions.
I present first-order and second-order algorithms for online learning with convex losses.
arXiv Detail & Related papers (2019-12-31T08:16:31Z)
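The canonical first-order algorithm in that framework is online gradient descent. The sketch below (function names and the $1$-D quadratic-loss example are illustrative choices, not taken from the survey) shows the standard scheme: with step size $\eta_t = D/(G\sqrt{t})$ on a domain of diameter $D$ and gradients bounded by $G$, it guarantees $O(DG\sqrt{T})$ regret against any fixed point.

```python
import math

def ogd(T, grad, proj, x0, D, G):
    """Online gradient descent for online convex optimization.

    grad(t, x): gradient of the round-t loss at x (norm <= G).
    proj(x):    projection onto the feasible set (diameter D).
    Returns the sequence of iterates played in rounds 1..T.
    Step size D/(G*sqrt(t)) yields regret <= (3/2)*D*G*sqrt(T).
    """
    x = x0
    iterates = []
    for t in range(1, T + 1):
        iterates.append(x)          # play x, then observe the loss
        eta = D / (G * math.sqrt(t))
        x = proj(x - eta * grad(t, x))
    return iterates
```

A typical instance: losses $\ell_t(x) = (x - z_t)^2$ with $z_t \in \{0,1\}$ on the domain $[0,1]$, where `grad = lambda t, x: 2 * (x - z[t-1])` and `proj` clamps to $[0,1]$; the best fixed comparator is the running mean of the $z_t$.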
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.