A note on continuous-time online learning
- URL: http://arxiv.org/abs/2405.10399v1
- Date: Thu, 16 May 2024 18:58:19 GMT
- Title: A note on continuous-time online learning
- Authors: Lexing Ying
- Abstract summary: In online learning, data arrives sequentially, and the learner's goal is to make online decisions that minimize the overall regret.
This note is concerned with continuous-time models and algorithms for several online learning problems: online linear optimization, adversarial bandit, and adversarial linear bandit.
- Score: 13.796981813494199
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In online learning, data arrives sequentially, and the learner's goal is to make online decisions that minimize the overall regret. This note is concerned with continuous-time models and algorithms for several online learning problems: online linear optimization, adversarial bandit, and adversarial linear bandit. For each problem, we extend the discrete-time algorithm to the continuous-time setting and provide a concise proof of the optimal regret bound.
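To illustrate why such continuous-time proofs can be concise, here is a minimal sketch of the standard continuous-time analogue of exponential weights over $n$ experts with loss rates $\ell_i(t)$; this is an example of the genre, not necessarily the exact formulation used in the note:
$$ L_i(t) = \int_0^t \ell_i(s)\,ds, \qquad p_i(t) = \frac{e^{-\eta L_i(t)}}{\sum_{j=1}^n e^{-\eta L_j(t)}}, \qquad \Phi(t) = -\frac{1}{\eta}\log\sum_{i=1}^n e^{-\eta L_i(t)}. $$
Differentiating the potential gives $\Phi'(t) = \sum_i p_i(t)\,\ell_i(t)$, so the learner's cumulative loss equals $\Phi(T) - \Phi(0) \le \min_i L_i(T) + \frac{\log n}{\eta}$ for any fixed $\eta > 0$: the second-order discretization term of the discrete-time proof vanishes in the continuum limit, which is what makes the continuous-time arguments short.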
Related papers
- Efficient Methods for Non-stationary Online Learning [67.3300478545554]
We present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$.
Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods.
arXiv Detail & Related papers (2023-09-16T07:30:12Z)
- Efficient Online Learning with Memory via Frank-Wolfe Optimization: Algorithms with Bounded Dynamic Regret and Applications to Control [15.588080817106563]
We introduce the first projection-free meta-base learning algorithm with memory that minimizes dynamic regret (a generic projection-free Frank-Wolfe step is sketched after this list).
We are motivated by artificial intelligence applications where autonomous agents need to adapt to time-varying environments.
arXiv Detail & Related papers (2023-01-02T01:12:29Z)
- Learning-Assisted Algorithm Unrolling for Online Optimization with Budget Constraints [27.84415856657607]
We propose a new machine learning (ML) assisted unrolling approach called LAAU (Learning-Assisted Algorithm Unrolling).
For efficient training via backpropagation, we derive gradients of the decision pipeline over time.
We also provide the average cost bounds for two cases when training data is available offline and collected online, respectively.
arXiv Detail & Related papers (2022-12-03T20:56:29Z)
- Continuous Prediction with Experts' Advice [10.98975673892221]
Prediction with experts' advice is one of the most fundamental problems in online learning.
Recent work has looked at online learning through the lens of differential equations and continuous-time analysis.
arXiv Detail & Related papers (2022-06-01T05:09:20Z)
- Online estimation and control with optimal pathlength regret [52.28457815067461]
A natural goal when designing online learning algorithms is to bound the regret of the algorithm in terms of the temporal variation of the input sequence.
Data-dependent "pathlength" regret bounds have recently been obtained for a wide variety of online learning problems, including online convex optimization (OCO) and bandits.
arXiv Detail & Related papers (2021-10-24T22:43:15Z)
- Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data [101.6195176510611]
"Online" continual learning enables evaluating both information retention and online learning efficacy.
In online continual learning, each incoming small batch of data is first used for testing and then added to the training set, making the problem truly online.
We introduce a new benchmark for online continual visual learning that exhibits large scale and natural distribution shifts.
arXiv Detail & Related papers (2021-08-20T06:17:20Z)
- An Online Learning Approach to Optimizing Time-Varying Costs of AoI [26.661352924641285]
We consider systems that require timely monitoring of sources over a communication network, with freshness measured by the Age of Information (AoI).
For the single source monitoring problem, we design algorithms that achieve sublinear regret compared to the best fixed policy in hindsight.
For the multiple source scheduling problem, we design a new online learning algorithm called Follow-the-Perturbed-Whittle-Leader (a generic follow-the-perturbed-leader step is sketched after this list).
arXiv Detail & Related papers (2021-05-27T18:10:56Z)
- Strongly Adaptive OCO with Memory [49.319621885036035]
We propose the first strongly adaptive algorithm for online learning with memory.
Our algorithm results in a strongly adaptive regret bound for the control of linear time-varying systems.
arXiv Detail & Related papers (2021-02-02T17:26:08Z)
- Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z)
- Logarithmic Regret for Adversarial Online Control [56.12283443161479]
We give the first algorithm with logarithmic regret for arbitrary adversarial disturbance sequences.
Our algorithm and analysis use a characterization for the offline control law to reduce the online control problem to (delayed) online learning.
arXiv Detail & Related papers (2020-02-29T06:29:19Z)
- A Modern Introduction to Online Learning [15.974402990630402]
Online learning refers to the framework of minimization of regret under worst-case assumptions.
I present first-order and second-order algorithms for online learning with convex losses.
arXiv Detail & Related papers (2019-12-31T08:16:31Z)
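For the Frank-Wolfe entry above, the following is a minimal sketch of a generic projection-free step over the probability simplex. It shows only why Frank-Wolfe needs no projection; the function names, the step size, and the simplex domain are illustrative assumptions, not the paper's meta-base algorithm with memory.

```python
import numpy as np

def linear_minimizer_simplex(grad: np.ndarray) -> np.ndarray:
    """argmin of <grad, s> over the simplex is attained at a vertex:
    the coordinate with the smallest gradient entry."""
    s = np.zeros_like(grad)
    s[np.argmin(grad)] = 1.0
    return s

def frank_wolfe_step(x: np.ndarray, grad: np.ndarray, step: float) -> np.ndarray:
    """Move toward the linear minimizer; the new iterate is a convex
    combination of feasible points, so no projection is computed."""
    s = linear_minimizer_simplex(grad)
    return (1.0 - step) * x + step * s

# Usage: one online round against a random linear loss over 5 actions.
rng = np.random.default_rng(0)
x = np.full(5, 1.0 / 5.0)          # start at the uniform distribution
loss_vector = rng.uniform(size=5)  # this round's linear loss (illustrative)
x = frank_wolfe_step(x, loss_vector, step=0.1)
print(x, x.sum())                  # still a probability vector
```

Keeping every iterate a convex combination of feasible points is the design choice that removes the projection step entirely, which is what "projection-free" refers to in the entry above.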
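Likewise, for the Follow-the-Perturbed-Whittle-Leader entry, here is a minimal sketch of the generic follow-the-perturbed-leader template it builds on: perturb the cumulative losses with fresh noise and play the argmin. The exponential perturbation scale and the full-information update are assumptions for illustration; the Whittle-index scheduling component is not reproduced here.

```python
import numpy as np

def ftpl_action(cum_losses: np.ndarray, eta: float,
                rng: np.random.Generator) -> int:
    """Play the arm minimizing the perturbed cumulative loss."""
    noise = rng.exponential(scale=1.0 / eta, size=cum_losses.shape)
    return int(np.argmin(cum_losses - noise))

# Usage: a few rounds against random losses over 4 arms.
rng = np.random.default_rng(1)
cum = np.zeros(4)
for t in range(5):
    arm = ftpl_action(cum, eta=0.5, rng=rng)  # fresh noise each round
    cum += rng.uniform(size=4)                # observe and accumulate losses
    print(t, arm)
```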
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.