Efficient Online Learning with Memory via Frank-Wolfe Optimization:
Algorithms with Bounded Dynamic Regret and Applications to Control
- URL: http://arxiv.org/abs/2301.00497v3
- Date: Fri, 31 Mar 2023 16:29:36 GMT
- Title: Efficient Online Learning with Memory via Frank-Wolfe Optimization:
Algorithms with Bounded Dynamic Regret and Applications to Control
- Authors: Hongyu Zhou, Zirui Xu, Vasileios Tzoumas
- Abstract summary: We introduce the first projection-free meta-base learning algorithm with memory that minimizes dynamic regret.
We are motivated by artificial intelligence applications where autonomous agents need to adapt to time-varying environments.
- Score: 15.588080817106563
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Projection operations are a typical computation bottleneck in online
learning. In this paper, we enable projection-free online learning within the
framework of Online Convex Optimization with Memory (OCO-M) -- OCO-M captures
how the history of decisions affects the current outcome by allowing the online
learning loss functions to depend on both current and past decisions.
Particularly, we introduce the first projection-free meta-base learning
algorithm with memory that minimizes dynamic regret, i.e., that minimizes the
suboptimality against any sequence of time-varying decisions. We are motivated
by artificial intelligence applications where autonomous agents need to adapt
to time-varying environments in real-time, accounting for how past decisions
affect the present. Examples of such applications are: online control of
dynamical systems; statistical arbitrage; and time series prediction. The
algorithm builds on the Online Frank-Wolfe (OFW) and Hedge algorithms. We
demonstrate how our algorithm can be applied to the online control of linear
time-varying systems in the presence of unpredictable process noise. To this
end, we develop a controller with memory and bounded dynamic regret against any
optimal time-varying linear feedback control policy. We validate our algorithm
in simulated scenarios of online control of linear time-invariant systems.
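The abstract states that the algorithm builds on Online Frank-Wolfe (OFW) base learners combined by a Hedge meta-learner. The sketch below illustrates that meta-base structure under stated assumptions: a Euclidean unit-ball feasible set, made-up step-size schedules, and a toy quadratic loss with memory length one are all choices for this example, not the authors' construction.

```python
import numpy as np

# Illustrative sketch only. It mirrors the meta-base structure named in the
# abstract (OFW base learners combined by Hedge), but the feasible set, the
# step-size schedules, and all names are assumptions made for this example.

def lmo_unit_ball(grad):
    """Linear minimization oracle over the Euclidean unit ball (no projection)."""
    norm = np.linalg.norm(grad)
    return -grad / norm if norm > 0 else np.zeros_like(grad)

class OFWBase:
    """A projection-free base learner: one Frank-Wolfe step per round."""
    def __init__(self, dim, scale):
        self.x = np.zeros(dim)
        self.scale = scale  # each base learner uses a different step-size scale

    def update(self, grad, t):
        v = lmo_unit_ball(grad)                        # solve the linear subproblem
        gamma = min(1.0, self.scale / np.sqrt(t + 1))  # assumed schedule
        self.x = (1.0 - gamma) * self.x + gamma * v    # stays in the feasible set
        return self.x

class HedgeMeta:
    """Hedge meta-learner: exponentially weighted average of base decisions."""
    def __init__(self, n_bases, lr=0.5):
        self.w = np.ones(n_bases) / n_bases
        self.lr = lr

    def combine(self, decisions):
        return sum(w * x for w, x in zip(self.w, decisions))

    def update(self, base_losses):
        self.w *= np.exp(-self.lr * np.asarray(base_losses))
        self.w /= self.w.sum()

# Toy run on quadratic losses with memory length 1: the loss at round t depends
# on both the current decision and the previously played decision.
dim, T = 3, 200
bases = [OFWBase(dim, s) for s in (0.5, 1.0, 2.0)]
meta = HedgeMeta(len(bases))
prev_play = np.zeros(dim)

for t in range(T):
    target = np.array([np.sin(0.05 * t), np.cos(0.05 * t), 0.5])  # drifting target

    def loss(x, x_prev):  # loss with memory: current and past decision both matter
        return np.sum((x - target) ** 2) + 0.1 * np.sum((x - x_prev) ** 2)

    decisions = [b.x.copy() for b in bases]
    play = meta.combine(decisions)                           # decision actually played
    grad = 2 * (play - target) + 0.2 * (play - prev_play)    # gradient at the play
    meta.update([loss(x, prev_play) for x in decisions])     # reweight base learners
    for b in bases:
        b.update(grad, t)
    prev_play = play
```

In the control application described in the abstract, the decision would instead parameterize a controller with memory acting on a linear time-varying system under process noise; the sketch keeps a generic decision vector for simplicity.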
Related papers
- Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - Smoothed Online Learning for Prediction in Piecewise Affine Systems [43.64498536409903]
This paper builds on the recently developed smoothed online learning framework.
It provides the first algorithms for prediction and simulation in piecewise affine systems.
arXiv Detail & Related papers (2023-01-26T15:54:14Z) - Learning to Control under Time-Varying Environment [18.48729114775298]
This paper investigates the problem of regret minimization in linear time-varying (LTV) dynamical systems.
We propose the first computationally tractable online algorithm with regret guarantees.
arXiv Detail & Related papers (2022-06-06T11:40:46Z) - Online Control of Unknown Time-Varying Dynamical Systems [48.75672260851758]
We study online control of time-varying linear systems with unknown dynamics in the nonstochastic control model.
We study regret bounds with respect to common classes of policies: Disturbance Action (SLS), Disturbance Response (Youla), and linear feedback policies.
arXiv Detail & Related papers (2022-02-16T06:57:14Z) - Online estimation and control with optimal pathlength regret [52.28457815067461]
A natural goal when designing online learning algorithms is to bound the regret of the algorithm in terms of the temporal variation of the input sequence.
Data-dependent "pathlength" regret bounds have recently been obtained for a wide variety of online learning problems, including OCO and bandits.
arXiv Detail & Related papers (2021-10-24T22:43:15Z) - Contextual Inverse Optimization: Offline and Online Learning [3.6739949215165164]
We study the problems of offline and online contextual optimization with feedback information.
We aim to minimize regret, which is defined as the difference between our losses and the ones incurred by an all-knowing oracle.
arXiv Detail & Related papers (2021-06-26T13:09:52Z) - An Online Learning Approach to Optimizing Time-Varying Costs of AoI [26.661352924641285]
We consider systems that require timely monitoring of sources over a communication network.
For the single source monitoring problem, we design algorithms that achieve sublinear regret compared to the best fixed policy in hindsight.
For the multiple source scheduling problem, we design a new online learning algorithm called Follow-the-Perturbed-Whittle-Leader.
arXiv Detail & Related papers (2021-05-27T18:10:56Z) - Non-stationary Online Learning with Memory and Non-stochastic Control [71.14503310914799]
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions.
In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments.
We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret in terms of time horizon, non-stationarity measure, and memory length.
arXiv Detail & Related papers (2021-02-07T09:45:15Z) - Anticipating the Long-Term Effect of Online Learning in Control [75.6527644813815]
AntLer is a design algorithm for learning-based control laws that anticipates learning.
We show that AntLer approximates an optimal solution arbitrarily accurately with probability one.
arXiv Detail & Related papers (2020-07-24T07:00:14Z) - Online Reinforcement Learning Control by Direct Heuristic Dynamic
Programming: from Time-Driven to Event-Driven [80.94390916562179]
Time-driven learning refers to machine learning methods that update the parameters of a prediction model continuously as new data arrives.
It is desirable to prevent the time-driven dHDP from updating due to insignificant system events such as noise.
We show how the event-driven dHDP algorithm works in comparison to the original time-driven dHDP.
arXiv Detail & Related papers (2020-06-16T05:51:25Z)