Non-stationary Online Learning with Memory and Non-stochastic Control
- URL: http://arxiv.org/abs/2102.03758v4
- Date: Tue, 15 Aug 2023 02:31:59 GMT
- Title: Non-stationary Online Learning with Memory and Non-stochastic Control
- Authors: Peng Zhao and Yu-Hu Yan and Yu-Xiang Wang and Zhi-Hua Zhou
- Abstract summary: We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions.
In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments.
We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret in terms of time horizon, non-stationarity measure, and memory length.
- Score: 71.14503310914799
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of Online Convex Optimization (OCO) with memory, which
allows loss functions to depend on past decisions and thus captures temporal
effects of learning problems. In this paper, we introduce dynamic policy regret
as the performance measure for designing algorithms robust to non-stationary
environments; it benchmarks the algorithm's decisions against a sequence of
changing comparators. We propose a novel algorithm for OCO with memory that provably
enjoys an optimal dynamic policy regret in terms of time horizon,
non-stationarity measure, and memory length. The key technical challenge is
controlling the switching cost, i.e., the cumulative movement of the player's decisions,
which is neatly addressed by a novel switching-cost-aware online ensemble
approach equipped with a new meta-base decomposition of dynamic policy regret
and a careful design of the meta-learner and base-learners that explicitly
regularizes the switching cost. The results are further applied to tackle
non-stationarity in online non-stochastic control (Agarwal et al., 2019), i.e.,
controlling a linear dynamical system with adversarial disturbance and convex
cost functions. We derive a novel gradient-based controller with dynamic policy
regret guarantees, which is the first controller provably competitive to a
sequence of changing policies for online non-stochastic control.
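For concreteness, the performance measure can be written as follows — a hedged sketch in which $f_t$ depends on the last $m+1$ decisions and $u_1, \ldots, u_T$ is an arbitrary comparator sequence; the paper's exact indexing and normalization may differ:

```latex
% Dynamic policy regret for OCO with memory (sketch; indexing approximate).
\mathrm{D\text{-}Regret}_T
  = \sum_{t=m}^{T} f_t(x_{t-m}, \ldots, x_t)
  - \sum_{t=m}^{T} f_t(u_{t-m}, \ldots, u_t)
```

The switching-cost-aware online ensemble idea can also be sketched in a few lines. The snippet below is a minimal Hedge-over-OGD illustration of the meta-base decomposition — the step-size grid, penalty weight `lam`, and function names are assumptions for illustration, not the paper's exact algorithm:

```python
import numpy as np

# Minimal sketch of a switching-cost-aware online ensemble (assumed
# hyper-parameters; not the paper's exact algorithm): a Hedge-style
# meta-learner aggregates OGD base-learners run over a grid of step
# sizes, and each base-learner's meta-loss is its linearized loss plus
# a penalty on how much it moved (its switching cost).

def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def switching_cost_aware_ensemble(grad_fns, dim, etas=(0.01, 0.1, 1.0),
                                  meta_lr=0.5, lam=1.0):
    """grad_fns: one callable per round, mapping the played point to a gradient."""
    n = len(etas)
    bases = np.zeros((n, dim))              # base-learners' decisions
    weights = np.ones(n) / n                # meta-learner's weights
    plays = []
    for grad_fn in grad_fns:
        x = weights @ bases                 # combined decision for this round
        plays.append(x)
        g = grad_fn(x)                      # gradient feedback at the play
        old = bases.copy()
        for i, eta in enumerate(etas):      # base update: projected OGD
            bases[i] = project_ball(bases[i] - eta * g)
        # meta-loss: linearized loss of each base plus its switching cost
        losses = old @ g + lam * np.linalg.norm(bases - old, axis=1)
        weights = weights * np.exp(-meta_lr * (losses - losses.min()))
        weights /= weights.sum()
    return plays
```

Penalizing each base-learner's movement inside the meta-loss is what lets the ensemble trade off tracking changing comparators against the switching cost that memory terms charge.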
Related papers
- Adaptive Online Non-stochastic Control [10.25772015681554]
We tackle the problem of Non-stochastic Control (NSC) with the aim of obtaining algorithms whose policy regret is proportional to the difficulty of the controlled environment.
We tailor the Follow The Regularized Leader (FTRL) framework to dynamical systems by using regularizers that are proportional to the actual witnessed costs.
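A minimal sketch of this idea follows — the quadratic regularizer, the scaling rule, and the function name are assumptions for illustration, not the paper's exact construction:

```python
import numpy as np

# Hedged sketch of cost-proportional FTRL (quadratic regularizer assumed):
#   x_{t+1} = argmin_{||x|| <= radius}  <G_t, x> + (lam_t / 2) * ||x||^2,
# where G_t is the sum of observed gradients and lam_t grows with the
# cumulative witnessed cost, so harder environments get more regularization.

def cost_proportional_ftrl_step(grad_sum, cost_sum, lam0=1.0, radius=1.0):
    lam = lam0 * (1.0 + cost_sum)     # regularizer strength ~ witnessed costs
    x = -grad_sum / lam               # unconstrained minimizer of the objective
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)
```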
arXiv Detail & Related papers (2023-10-02T12:32:24Z) - Online Nonstochastic Model-Free Reinforcement Learning [35.377261344335736]
We investigate robustness guarantees for environments that may be dynamic or adversarial.
We provide efficient algorithms for optimizing these policies.
These are the best-known results with no dependence on the dimension of the state space.
arXiv Detail & Related papers (2023-05-27T19:02:55Z) - Efficient Online Learning with Memory via Frank-Wolfe Optimization:
Algorithms with Bounded Dynamic Regret and Applications to Control [15.588080817106563]
We introduce the first projection-free meta-base learning algorithm with memory that minimizes dynamic regret.
We are motivated by artificial intelligence applications where autonomous agents need to adapt to time-varying environments.
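To illustrate why projection-freeness matters, here is a hedged sketch of a Frank-Wolfe update over the $\ell_1$ ball (the feasible set, step-size schedule, and names are assumptions, not the paper's algorithm):

```python
import numpy as np

# Hedged sketch of a projection-free (Frank-Wolfe) update over the l1 ball:
# a linear minimization oracle (LMO) replaces Euclidean projection, which is
# what keeps iterations cheap on sets where projections are costly.

def lmo_l1(g, radius=1.0):
    """argmin_{||s||_1 <= radius} <g, s>: a signed vertex of the l1 ball."""
    s = np.zeros_like(g)
    i = int(np.argmax(np.abs(g)))
    s[i] = -radius * np.sign(g[i])
    return s

def frank_wolfe_step(x, g, t, radius=1.0):
    gamma = 2.0 / (t + 2.0)            # standard Frank-Wolfe step size
    return (1.0 - gamma) * x + gamma * lmo_l1(g, radius)
```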
arXiv Detail & Related papers (2023-01-02T01:12:29Z) - Best of Both Worlds in Online Control: Competitive Ratio and Policy
Regret [61.59646565655169]
We show that several recently proposed online control algorithms achieve the best of both worlds: sublinear regret against the best DAC policy selected in hindsight, together with competitive-ratio guarantees.
We conclude that sublinear regret against the optimal competitive policy is attainable even when the linear dynamical system is unknown.
arXiv Detail & Related papers (2022-11-21T07:29:08Z) - Introduction to Online Nonstochastic Control [34.77535508151501]
In online nonstochastic control, both the cost functions as well as the perturbations from the assumed dynamical model are chosen by an adversary.
The target is to attain low regret against the best policy in hindsight from a benchmark class of policies.
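In symbols, the benchmark can be written as follows — a sketch of the standard policy-regret definition, where $\Pi$ is the benchmark class and $(x_t^{\pi}, u_t^{\pi})$ is the counterfactual trajectory of policy $\pi$ under the same disturbances:

```latex
% Policy regret in online nonstochastic control (sketch).
\mathrm{Regret}_T
  = \sum_{t=1}^{T} c_t(x_t, u_t)
  - \min_{\pi \in \Pi} \sum_{t=1}^{T} c_t\big(x_t^{\pi}, u_t^{\pi}\big)
```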
arXiv Detail & Related papers (2022-11-17T16:12:45Z) - Dynamic Regret of Online Markov Decision Processes [84.20723936192945]
We investigate online Markov Decision Processes (MDPs) with adversarially changing loss functions and known transitions.
We choose dynamic regret as the performance measure, defined as the performance difference between the learner and any sequence of feasible changing policies.
We consider three foundational models of online MDPs, including episodic loop-free Stochastic Shortest Path (SSP), episodic SSP, and infinite-horizon MDPs.
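The measure can be sketched as follows (assumed notation; $\pi_1, \ldots, \pi_T$ is any feasible sequence of changing policies and $\ell_t$ the adversarial loss):

```latex
% Dynamic regret for online MDPs (sketch; notation assumed).
\mathrm{D\text{-}Regret}_T(\pi_1, \ldots, \pi_T)
  = \mathbb{E}\Big[ \sum_{t=1}^{T} \ell_t(s_t, a_t) \Big]
  - \sum_{t=1}^{T} \mathbb{E}\big[ \ell_t(s_t^{\pi_t}, a_t^{\pi_t}) \big]
```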
arXiv Detail & Related papers (2022-08-26T07:42:53Z) - Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization [70.4342220499858]
We introduce novel online algorithms that can exploit smoothness and replace the dependence on $T$ in dynamic regret with problem-dependent quantities.
Our results are adaptive to the intrinsic difficulty of the problem, since the bounds are tighter than existing results for easy problems and safeguard the same rate in the worst case.
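To give a flavor of the improvement (a sketch based on the abstract; $P_T$ is the comparators' path length, and $V_T$, $F_T$ are problem-dependent quantities such as gradient variation and the comparators' cumulative loss, assumed here for illustration):

```latex
% Worst-case dynamic regret vs. a problem-dependent sharpening (sketch):
\mathcal{O}\!\left(\sqrt{T (1 + P_T)}\right)
\quad\longrightarrow\quad
\mathcal{O}\!\left(\sqrt{(1 + P_T + \min\{V_T, F_T\}) (1 + P_T)}\right),
\qquad P_T = \sum_{t=2}^{T} \lVert u_t - u_{t-1} \rVert .
```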
arXiv Detail & Related papers (2021-12-29T02:42:59Z) - Learning High-Level Policies for Model Predictive Control [54.00297896763184]
Model Predictive Control (MPC) provides robust solutions to robot control tasks.
We propose a self-supervised learning algorithm for learning a neural network high-level policy.
We show that our approach can handle situations that are difficult for standard MPC.
arXiv Detail & Related papers (2020-07-20T17:12:34Z) - Learning Constrained Adaptive Differentiable Predictive Control Policies
With Guarantees [1.1086440815804224]
We present differentiable predictive control (DPC), a method for learning constrained neural control policies for linear systems.
We employ automatic differentiation to obtain direct policy gradients by backpropagating the model predictive control (MPC) loss function and constraints penalties through a differentiable closed-loop system dynamics model.
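A hedged sketch of the direct-policy-gradient idea for a scalar linear system follows; the hand-propagated sensitivities stand in for the automatic differentiation the paper uses, and all names, dynamics, and hyper-parameters are assumptions:

```python
import numpy as np

# Sketch of differentiating an MPC-style loss through a closed-loop rollout
# (scalar system assumed): x_{t+1} = a x_t + b u_t with linear policy
# u_t = -k x_t; we propagate d x_t / d k alongside the state to get the
# exact policy gradient, then run plain gradient descent on the gain k.

def rollout_loss_and_grad(k, a=0.9, b=0.5, q=1.0, r=0.1, x0=1.0, T=50):
    """Loss J(k) = sum_t q x_t^2 + r u_t^2 and dJ/dk via the chain rule."""
    loss, grad, x, dx = 0.0, 0.0, x0, 0.0    # dx tracks d x_t / d k
    for _ in range(T):
        u = -k * x
        du = -x - k * dx                     # d u_t / d k
        loss += q * x * x + r * u * u
        grad += 2 * q * x * dx + 2 * r * u * du
        x, dx = a * x + b * u, a * dx + b * du   # propagate state and sensitivity
    return loss, grad

k = 0.0
for _ in range(200):                          # gradient descent on the policy
    loss, grad = rollout_loss_and_grad(k)
    k -= 0.05 * grad
```

Constraint penalties would enter this sketch as additional differentiable terms in the rollout loss.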
arXiv Detail & Related papers (2020-04-23T14:24:44Z) - Logarithmic Regret for Adversarial Online Control [56.12283443161479]
We give the first algorithm with logarithmic regret for arbitrary adversarial disturbance sequences.
Our algorithm and analysis use a characterization for the offline control law to reduce the online control problem to (delayed) online learning.
arXiv Detail & Related papers (2020-02-29T06:29:19Z)