Online learning with dynamics: A minimax perspective
- URL: http://arxiv.org/abs/2012.01705v1
- Date: Thu, 3 Dec 2020 05:06:08 GMT
- Title: Online learning with dynamics: A minimax perspective
- Authors: Kush Bhatia, Karthik Sridharan
- Abstract summary: We study the problem of online learning with dynamics, where a learner interacts with a stateful environment over multiple rounds.
Our main results provide sufficient conditions for online learnability for this setup with corresponding rates.
- Score: 25.427783092065546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of online learning with dynamics, where a learner
interacts with a stateful environment over multiple rounds. In each round of
the interaction, the learner selects a policy to deploy and incurs a cost that
depends on both the chosen policy and current state of the world. The
state-evolution dynamics and the costs are allowed to be time-varying, in a
possibly adversarial way. In this setting, we study the problem of minimizing
policy regret and provide non-constructive upper bounds on the minimax rate for
the problem.
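In illustrative notation (not necessarily the paper's own), the protocol reads: at round $t$ the learner deploys a policy $\pi_t$, plays $a_t = \pi_t(x_t)$, pays $\ell_t(x_t, a_t)$, and the state evolves as $x_{t+1} = f_t(x_t, a_t)$ under time-varying dynamics $f_t$. Policy regret then charges each fixed comparator on its own counterfactual trajectory:
\[
  \mathrm{Reg}_T = \sum_{t=1}^{T} \ell_t\big(x_t, \pi_t(x_t)\big) - \min_{\pi \in \Pi} \sum_{t=1}^{T} \ell_t\big(x_t^{\pi}, \pi(x_t^{\pi})\big),
\]
where $x_t^{\pi}$ is the state reached by playing the fixed policy $\pi$ from round $1$, so the comparator is evaluated on the states its own actions would have induced.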
Our main results provide sufficient conditions for online learnability for
this setup with corresponding rates. The rates are characterized by 1) a
complexity term capturing the expressiveness of the underlying policy class
under the dynamics of state change, and 2) a dynamics stability term measuring
the deviation of the instantaneous loss from a certain counterfactual loss.
Further, we provide matching lower bounds which show that both of these
complexity terms are indeed necessary.
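Schematically, and with the caveat that the symbols below are placeholders rather than the paper's exact quantities, the resulting upper bound has the shape
\[
  \text{minimax rate} \;\lesssim\; \underbrace{\mathcal{C}_T(\Pi; f_{1:T})}_{\text{expressiveness of } \Pi \text{ under the dynamics}} \;+\; \underbrace{\sum_{t=1}^{T} \big| \ell_t\big(x_t, \pi_t(x_t)\big) - \ell_t\big(x_t^{\pi_t}, \pi_t(x_t^{\pi_t})\big) \big|}_{\text{dynamics stability (instantaneous vs. counterfactual loss)}},
\]
mirroring the two terms described above.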
Our approach provides a unifying analysis that recovers regret bounds for
several well studied problems including online learning with memory, online
control of linear quadratic regulators, online Markov decision processes, and
tracking adversarial targets. In addition, we show how our tools help obtain
tight regret bounds for new problems (with non-linear dynamics and non-convex
losses) for which such bounds were not known prior to our work.
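To make the setup concrete, here is a small self-contained Python toy simulation of the protocol. The dynamics, losses, finite policy class, and the learner's uniformly random strategy are all invented for illustration; the paper's upper bounds are non-constructive and do not prescribe this or any other algorithm.

import numpy as np

rng = np.random.default_rng(0)
T = 200
w = 0.1 * rng.standard_normal(T)          # noise sequence, shared by all rollouts
policies = [-1.0, 0.0, 1.0]               # toy finite policy class: constant actions

def step(x, a, t):
    drift = 0.9 + 0.05 * np.sin(0.1 * t)  # time-varying state dynamics
    return drift * x + a + w[t]

def loss(x, a):
    return x * x + 0.1 * a * a            # cost depends on state and action

def rollout(choose):
    """Total cost of playing choose(t, x) for T rounds from x = 0."""
    x, total = 0.0, 0.0
    for t in range(T):
        a = choose(t, x)
        total += loss(x, a)
        x = step(x, a, t)
    return total

# A naive learner that picks a random constant policy each round.
learner_cost = rollout(lambda t, x: policies[rng.integers(len(policies))])

# Policy regret: each comparator policy is replayed from round 1 on its own
# counterfactual trajectory, not on the learner's realized states -- this is
# the "counterfactual loss" referenced in the abstract.
best_cost = min(rollout(lambda t, x, a=a: a) for a in policies)
print("policy regret:", learner_cost - best_cost)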
Related papers
- Active Reinforcement Learning Strategies for Offline Policy Improvement [8.2883946876766]
We propose an active reinforcement learning method capable of collecting trajectories that can augment existing offline data.
We demonstrate that our proposed method reduces additional online interaction with the environment by up to 75% over competitive baselines.
arXiv Detail & Related papers (2024-12-17T17:22:52Z)
- Efficient Imitation Learning with Conservative World Models [54.52140201148341]
We tackle the problem of policy learning from expert demonstrations without a reward function.
We re-frame imitation learning as a fine-tuning problem, rather than a pure reinforcement learning one.
arXiv Detail & Related papers (2024-05-21T20:53:18Z)
- Resilient Constrained Learning [94.27081585149836]
This paper presents a constrained learning approach that adapts the requirements while simultaneously solving the learning task.
We call this approach resilient constrained learning after the term used to describe ecological systems that adapt to disruptions by modifying their operation.
arXiv Detail & Related papers (2023-06-04T18:14:18Z)
- Offline Stochastic Shortest Path: Learning, Evaluation and Towards Optimality [57.91411772725183]
In this paper, we consider the offline stochastic shortest path problem when the state space and the action space are finite.
We design simple value-based algorithms for tackling both offline policy evaluation (OPE) and offline policy learning tasks.
Our analysis of these simple algorithms yields strong instance-dependent bounds which can imply worst-case bounds that are near-minimax optimal.
arXiv Detail & Related papers (2022-06-10T07:44:56Z)
- Online Control of Unknown Time-Varying Dynamical Systems [48.75672260851758]
We study online control of time-varying linear systems with unknown dynamics in the nonstochastic control model.
We study regret bounds with respect to common classes of policies: Disturbance Action (SLS), Disturbance Response (Youla), and linear feedback policies.
arXiv Detail & Related papers (2022-02-16T06:57:14Z)
- Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization [70.4342220499858]
We introduce novel online algorithms that can exploit smoothness and replace the dependence on $T$ in dynamic regret with problem-dependent quantities.
Our results are adaptive to the intrinsic difficulty of the problem, since the bounds are tighter than existing results for easy problems and safeguard the same rate in the worst case.
arXiv Detail & Related papers (2021-12-29T02:42:59Z)
- Reinforcement Learning Policies in Continuous-Time Linear Systems [0.0]
We present online policies that learn optimal actions fast by carefully randomizing the parameter estimates.
We prove sharp stability results for inexact system dynamics and tightly specify the infinitesimal regret caused by sub-optimal actions.
Our analysis sheds light on fundamental challenges in continuous-time reinforcement learning and suggests a useful cornerstone for similar problems.
arXiv Detail & Related papers (2021-09-16T00:08:50Z)
- Non-stationary Online Learning with Memory and Non-stochastic Control [71.14503310914799]
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions.
In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments.
We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret in terms of time horizon, non-stationarity measure, and memory length.
arXiv Detail & Related papers (2021-02-07T09:45:15Z)
- Online Constrained Model-based Reinforcement Learning [13.362455603441552]
A key requirement is the ability to handle continuous state and action spaces while remaining within a limited time and resource budget.
We propose a model-based approach that combines Gaussian Process regression and Receding Horizon Control; a toy sketch of this combination appears after the list.
We test our approach on a cart-pole swing-up environment and demonstrate the benefits of online learning on an autonomous racing task.
arXiv Detail & Related papers (2020-04-07T15:51:34Z)
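As referenced in the Online Constrained Model-based Reinforcement Learning entry above, a minimal sketch of combining Gaussian Process regression with receding-horizon control on a one-dimensional toy system might look as follows. The dynamics, cost, random-shooting planner, and all parameters are illustrative assumptions, not that paper's implementation; scikit-learn's GaussianProcessRegressor is assumed available.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)

def true_dynamics(x, a):
    # Ground truth, unknown to the controller; used only to generate
    # training transitions and to advance the real system.
    return 0.8 * x + 0.5 * np.sin(a)

# Learn a GP model of the transition x' = f(x, a) from random transitions.
X = rng.uniform(-2.0, 2.0, size=(50, 2))        # columns: state, action
y = np.array([true_dynamics(x, a) for x, a in X])
gp = GaussianProcessRegressor().fit(X, y)

def receding_horizon_action(x, horizon=5, n_candidates=64):
    """Random-shooting planner: sample action sequences, roll the GP mean
    forward, and return the first action of the cheapest sequence."""
    best_a, best_cost = 0.0, np.inf
    for _ in range(n_candidates):
        seq = rng.uniform(-2.0, 2.0, size=horizon)
        xh, cost = x, 0.0
        for a in seq:
            xh = gp.predict(np.array([[xh, a]]))[0]
            cost += xh ** 2 + 0.01 * a ** 2      # drive the state to zero
        if cost < best_cost:
            best_a, best_cost = seq[0], cost
    return best_a

# Closed loop: replan at every step (the receding-horizon part).
x = 1.5
for t in range(20):
    x = true_dynamics(x, receding_horizon_action(x))
print("final state:", x)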
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.