A Regret Minimization Approach to Iterative Learning Control
- URL: http://arxiv.org/abs/2102.13478v1
- Date: Fri, 26 Feb 2021 13:48:49 GMT
- Title: A Regret Minimization Approach to Iterative Learning Control
- Authors: Naman Agarwal, Elad Hazan, Anirudha Majumdar, Karan Singh
- Abstract summary: We propose a new performance metric, planning regret, which replaces the standard uncertainty assumptions with worst-case regret.
We provide theoretical and empirical evidence that the proposed algorithm outperforms existing methods on several benchmarks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the setting of iterative learning control, or model-based policy
learning in the presence of uncertain, time-varying dynamics. In this setting,
we propose a new performance metric, planning regret, which replaces the
standard stochastic uncertainty assumptions with worst case regret. Based on
recent advances in non-stochastic control, we design a new iterative algorithm
for minimizing planning regret that is more robust to model mismatch and
uncertainty. We provide theoretical and empirical evidence that the proposed
algorithm outperforms existing methods on several benchmarks.
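The abstract's core idea, iteratively refining a control sequence by replaying trials against a nominal model, can be illustrated with a toy sketch. The code below is a minimal, hypothetical example of gradient-based iterative learning control on a scalar linear system; the system parameters, horizon, and step size are illustrative choices, and this is not the paper's planning-regret algorithm:

```python
# Minimal sketch of iterative learning control: repeatedly roll out an
# open-loop control sequence, differentiate the cost through a nominal
# model, and take a gradient step. Illustrative only -- NOT the paper's
# planning-regret algorithm; a, b, lr, and trials are hypothetical.

def rollout(a, b, us, x0, ws):
    """Simulate x_{t+1} = a*x_t + b*u_t + w_t; return states and quadratic cost."""
    xs, cost = [x0], 0.0
    for u, w in zip(us, ws):
        cost += xs[-1] ** 2 + u ** 2
        xs.append(a * xs[-1] + b * u + w)
    return xs, cost + xs[-1] ** 2  # terminal state cost

def cost_gradient(a, b, us, xs):
    """Gradient of the quadratic cost w.r.t. each control, by backward adjoint recursion."""
    lam = 2 * xs[-1]                    # adjoint at the terminal state
    grads = [0.0] * len(us)
    for t in reversed(range(len(us))):
        grads[t] = 2 * us[t] + b * lam  # dJ/du_t = 2*u_t + b*lambda_{t+1}
        lam = 2 * xs[t] + a * lam       # lambda_t = 2*x_t + a*lambda_{t+1}
    return grads

def ilc(a, b, x0, ws, trials=100, lr=0.01):
    """Each trial: roll out, differentiate through the nominal model, update controls."""
    us = [0.0] * len(ws)
    for _ in range(trials):
        xs, _ = rollout(a, b, us, x0, ws)
        g = cost_gradient(a, b, us, xs)
        us = [u - lr * gi for u, gi in zip(us, g)]
    return us
```

A regret-minimizing variant would, per the abstract, bound the gap between this iteratively learned sequence and the best plan in hindsight under worst-case (rather than stochastic) disturbances `ws`.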
Related papers
- Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems [10.404992912881601]
We study reinforcement learning for a class of continuous-time linear-quadratic (LQ) control problems for diffusions.
We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an actor-critic algorithm to learn the optimal policy parameter directly.
arXiv Detail & Related papers (2024-07-24T12:26:21Z) - Risk-Sensitive Stochastic Optimal Control as Rao-Blackwellized Markovian
Score Climbing [3.9410617513331863]
Optimal control of dynamical systems is a crucial challenge in sequential decision-making.
Control-as-inference approaches have had considerable success, providing a viable risk-sensitive framework to address the exploration-exploitation dilemma.
This paper introduces a novel perspective by framing risk-sensitive control as Markovian score climbing under samples drawn from a conditional particle filter.
arXiv Detail & Related papers (2023-12-21T16:34:03Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), that can be applied to either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - Imitating, Fast and Slow: Robust learning from demonstrations via
decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z) - Guaranteed Trajectory Tracking under Learned Dynamics with Contraction Metrics and Disturbance Estimation [5.147919654191323]
This paper presents an approach to trajectory-centric learning control based on contraction metrics and disturbance estimation.
The proposed framework is validated on a planar quadrotor example.
arXiv Detail & Related papers (2021-12-15T15:57:33Z) - COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z) - Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z) - Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.