Model-Free $\mu$ Synthesis via Adversarial Reinforcement Learning
- URL: http://arxiv.org/abs/2111.15537v1
- Date: Tue, 30 Nov 2021 16:29:49 GMT
- Title: Model-Free $\mu$ Synthesis via Adversarial Reinforcement Learning
- Authors: Darioush Keivan, Aaron Havens, Peter Seiler, Geir Dullerud, Bin Hu
- Abstract summary: We develop a model-free version of the well-known $DK$-iteration for solving state-feedback $\mu$ synthesis with static $D$-scaling.
In the proposed algorithm, the $K$ step mimics the classical central path algorithm by incorporating a recently-developed double-loop adversarial RL method as a subroutine.
Our study sheds new light on the connections between adversarial RL and robust control.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by the recent empirical success of policy-based reinforcement
learning (RL), there has been a research trend studying the performance of
policy-based RL methods on standard control benchmark problems. In this paper,
we examine the effectiveness of policy-based RL methods on an important robust
control problem, namely $\mu$ synthesis. We build a connection between robust
adversarial RL and $\mu$ synthesis, and develop a model-free version of the
well-known $DK$-iteration for solving state-feedback $\mu$ synthesis with
static $D$-scaling. In the proposed algorithm, the $K$ step mimics the
classical central path algorithm via incorporating a recently-developed
double-loop adversarial RL method as a subroutine, and the $D$ step is based on
model-free finite difference approximation. An extensive numerical study is also
presented to demonstrate the utility of our proposed model-free algorithm. Our
study sheds new light on the connections between adversarial RL and robust
control.
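To make the alternation concrete, here is a minimal, hypothetical sketch of such a loop in Python: the $D$ step descends a two-point finite-difference gradient of a sampled closed-loop cost, and the $K$ step delegates to an adversarial RL subroutine. `k_step` and `cost_fn` are stand-in callables for the paper's double-loop adversarial RL method and a rollout-based cost estimate; neither reflects the authors' implementation.

```python
# Hypothetical skeleton of model-free DK-iteration: alternate a K step
# (controller synthesis via an adversarial RL subroutine) with a D step
# (static D-scaling improved by finite-difference gradient descent).
import numpy as np

def finite_difference_d_step(d, cost_fn, eps=1e-3, lr=1e-2, iters=50):
    """Model-free D step: descend a two-point finite-difference estimate
    of the gradient of the scaled closed-loop cost with respect to d."""
    d = d.copy()
    for _ in range(iters):
        grad = np.zeros_like(d)
        for i in range(d.size):
            e = np.zeros_like(d)
            e[i] = eps
            grad[i] = (cost_fn(d + e) - cost_fn(d - e)) / (2 * eps)
        d -= lr * grad
    return d

def dk_iteration(K0, d0, k_step, cost_fn, n_rounds=5):
    """Alternate the RL-based K step and the finite-difference D step.
    k_step(K, d) and cost_fn(K, d) are stand-ins, not the paper's code."""
    K, d = K0, d0.copy()
    for _ in range(n_rounds):
        K = k_step(K, d)  # K step: adversarial RL with the D-scaling fixed
        d = finite_difference_d_step(d, lambda dd: cost_fn(K, dd))  # D step
    return K, d
```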
Related papers
- Sublinear Regret for a Class of Continuous-Time Linear--Quadratic Reinforcement Learning Problems [10.404992912881601]
We study reinforcement learning for a class of continuous-time linear-quadratic (LQ) control problems for diffusions.
We apply a model-free approach that relies neither on knowledge of the model parameters nor on their estimates, and devise an actor-critic algorithm to learn the optimal policy parameter directly.
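As a toy illustration of the model-free flavor, here is a sketch for a scalar continuous-time LQ problem: the drift/diffusion constants below are used only to simulate rollouts, never by the learner, which improves its feedback gain from a zeroth-order gradient estimate (a simpler stand-in for the paper's actor-critic scheme; all names and constants are our assumptions).

```python
# Zeroth-order, model-free policy learning for a scalar LQ diffusion:
# dx = (A x + B u) dt + SIGMA dW with u = -k x and quadratic running cost.
import numpy as np

rng = np.random.default_rng(0)
A, B, SIGMA, Q, R = -0.5, 1.0, 0.2, 1.0, 0.1  # hidden from the learner
DT, T = 0.01, 5.0

def rollout_cost(k):
    """Simulate one Euler-Maruyama rollout under gain k, return its cost."""
    x, cost = 1.0, 0.0
    for _ in range(int(T / DT)):
        u = -k * x
        cost += (Q * x**2 + R * u**2) * DT
        x += (A * x + B * u) * DT + SIGMA * np.sqrt(DT) * rng.standard_normal()
    return cost

k, lr, delta = 0.0, 0.05, 0.1
for _ in range(200):
    # Two-point zeroth-order gradient estimate, averaged over 8 rollout pairs.
    diffs = [rollout_cost(k + delta) - rollout_cost(k - delta) for _ in range(8)]
    k -= lr * np.mean(diffs) / (2 * delta)
print(f"learned feedback gain k = {k:.3f}")
```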
arXiv Detail & Related papers (2024-07-24T12:26:21Z)
- Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis [16.288866201806382]
We develop a model-free RLHF best policy identification algorithm, called $\mathsf{BSAD}$, without explicit reward model inference.
The algorithm identifies the optimal policy directly from human preference information in a backward manner.
arXiv Detail & Related papers (2024-06-11T17:01:41Z)
- Model-Free Robust $\phi$-Divergence Reinforcement Learning Using Both Offline and Online Data [16.995406965407003]
We propose a model-free algorithm called Robust $\phi$-regularized fitted Q-iteration (RPQ) for learning an $\epsilon$-optimal robust policy.
We also introduce the hybrid robust $phi$-regularized reinforcement learning framework to learn an optimal robust policy using both historical data and online sampling.
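A tabular sketch of the idea, under our own assumptions: we instantiate the $\phi$-divergence as the KL divergence, whose regularized robust backup admits the closed form $(\mathcal{T}Q)(s,a) = r(s,a) - \gamma\lambda\log\mathbb{E}_{s'\sim P}[\exp(-V(s')/\lambda)]$, estimated here from sampled next states. This illustrates robust regularized fitted Q-iteration in general, not the paper's RPQ code.

```python
# Tabular robust phi-regularized fitted Q-iteration with a KL adversary.
# Assumes deterministic rewards; `samples` may mix offline and online data.
from collections import defaultdict
import numpy as np

def robust_fitted_q_iteration(samples, n_states, n_actions,
                              gamma=0.99, lam=1.0, n_iters=100):
    """samples: iterable of (s, a, r, s_next) transitions."""
    groups, rewards = defaultdict(list), {}
    for s, a, r, s2 in samples:
        groups[(s, a)].append(s2)
        rewards[(s, a)] = r  # deterministic-reward assumption
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        V = Q.max(axis=1)
        for (s, a), next_states in groups.items():
            # KL-regularized robust backup in its dual (soft-min) form.
            soft = np.mean([np.exp(-V[s2] / lam) for s2 in next_states])
            Q[s, a] = rewards[(s, a)] - gamma * lam * np.log(soft)
    return Q
```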
arXiv Detail & Related papers (2024-05-08T23:52:37Z)
- Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games [66.2085181793014]
We show that a model-free stage-based Q-learning algorithm can achieve the same optimal dependence on the horizon $H$ as model-based algorithms.
Our algorithm features a key novel design: the reference value functions are updated as a pair of optimistic and pessimistic value functions.
arXiv Detail & Related papers (2023-08-17T08:34:58Z)
- Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm [4.128216503196621]
We propose an On-policy Model-based Safe Deep RL algorithm in which we learn the transition dynamics of the environment in an online manner.
We show that our algorithm is more sample efficient and incurs fewer cumulative hazard violations than constrained model-free approaches.
arXiv Detail & Related papers (2022-10-14T06:53:02Z)
- Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning [92.18524491615548]
Contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL).
We study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions.
Under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.
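Two building blocks the summary mentions can be sketched compactly: an InfoNCE-style contrastive loss for learning transition features, and an elliptical UCB bonus computed from those features. Both functions below are our illustration, not the paper's algorithm.

```python
# (1) Contrastive representation loss; (2) UCB-style exploration bonus.
import numpy as np

def info_nce_loss(anchor, positive, negatives, temp=0.1):
    """Pull the feature of (s, a) toward the embedding of its observed next
    state (`positive`) and away from sampled `negatives` (shape (k, d))."""
    pos = anchor @ positive / temp
    negs = negatives @ anchor / temp
    return -(pos - np.log(np.exp(pos) + np.exp(negs).sum()))

def ucb_bonus(phi, cov_inv, beta=1.0):
    """Elliptical bonus beta * sqrt(phi^T Sigma^{-1} phi) added to rewards
    by UCB-type algorithms on top of the learned features phi."""
    return beta * np.sqrt(phi @ cov_inv @ phi)
```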
arXiv Detail & Related papers (2022-07-29T17:29:08Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
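The core mechanism is easy to sketch: roll in with the pre-existing guide policy for the first h steps of an episode, then hand control to the learning policy. The snippet below assumes a Gymnasium-style `reset`/`step` interface and is our illustration, not the authors' code; in JSRL the roll-in horizon h is gradually annealed toward zero as the learner improves.

```python
# Jump-start rollout: the guide policy controls the first h steps, then the
# exploration (learning) policy takes over for the rest of the episode.
def jump_start_rollout(env, guide_policy, explore_policy, h, max_steps=1000):
    obs, _ = env.reset()
    trajectory = []
    for t in range(max_steps):
        policy = guide_policy if t < h else explore_policy
        action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        trajectory.append((obs, action, reward, next_obs))
        obs = next_obs
        if terminated or truncated:
            break
    return trajectory
```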
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration [15.173628100049129]
This work studies a model-based algorithm for both Kernelized Nonlinear Regulators (KNR) and linear Markov Decision Processes (MDPs).
For both models, our algorithm comes with sample complexity guarantees and only requires access to a planning oracle.
Our method can also perform reward-free exploration efficiently.
arXiv Detail & Related papers (2021-07-15T15:49:30Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
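Schematically, the regularizer described above can be written as a penalty that pushes Q-values down on model-generated (potentially out-of-support) state-actions and up on dataset state-actions, added to the usual Bellman error; the weight `beta` and tensor shapes below are our assumptions, not the released implementation.

```python
# Conservative Q loss in the spirit of COMBO's regularizer (our sketch).
import numpy as np

def conservative_q_loss(q_model_rollout, q_dataset, bellman_error, beta=1.0):
    """q_model_rollout: Q-values at model-rollout (s, a) samples;
    q_dataset: Q-values at offline-dataset (s, a) samples;
    bellman_error: mean squared TD error on the training batch."""
    penalty = q_model_rollout.mean() - q_dataset.mean()
    return bellman_error + beta * penalty
```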
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
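Both ingredients can be sketched for an ensemble of N Q-estimates; the sigmoid weighting and the `lam` coefficient below follow the summary's description, but the details are our assumptions rather than the released SUNRISE code.

```python
# (a) Weighted Bellman backups; (b) UCB action selection over a Q-ensemble.
import numpy as np

def weighted_bellman_targets(q_ensemble_next, rewards, gamma=0.99, temp=10.0):
    """q_ensemble_next: (N, batch) next-state Q-values from N ensemble
    members. Returns targets and confidence weights in (0.5, 1]."""
    std_q = q_ensemble_next.std(axis=0)
    # Down-weight targets where the ensemble disagrees (high uncertainty).
    weights = 1.0 / (1.0 + np.exp(temp * std_q)) + 0.5
    targets = rewards + gamma * q_ensemble_next.mean(axis=0)
    return targets, weights

def ucb_action(q_ensemble_actions, lam=1.0):
    """q_ensemble_actions: (N, n_actions) Q-values for candidate actions;
    pick the action with the highest upper confidence bound."""
    mean_q = q_ensemble_actions.mean(axis=0)
    std_q = q_ensemble_actions.std(axis=0)
    return int(np.argmax(mean_q + lam * std_q))
```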
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods so that rewards are artificially penalized by the uncertainty of the dynamics.
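The modification is a one-liner in spirit: penalize each model-predicted reward by an uncertainty estimate of the learned dynamics. The ensemble-based proxy below (max over members of the norm of the predictive standard deviation) is our stand-in for the paper's uncertainty quantifier.

```python
# Uncertainty-penalized reward in the spirit of MOPO (our sketch).
import numpy as np

def penalized_reward(reward, next_state_stds, lam=1.0):
    """reward: model-predicted reward for (s, a);
    next_state_stds: (N, state_dim) predictive std-devs from N ensemble
    members of the learned dynamics model."""
    uncertainty = float(np.max(np.linalg.norm(next_state_stds, axis=1)))
    return reward - lam * uncertainty
```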
arXiv Detail & Related papers (2020-05-27T08:46:41Z)