Group Distributionally Robust Reinforcement Learning with Hierarchical
Latent Variables
- URL: http://arxiv.org/abs/2210.12262v1
- Date: Fri, 21 Oct 2022 21:34:59 GMT
- Title: Group Distributionally Robust Reinforcement Learning with Hierarchical
Latent Variables
- Authors: Mengdi Xu, Peide Huang, Yaru Niu, Visak Kumar, Jielin Qiu, Chao Fang,
Kuan-Hui Lee, Xuewei Qi, Henry Lam, Bo Li, Ding Zhao
- Abstract summary: Group Distributionally Robust Markov Decision Process (GDR-MDP) is a flexible hierarchical MDP formulation that encodes task groups via a latent mixture model.
GDR-MDP identifies the optimal policy that maximizes the expected return under the worst-possible qualified belief over task groups.
We then develop deep RL algorithms for GDR-MDP, covering both value-based and policy-based RL methods.
- Score: 20.078557260741988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One key challenge for multi-task reinforcement learning (RL) in practice is
the absence of task indicators. Robust RL has been applied to deal with task
ambiguity, but may result in over-conservative policies. To balance the
worst-case (robustness) and average performance, we propose Group
Distributionally Robust Markov Decision Process (GDR-MDP), a flexible
hierarchical MDP formulation that encodes task groups via a latent mixture
model. GDR-MDP identifies the optimal policy that maximizes the expected return
under the worst-possible qualified belief over task groups within an ambiguity
set. We rigorously show that GDR-MDP's hierarchical structure improves
distributional robustness by adding regularization to the worst possible
outcomes. We then develop deep RL algorithms for GDR-MDP, covering both value-based
and policy-based RL methods. Extensive experiments on Box2D control tasks,
MuJoCo benchmarks, and the Google Research Football platform show that our algorithms
outperform classic robust training algorithms across diverse environments in
terms of robustness under belief uncertainties. Demos are available on our
project page (https://sites.google.com/view/gdr-rl/home).
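For intuition, the sketch below works out the inner worst-case step of the GDR-MDP objective under one concrete assumption: the ambiguity set is taken to be a KL-divergence ball of radius eps around a nominal belief over task groups, and the worst-case belief is recovered by exponential tilting with a bisection on the dual temperature. The names (worst_case_belief_kl, b_hat, group_values) are illustrative; the paper's actual ambiguity set, belief inference, and training procedure may differ, so treat this as a minimal sketch rather than the authors' implementation.

```python
import numpy as np

def worst_case_belief_kl(nominal_b, group_values, eps, tol=1e-8):
    """Inner minimization of sum_g b_g * V_g over beliefs b with KL(b || nominal_b) <= eps.

    Assumption: KL-ball ambiguity set and a strictly positive nominal belief.
    Uses the exponential-tilting form b_g ∝ nominal_b_g * exp(-V_g / lam), with the
    temperature lam found by bisection so the KL constraint is (approximately) tight.
    """
    v = np.asarray(group_values, dtype=float)
    b0 = np.asarray(nominal_b, dtype=float)

    def tilt(lam):
        logits = np.log(b0) - v / lam
        logits -= logits.max()                 # numerical stability
        b = np.exp(logits)
        return b / b.sum()

    def kl(b):
        mask = b > 0
        return float(np.sum(b[mask] * np.log(b[mask] / b0[mask])))

    # Small lam -> mass concentrates on the lowest-value group (large KL);
    # large lam -> the tilted belief approaches the nominal one (KL -> 0).
    lo, hi = 1e-6, 1e6
    if kl(tilt(lo)) <= eps:                    # even the most adversarial tilt stays in the ball
        return tilt(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kl(tilt(mid)) > eps:
            lo = mid                           # constraint violated -> soften the tilt
        else:
            hi = mid
    return tilt(hi)

# Toy usage: 3 task groups, a (hypothetical) nominal belief, and per-group value
# estimates such as a critic might produce. The policy is trained to maximize the
# robust value, i.e. the expected return under the worst-case qualified belief.
b_hat = np.array([0.5, 0.3, 0.2])
group_values = np.array([10.0, 4.0, 7.0])
b_worst = worst_case_belief_kl(b_hat, group_values, eps=0.1)
robust_value = float(b_worst @ group_values)
```

A value-based variant could use such a worst-case belief to weight per-group targets, and a policy-based variant could weight per-group policy-gradient terms; the adversarial tilt toward low-return groups is what separates the objective from plain average-return multi-task training.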
Related papers
- Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align large language models (LLMs).
Controlled Decoding provides a mechanism for aligning a model at inference time without retraining.
We propose a mixture-of-agents decoding strategy that leverages existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z)
- On Practical Robust Reinforcement Learning: Practical Uncertainty Set and Double-Agent Algorithm [11.748284119769039]
Robust reinforcement learning (RRL) seeks a robust policy that optimizes worst-case performance over an uncertainty set of Markov decision processes (MDPs).
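For reference, here is a minimal, generic robust value iteration over a finite uncertainty set of transition models. It only illustrates the worst-case objective stated above; the paper's practical uncertainty set and double-agent algorithm are not reproduced here, and the names (robust_value_iteration, P_set) are illustrative.

```python
import numpy as np

def robust_value_iteration(P_set, R, gamma=0.99, iters=500):
    """Worst-case value iteration: V(s) = max_a min_{P in P_set} [R(s,a) + gamma * sum_s' P(s'|s,a) V(s')].

    P_set: list of candidate transition tensors, each of shape (S, A, S).
    R:     reward matrix of shape (S, A).
    This is a plain (s,a)-rectangular, finite-set illustration of the RRL objective.
    """
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        # Q-values under every candidate model: shape (len(P_set), S, A)
        Q_all = np.stack([R + gamma * (P @ V) for P in P_set])
        V = Q_all.min(axis=0).max(axis=1)      # adversary picks the model, agent picks the action
    return V
```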
arXiv Detail & Related papers (2023-05-11T08:52:09Z)
- Train Hard, Fight Easy: Robust Meta Reinforcement Learning [78.16589993684698]
A major challenge of reinforcement learning (RL) in real-world applications is the variation between environments, tasks, or clients.
Standard meta-RL (MRL) methods optimize the average return over tasks, but often suffer from poor results in tasks of high risk or difficulty.
In this work, we define a robust MRL objective with a controlled robustness level.
Its data inefficiency is addressed via the novel Robust Meta RL algorithm (RoML).
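The "controlled robustness level" can be read as optimizing a tail-risk measure over tasks instead of the mean; a minimal CVaR-over-tasks sketch (illustrative only, not RoML's task-sampling mechanism) is:

```python
import numpy as np

def cvar_over_tasks(task_returns, alpha=0.2):
    """Mean return over the worst alpha-fraction of tasks.

    alpha plays the role of the robustness level: alpha = 1 recovers the usual
    average-return MRL objective, while small alpha focuses on the hardest tasks.
    """
    r = np.sort(np.asarray(task_returns, dtype=float))   # ascending: worst tasks first
    k = max(1, int(np.ceil(alpha * len(r))))
    return float(r[:k].mean())
```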
arXiv Detail & Related papers (2023-01-26T14:54:39Z)
- Efficient Policy Iteration for Robust Markov Decision Processes via Regularization [49.05403412954533]
Robust Markov decision processes (MDPs) provide a framework to model decision problems where the system dynamics are changing or only partially known.
Recent work established the equivalence between $s$-rectangular $L_p$ robust MDPs and regularized MDPs, and derived a regularized policy iteration scheme that enjoys the same level of efficiency as standard MDPs.
In this work, we focus on the policy improvement step and derive concrete forms for the greedy policy and the optimal robust Bellman operators.
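As a toy instance of the robustness-as-regularization connection (far simpler than the $s$-rectangular $L_p$ setting above), consider uncertainty in rewards only, with $r'(s,a) \in [r(s,a)-\alpha(s,a),\, r(s,a)+\alpha(s,a)]$; the robust Bellman update then collapses to a standard update with an explicit penalty term:

```latex
% Toy (s,a)-rectangular, reward-only uncertainty; not the paper's general result.
(T_{\mathrm{rob}} v)(s)
  = \max_{a} \min_{|r'(s,a)-r(s,a)| \le \alpha(s,a)}
      \Big[ r'(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, v(s') \Big]
  = \max_{a} \Big[ r(s,a) - \alpha(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, v(s') \Big].
```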
arXiv Detail & Related papers (2022-05-28T04:05:20Z)
- Robust Entropy-regularized Markov Decision Processes [23.719568076996662]
We study a robust version of the entropy-regularized MDP (ER-MDP) model, where the optimal policies are required to be robust.
We show that essential properties that hold for the non-robust ER-MDP and robust unregularized MDP models also hold in our settings.
We show how our framework and results can be integrated into different algorithmic schemes, including value or (modified) policy iteration.
arXiv Detail & Related papers (2021-12-31T09:50:46Z)
- Robustness and risk management via distributional dynamic programming [13.173307471333619]
We introduce a new class of distributional operators, together with a practical DP algorithm for policy evaluation.
Our approach reformulates the problem through an augmented state space where each state is split into a worst-case substate and a best-case substate.
We derive distributional operators and DP algorithms solving a new control task.
arXiv Detail & Related papers (2021-12-28T12:12:57Z)
- Twice regularized MDPs and the equivalence between robustness and regularization [65.58188361659073]
We show that policy iteration on reward-robust MDPs can have the same time complexity as on regularized MDPs.
We generalize regularized MDPs to twice regularized MDPs.
arXiv Detail & Related papers (2021-10-12T18:33:45Z)
- Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning [83.66080019570461]
We propose two environment-agnostic, algorithm-agnostic quantitative metrics for task difficulty.
We show that these metrics have higher correlations with normalized task solvability scores than a variety of alternatives.
These metrics can also be used for fast and compute-efficient optimizations of key design parameters.
arXiv Detail & Related papers (2021-03-23T17:49:50Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
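For concreteness, a short sketch of the two ingredients, assuming an ensemble of Q-networks that map observations to per-action values (the function names and the discrete-action setting are assumptions; SUNRISE is formulated for off-policy actor-critic methods more generally):

```python
import torch

def weighted_bellman_targets(target_q_ensemble, rewards, next_obs, gamma, temperature):
    """(a) Ensemble-based weighted Bellman backup (sketch): targets are down-weighted
    when the ensemble disagrees (high std) about the next-state value."""
    with torch.no_grad():
        next_q = torch.stack([q(next_obs).max(dim=-1).values for q in target_q_ensemble])  # (K, B)
        mean_q, std_q = next_q.mean(dim=0), next_q.std(dim=0)
        weight = torch.sigmoid(-std_q * temperature) + 0.5     # in (0.5, 1.0), as in the paper
        targets = rewards + gamma * mean_q
    return targets, weight                                     # weight rescales each sample's TD error


def ucb_action(q_ensemble, obs, lam):
    """(b) UCB exploration (sketch): act greedily w.r.t. mean + lam * std over the Q-ensemble."""
    with torch.no_grad():
        qs = torch.stack([q(obs) for q in q_ensemble])         # (K, B, n_actions)
        score = qs.mean(dim=0) + lam * qs.std(dim=0)
    return score.argmax(dim=-1)
```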
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations [88.94162416324505]
A deep reinforcement learning (DRL) agent observes its states through observations, which may contain natural measurement errors or adversarial noises.
Since the observations deviate from the true states, they can mislead the agent into making suboptimal actions.
We show that naively applying existing techniques on improving robustness for classification tasks, like adversarial training, is ineffective for many RL tasks.
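To make the threat model concrete, a minimal FGSM-style attack on the observation of a Q-network looks like this (q_net is a hypothetical observation-to-Q-values module; this illustrates the perturbation the paper defends against, not the paper's defense):

```python
import torch

def fgsm_observation_attack(q_net, obs, eps):
    """Perturb the observation (not the true state) within an L-infinity ball of
    radius eps so that the value of the currently greedy action is pushed down."""
    obs = obs.clone().detach().requires_grad_(True)
    greedy_value = q_net(obs).max(dim=-1).values.sum()
    greedy_value.backward()
    return (obs - eps * obs.grad.sign()).detach()
```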
arXiv Detail & Related papers (2020-03-19T17:59:59Z)