Feasible Adversarial Robust Reinforcement Learning for Underspecified Environments
- URL: http://arxiv.org/abs/2207.09597v1
- Date: Tue, 19 Jul 2022 23:57:51 GMT
- Title: Feasible Adversarial Robust Reinforcement Learning for Underspecified Environments
- Authors: JB Lanier, Stephen McAleer, Pierre Baldi, Roy Fox
- Abstract summary: In real-world environments, choosing the set of possible values for robust reinforcement learning can be a difficult task.
We propose Feasible Adversarial Robust RL (FARR), a method for automatically determining the set of environment parameter values over which to be robust.
Using the PSRO algorithm to find an approximate Nash equilibrium in this FARR game, we show that an agent trained with FARR is more robust to feasible adversarial parameter selection than with existing minimax, domain-randomization, and regret objectives.
- Score: 11.866835246140647
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust reinforcement learning (RL) considers the problem of learning policies
that perform well in the worst case among a set of possible environment
parameter values. In real-world environments, choosing the set of possible
values for robust RL can be a difficult task. When that set is specified too
narrowly, the agent will be left vulnerable to reasonable parameter values
unaccounted for. When specified too broadly, the agent will be too cautious. In
this paper, we propose Feasible Adversarial Robust RL (FARR), a method for
automatically determining the set of environment parameter values over which to
be robust. FARR implicitly defines the set of feasible parameter values as
those on which an agent could achieve a benchmark reward given enough training
resources. By formulating this problem as a two-player zero-sum game, FARR
jointly learns an adversarial distribution over parameter values with feasible
support and a policy robust over this feasible parameter set. Using the PSRO
algorithm to find an approximate Nash equilibrium in this FARR game, we show
that an agent trained with FARR is more robust to feasible adversarial
parameter selection than with existing minimax, domain-randomization, and
regret objectives in a parameterized gridworld and three MuJoCo control
environments.
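To make the objective concrete, a minimal Python sketch of the feasibility criterion and the zero-sum payoff follows; the helper names (`train_best_response`, `evaluate`, `BENCHMARK_REWARD`, `INFEASIBLE_PENALTY`) are hypothetical placeholders and the PSRO equilibrium computation is omitted, so this illustrates the idea rather than the authors' implementation.

```python
# Illustrative sketch of FARR's feasibility test and game payoff.
# All names below are placeholders; the actual FARR/PSRO code is not reproduced here.

BENCHMARK_REWARD = 100.0    # return an agent must reach for a parameter value to count as feasible
INFEASIBLE_PENALTY = -1e6   # discourages the adversary from proposing infeasible values


def is_feasible(theta, train_best_response, evaluate):
    """theta is feasible if a policy trained only on theta, given ample resources,
    can reach the benchmark reward."""
    specialist = train_best_response(theta)      # unconstrained training on this single environment
    return evaluate(specialist, theta) >= BENCHMARK_REWARD


def adversary_payoff(protagonist, theta, train_best_response, evaluate):
    """Zero-sum payoff: the adversary gains what the robust (protagonist) policy
    loses, but only on feasible parameter values."""
    if not is_feasible(theta, train_best_response, evaluate):
        return INFEASIBLE_PENALTY
    return -evaluate(protagonist, theta)
```

At an approximate Nash equilibrium of this game (computed with PSRO in the paper), the adversary's distribution is supported on feasible parameter values where the protagonist is weakest, which is exactly the set over which the trained policy becomes robust.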
Related papers
- Certifiably Robust Policies for Uncertain Parametric Environments [57.2416302384766]
We propose a framework based on parametric Markov decision processes (MDPs) with unknown distributions over parameters.
We learn and analyse interval MDPs (IMDPs) for a set of unknown sample environments induced by parameters.
We show that our approach produces tight bounds on a policy's performance with high confidence.
arXiv Detail & Related papers (2024-08-06T10:48:15Z)
- Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits [18.982448033389588]
Off-policy evaluation and learning are concerned with assessing a given policy and learning an optimal policy from offline data without direct interaction with the environment.
To account for the effect of different environments during learning and execution, distributionally robust optimization (DRO) methods have been developed.
We propose a novel DRO approach that employs the Wasserstein distance instead.
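For context, the generic Wasserstein DRO objective has the form below (general form only, not this paper's exact notation or estimator):

```latex
\min_{\pi}\ \sup_{Q\,:\ W(Q,\hat{P}) \le \rho}\ \mathbb{E}_{\xi \sim Q}\big[\ell(\pi;\xi)\big]
```

where \hat{P} is the empirical distribution of the offline data, W is the Wasserstein distance, \rho is the radius of the ambiguity set, and \ell is the loss (negative reward) incurred by policy \pi.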
arXiv Detail & Related papers (2023-09-15T20:21:46Z)
- Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits [55.03293214439741]
In contextual bandits, an agent sequentially selects actions from a time-dependent action set based on past experience.
We propose the first online continuous hyperparameter tuning framework for contextual bandits.
We show that it achieves sublinear regret in theory and consistently outperforms all existing methods on both synthetic and real datasets.
arXiv Detail & Related papers (2023-02-18T23:31:20Z)
- Grounding Aleatoric Uncertainty in Unsupervised Environment Design [32.00797965770773]
In partially-observable settings, optimal policies may depend on the ground-truth distribution over aleatoric parameters of the environment.
We propose a minimax regret UED method that optimizes the ground-truth utility function, even when the underlying training data is biased due to curriculum-induced covariate shift (CICS).
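In generic notation (not necessarily this paper's exact formulation), the minimax regret objective behind such UED methods is:

```latex
\min_{\pi}\ \max_{\theta}\ \Big[\max_{\pi'} U_{\theta}(\pi') - U_{\theta}(\pi)\Big]
```

where U_{\theta}(\pi) denotes the expected return of policy \pi in the environment instantiated with parameters \theta.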
arXiv Detail & Related papers (2022-07-11T22:45:29Z)
- Robust Reinforcement Learning in Continuous Control Tasks with Uncertainty Set Regularization [17.322284328945194]
Reinforcement learning (RL) is recognized as lacking generalization and robustness under environmental perturbations.
We propose a new regularizer named the Uncertainty Set Regularizer (USR).
arXiv Detail & Related papers (2022-07-05T12:56:08Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
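As a generic sketch (standard DRO-with-reweighting notation; r_{\phi} and its constraints are placeholders rather than the paper's exact parameterization), training against a parametric likelihood ratio can be written as:

```latex
\min_{\theta}\ \max_{\phi}\ \mathbb{E}_{x \sim p}\big[\, r_{\phi}(x)\, \ell(\theta; x)\,\big],
\qquad r_{\phi}(x) \approx \frac{q_{\phi}(x)}{p(x)},\quad \mathbb{E}_{x \sim p}\big[r_{\phi}(x)\big] = 1
```

where p is the training distribution, q_{\phi} an adversarially chosen shifted distribution, and r_{\phi} the learned likelihood ratio that reweights the per-example loss \ell.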
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Uncertainty Aware System Identification with Universal Policies [45.44896435487879]
Sim2real transfer is concerned with transferring policies trained in simulation to potentially noisy real world environments.
We propose Uncertainty-aware policy search (UncAPS), where we use a Universal Policy Network (UPN) to store simulation-trained, task-specific policies.
We then employ robust Bayesian optimisation to craft robust policies for the given environment by combining relevant UPN policies in a domain-randomization-like fashion.
arXiv Detail & Related papers (2022-02-11T18:27:23Z)
- Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design [121.73425076217471]
We propose Unsupervised Environment Design (UED), where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments.
We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED).
Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments.
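Concretely, PAIRED estimates the regret of an environment parameterization \theta as the gap between an antagonist's return and the protagonist's return; the environment designer maximizes this gap while the protagonist minimizes it (paraphrased in standard notation):

```latex
\mathrm{Regret}_{\theta}(\pi_P, \pi_A) \approx U_{\theta}(\pi_A) - U_{\theta}(\pi_P)
```

Because the designer is rewarded only when the antagonist can succeed where the protagonist fails, it is pushed toward environments that are solvable yet challenging, which produces the emergent curriculum noted above.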
arXiv Detail & Related papers (2020-12-03T17:37:01Z)
- Implicit Distributional Reinforcement Learning [61.166030238490634]
We propose an implicit distributional actor-critic (IDAC) built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe that IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)
- Kalman meets Bellman: Improving Policy Evaluation through Value Tracking [59.691919635037216]
Policy evaluation is a key process in Reinforcement Learning (RL).
We devise an optimization method called Kalman Optimization for Value Approximation (KOVA).
KOVA minimizes a regularized objective function that concerns both parameter and noisy return uncertainties.
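As a rough sketch (generic Kalman-filter-as-regularized-least-squares notation, not necessarily the paper's exact formulation), an objective of the kind described can be written as:

```latex
\theta_t = \arg\min_{\theta}\ \big\|y_t - h_t(\theta)\big\|^2_{R_t^{-1}} + \big\|\theta - \theta_{t-1}\big\|^2_{P_{t-1}^{-1}}
```

where y_t are the noisy observed returns (targets), h_t(\theta) the value predictions, R_t the return-noise covariance, and P_{t-1} the covariance tracking uncertainty in the value-function parameters.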
arXiv Detail & Related papers (2020-02-17T13:30:43Z)