Robust Adversarial Reinforcement Learning via Bounded Rationality Curricula
- URL: http://arxiv.org/abs/2311.01642v1
- Date: Fri, 3 Nov 2023 00:00:32 GMT
- Title: Robust Adversarial Reinforcement Learning via Bounded Rationality Curricula
- Authors: Aryaman Reddi, Maximilian Tölle, Jan Peters, Georgia Chalvatzaki, Carlo D'Eramo
- Abstract summary: Adversarial Reinforcement Learning trains a protagonist against destabilizing forces exercised by an adversary in a competitive zero-sum Markov game.
Finding Nash equilibria requires facing complex saddle point optimization problems, which can be prohibitive to solve.
We propose a novel approach for adversarial RL based on entropy regularization to ease the complexity of the saddle point optimization problem.
- Score: 23.80052541774509
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robustness against adversarial attacks and distribution shifts is a
long-standing goal of Reinforcement Learning (RL). To this end, Robust
Adversarial Reinforcement Learning (RARL) trains a protagonist against
destabilizing forces exercised by an adversary in a competitive zero-sum Markov
game, whose optimal solution, i.e., rational strategy, corresponds to a Nash
equilibrium. However, finding Nash equilibria requires facing complex saddle
point optimization problems, which can be prohibitive to solve, especially for
high-dimensional control. In this paper, we propose a novel approach for
adversarial RL based on entropy regularization to ease the complexity of the
saddle point optimization problem. We show that the solution of this
entropy-regularized problem corresponds to a Quantal Response Equilibrium
(QRE), a generalization of Nash equilibria that accounts for bounded
rationality, i.e., agents sometimes play random actions instead of optimal
ones. Crucially, the connection between the entropy-regularized objective and
QRE enables free modulation of the rationality of the agents by simply tuning
the temperature coefficient. We leverage this insight to propose our novel
algorithm, Quantal Adversarial RL (QARL), which gradually increases the
rationality of the adversary in a curriculum fashion until it is fully
rational, easing the complexity of the optimization problem while retaining
robustness. We provide extensive evidence of QARL outperforming RARL and recent
baselines across several MuJoCo locomotion and navigation problems in overall
performance and robustness.
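The temperature-as-rationality idea can be illustrated on a toy zero-sum matrix game. The following minimal sketch is not the authors' implementation: both players give softmax ("quantal") responses to their expected payoffs, and only the adversary's temperature is annealed on a curriculum, from near-random play toward a fully rational best response. The payoff matrix, step sizes, and schedule are illustrative assumptions.

```python
# Hedged sketch (not the authors' code) of the core QARL idea on a toy
# zero-sum matrix game: both players give softmax ("quantal") responses to
# their expected payoffs, and the adversary's temperature is annealed on a
# curriculum from near-random play toward a fully rational best response.
import numpy as np

def quantal_response(payoffs, tau):
    """Softmax over expected payoffs; tau -> 0 recovers the rational argmax."""
    z = payoffs / max(tau, 1e-8)
    z -= z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))        # protagonist maximizes A[i, j], adversary minimizes
pi_pro = np.ones(4) / 4
pi_adv = np.ones(4) / 4
tau_pro = 1e-3                     # protagonist stays (nearly) rational throughout

# Curriculum over the adversary's temperature: high tau = bounded rationality
# (quantal-response play), low tau = fully rational adversary.
for k, tau_adv in enumerate(np.geomspace(5.0, 1e-3, num=300)):
    u_pro = A @ pi_adv             # expected payoff of each protagonist action
    u_adv = -(pi_pro @ A)          # adversary maximizes the negated payoff
    lr = 1.0 / (k + 2)             # fictitious-play-style averaging of responses
    pi_pro = (1 - lr) * pi_pro + lr * quantal_response(u_pro, tau_pro)
    pi_adv = (1 - lr) * pi_adv + lr * quantal_response(u_adv, tau_adv)

print("protagonist policy:", np.round(pi_pro, 3))
print("adversary policy:  ", np.round(pi_adv, 3))
print("estimated game value:", float(pi_pro @ A @ pi_adv))
```

As the adversary's temperature shrinks, its quantal response concentrates on the payoff-minimizing action, which is the sense in which the curriculum ends with a fully rational opponent.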
Related papers
- Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach [2.3020018305241337]
This paper is the first to propose considering RRL problems within positional differential game theory.
Namely, we prove that under Isaacs's condition, the same Q-function can be utilized as an approximate solution of both minimax and maximin Bellman equations.
We present the Isaacs Deep Q-Network algorithms and demonstrate their superiority over baseline RRL and Multi-Agent RL algorithms in various environments (a toy maximin Bellman backup is sketched after this entry).
arXiv Detail & Related papers (2024-05-03T12:21:43Z)
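To make the shared Q-function statement concrete, here is a hedged, tabular toy: a maximin Bellman backup for a finite zero-sum Markov game with deterministic, invented dynamics. The actual paper works with positional differential games and deep Q-networks; in this discrete toy, the maximin and minimax values coincide only when each stage game has a pure saddle point, which is the role Isaacs's condition plays in the differential-game setting.

```python
# Hedged toy sketch (not the paper's Isaacs DQN): a tabular maximin Bellman
# backup for a zero-sum Markov game with deterministic, invented dynamics.
import numpy as np

n_states, n_a, n_b, gamma = 3, 2, 2, 0.9
rng = np.random.default_rng(1)
R = rng.normal(size=(n_states, n_a, n_b))                  # protagonist reward
T = rng.integers(0, n_states, size=(n_states, n_a, n_b))   # deterministic next state

Q = np.zeros((n_states, n_a, n_b))
for _ in range(200):
    V = Q.min(axis=2).max(axis=1)   # maximin value of the stage game per state
    Q = R + gamma * V[T]            # Bellman backup on the joint-action Q-function

V_maximin = Q.min(axis=2).max(axis=1)   # max_a min_b Q(s, a, b)
V_minimax = Q.max(axis=1).min(axis=1)   # min_b max_a Q(s, a, b)
print("maximin values:", np.round(V_maximin, 3))
print("minimax values:", np.round(V_minimax, 3))
# The two coincide when each stage game admits a pure saddle point; Isaacs's
# condition plays the analogous role in the positional differential game.
```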
- Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations [98.5802673062712]
We introduce temporally-coupled perturbations, presenting a novel challenge for existing robust reinforcement learning methods.
We propose GRAD, a novel game-theoretic approach that treats the temporally-coupled robust RL problem as a partially observable two-player zero-sum game.
arXiv Detail & Related papers (2023-07-22T12:10:04Z)
- Light Unbalanced Optimal Transport [69.18220206873772]
Existing solvers are either based on heuristic principles or are heavy-weighted, with complex optimization objectives involving several neural networks.
We show that our solver provides a universal approximation of UEOT solutions and obtain its generalization bounds.
arXiv Detail & Related papers (2023-03-14T15:44:40Z)
- Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games.
We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method (a bare entropy-regularized multiplicative-weights step on a matrix game is sketched after this entry).
Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
arXiv Detail & Related papers (2022-10-03T16:05:43Z)
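A bare sketch of the multiplicative-weights part of that update on a single zero-sum matrix game follows; it is hedged, since the paper's full method adds an optimistic extrapolation step and runs on Markov games rather than a one-shot matrix game. The payoff matrix, step size, and temperature are illustrative.

```python
# Hedged sketch: entropy-regularized multiplicative weights on a zero-sum
# matrix game (the paper's OMWU method additionally uses an optimistic
# extrapolation step and symmetric Markov-game updates, omitted here).
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))     # x maximizes x^T A y, y minimizes it
x = np.ones(3) / 3
y = np.ones(3) / 3
eta, tau = 0.1, 0.05            # step size and entropy temperature (illustrative)

def normalize(p):
    return p / p.sum()

for _ in range(2000):
    gx = A @ y                  # payoff gradient for the max player
    gy = -(x @ A)               # payoff gradient for the min player
    # The x ** (1 - eta * tau) factor implements the entropy regularization,
    # pulling both policies toward uniform (bounded rationality).
    x = normalize(x ** (1 - eta * tau) * np.exp(eta * gx))
    y = normalize(y ** (1 - eta * tau) * np.exp(eta * gy))

print("x:", np.round(x, 3))
print("y:", np.round(y, 3))
print("duality gap ~", float((A @ y).max() - (x @ A).min()))
```

The entropy term is what makes the regularized equilibrium unique and the last iterate well behaved, which is also the bounded-rationality interpretation used in the main paper.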
- Robust Reinforcement Learning as a Stackelberg Game via Adaptively-Regularized Adversarial Training [43.97565851415018]
Robust Reinforcement Learning (RL) focuses on improving performance under model errors or adversarial attacks.
Most of the existing literature models RARL as a zero-sum simultaneous game with Nash equilibrium as the solution concept.
We introduce a novel hierarchical formulation of robust RL - a general-sum Stackelberg game model called RRL-Stack.
arXiv Detail & Related papers (2022-02-19T03:44:05Z)
- Mind Your Solver! On Adversarial Attack and Defense for Combinatorial Optimization [111.78035414744045]
We take an initial step toward developing adversarial attack and defense mechanisms for combinatorial optimization solvers.
We present a simple yet effective defense strategy to modify the graph structure to increase the robustness of solvers.
arXiv Detail & Related papers (2021-12-28T15:10:15Z)
- Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization [40.21627891283402]
This paper investigates the problem of computing the equilibrium of competitive games.
Motivated by the algorithmic role of entropy regularization, we develop provably efficient extragradient methods (the basic extragradient look-ahead step is sketched after this entry).
arXiv Detail & Related papers (2021-05-31T17:51:15Z)
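For intuition only, here is a bare-bones Euclidean extragradient step on an unconstrained bilinear saddle point; the paper's policy extragradient methods instead run entropy-regularized updates over probability simplices, so this sketch only shows the look-ahead structure. All quantities are invented.

```python
# Hedged sketch: plain Euclidean extragradient on min_x max_y x^T A y.
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(2, 2))
x = rng.normal(size=2)          # minimizing player
y = rng.normal(size=2)          # maximizing player
eta = 0.1

for _ in range(2000):
    # Half-step (extrapolation) using the current gradients.
    x_half = x - eta * (A @ y)
    y_half = y + eta * (A.T @ x)
    # Full step from the original point, using gradients at the half-step.
    x = x - eta * (A @ y_half)
    y = y + eta * (A.T @ x_half)

print("x:", np.round(x, 4), "y:", np.round(y, 4))
# Both iterates drift toward the saddle point at the origin; plain
# simultaneous gradient descent-ascent would instead spiral outward.
```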
- Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning [56.17667147101263]
In real-world tasks, reinforcement learning agents encounter situations that are not present during training time.
To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations.
We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
arXiv Detail & Related papers (2021-03-18T16:50:17Z)
- Robust Reinforcement Learning using Adversarial Populations [118.73193330231163]
Reinforcement Learning (RL) is an effective tool for controller design but can struggle with issues of robustness.
We show that using a single adversary does not consistently yield robustness to dynamics variations under standard parametrizations of the adversary.
We propose a population-based augmentation to the Robust RL formulation in which we randomly initialize a population of adversaries and sample from the population uniformly during training (see the sketch after this entry).
arXiv Detail & Related papers (2020-08-04T20:57:32Z)
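A hedged skeleton of that sampling scheme is below; the policies, dynamics, and update step are stand-ins (the paper trains neural network policies with standard RL), so only the uniform sampling of an adversary per episode reflects the described method.

```python
# Hedged sketch of the adversarial-population idea (not the paper's code):
# randomly initialize several adversary policies and, for each training
# episode, sample one uniformly to perturb the protagonist's rollout.
import numpy as np

rng = np.random.default_rng(4)
obs_dim, act_dim, population_size = 4, 2, 5

# Stand-in policies: linear maps from observation to action/perturbation.
protagonist = rng.normal(scale=0.1, size=(act_dim, obs_dim))
adversaries = [rng.normal(scale=0.1, size=(act_dim, obs_dim))
               for _ in range(population_size)]

for episode in range(10):
    adversary = adversaries[rng.integers(population_size)]   # uniform sampling
    obs = rng.normal(size=obs_dim)
    episode_return = 0.0
    for t in range(20):                                      # toy rollout
        action = protagonist @ obs + adversary @ obs         # adversary adds a force
        obs = 0.9 * obs                                       # placeholder dynamics
        obs[:act_dim] += 0.1 * action
        episode_return += -float(obs @ obs)                   # stay near the origin
    # A real implementation would update the protagonist (and optionally the
    # sampled adversary) with an RL algorithm here; omitted in this sketch.
    print(f"episode {episode}: return {episode_return:.2f}")
```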