Explore Reinforced: Equilibrium Approximation with Reinforcement Learning
- URL: http://arxiv.org/abs/2412.02016v1
- Date: Mon, 02 Dec 2024 22:37:59 GMT
- Title: Explore Reinforced: Equilibrium Approximation with Reinforcement Learning
- Authors: Ryan Yu, Mateusz Nowak, Qintong Xie, Michelle Yilin Feng, Peter Chin,
- Abstract summary: We introduce Exp3-IXrl - a blend of RL and game-theoretic approach, separating the RL agent's action selection from the equilibrium.
We demonstrate that our algorithm expands the application of equilibrium approximation algorithms to new environments.
- Score: 3.214961078500366
- License:
- Abstract: Current approximate Coarse Correlated Equilibria (CCE) algorithms struggle with equilibrium approximation for games in large stochastic environments but are theoretically guaranteed to converge to a strong solution concept. In contrast, modern Reinforcement Learning (RL) algorithms provide faster training yet yield weaker solutions. We introduce Exp3-IXrl - a blend of RL and game-theoretic approach, separating the RL agent's action selection from the equilibrium computation while preserving the integrity of the learning process. We demonstrate that our algorithm expands the application of equilibrium approximation algorithms to new environments. Specifically, we show the improved performance in a complex and adversarial cybersecurity network environment - the Cyber Operations Research Gym - and in the classical multi-armed bandit settings.
Related papers
- Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach [2.3020018305241337]
This paper is the first to propose considering the RRL problems within the positional differential game theory.
Namely, we prove that under Isaacs's condition, the same Q-function can be utilized as an approximate solution of both minimax and maximin Bellman equations.
We present the Isaacs Deep Q-Network algorithms and demonstrate their superiority compared to other baseline RRL and Multi-Agent RL algorithms in various environments.
arXiv Detail & Related papers (2024-05-03T12:21:43Z) - Neural Population Learning beyond Symmetric Zero-sum Games [52.20454809055356]
We introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated (CCE) of the game.
Our work shows that equilibrium convergent population learning can be implemented at scale and in generality.
arXiv Detail & Related papers (2024-01-10T12:56:24Z) - One-Step Distributional Reinforcement Learning [10.64435582017292]
We present the simpler one-step distributional reinforcement learning (OS-DistrRL) framework.
We show that our approach comes with a unified theory for both policy evaluation and control.
We propose two OS-DistrRL algorithms for which we provide an almost sure convergence analysis.
arXiv Detail & Related papers (2023-04-27T06:57:00Z) - Turbocharging Solution Concepts: Solving NEs, CEs and CCEs with Neural
Equilibrium Solvers [22.85979978964773]
Solution concepts such as Nash Equilibria, Correlated Equilibria, and Coarse Correlated Equilibria are useful components for many multiagent machine learning algorithms.
We introduce the Neural Equilibrium solver which utilizes a special equivariant neural network architecture to approximately solve the space of all games of fixed shape, buying speed and determinism.
arXiv Detail & Related papers (2022-10-17T17:00:31Z) - Faster Last-iterate Convergence of Policy Optimization in Zero-Sum
Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games.
We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method.
Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
arXiv Detail & Related papers (2022-10-03T16:05:43Z) - Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning [92.18524491615548]
Contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL)
We study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions.
Under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.
arXiv Detail & Related papers (2022-07-29T17:29:08Z) - Efficient Model-based Multi-agent Reinforcement Learning via Optimistic
Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z) - Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers [21.462231105582347]
We propose an algorithm for training agents in n-player, general-sum extensive form games, which provably converges to an equilibrium.
We also suggest correlated equilibria (CE) as promising meta-solvers, and propose a novel solution concept Gini Correlated Equilibrium (MGCE)
We conduct several experiments using CE meta-solvers for JPSRO and demonstrate convergence on n-player, general-sum games.
arXiv Detail & Related papers (2021-06-17T12:34:18Z) - Deep RL With Information Constrained Policies: Generalization in
Continuous Control [21.46148507577606]
We show that a natural constraint on information flow might confer onto artificial agents in continuous control tasks.
We implement a novel Capacity-Limited Actor-Critic (CLAC) algorithm.
Our experiments show that compared to alternative approaches, CLAC offers improvements in generalization between training and modified test environments.
arXiv Detail & Related papers (2020-10-09T15:42:21Z) - SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep
Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z) - Robust Reinforcement Learning via Adversarial training with Langevin
Dynamics [51.234482917047835]
We introduce a sampling perspective to tackle the challenging task of training robust Reinforcement Learning (RL) agents.
We present a novel, scalable two-player RL algorithm, which is a sampling variant of the two-player policy method.
arXiv Detail & Related papers (2020-02-14T14:59:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.