PAC Reinforcement Learning Algorithm for General-Sum Markov Games
- URL: http://arxiv.org/abs/2009.02605v1
- Date: Sat, 5 Sep 2020 21:54:27 GMT
- Title: PAC Reinforcement Learning Algorithm for General-Sum Markov Games
- Authors: Ashkan Zehfroosh and Herbert G. Tanner
- Abstract summary: The paper offers an extension to the well-known Nash Q-learning algorithm, using the idea of delayed Q-learning, in order to build a new PAC MARL algorithm for general-sum Markov games.
In addition to guiding the design of a provably PAC MARL algorithm, the framework enables checking whether an arbitrary MARL algorithm is PAC.
- Score: 5.279475826661642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a theoretical framework for probably approximately
correct (PAC) multi-agent reinforcement learning (MARL) algorithms for Markov
games. The paper offers an extension to the well-known Nash Q-learning
algorithm, using the idea of delayed Q-learning, in order to build a new PAC
MARL algorithm for general-sum Markov games. In addition to guiding the design
of a provably PAC MARL algorithm, the framework enables checking whether an
arbitrary MARL algorithm is PAC. Comparative numerical results demonstrate
performance and robustness.
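To make the combination concrete, the following is a minimal, hypothetical sketch of how a delayed-Q-learning batch update might be grafted onto a Nash Q-learning rule for a two-player general-sum Markov game. All names, constants (GAMMA, EPS, M), and the simplified stage-game solver are illustrative assumptions; the paper's actual algorithm and its PAC guarantees are not reproduced here.

```python
# Hypothetical sketch only: a delayed-Q-learning style batch update applied to
# Nash Q-learning for a two-player general-sum Markov game. Constants, names,
# and the simplified stage-game solver are assumptions, not the paper's method.
import numpy as np

GAMMA, EPS, M = 0.95, 0.1, 20            # discount factor, update margin, batch size
n_states, n_a1, n_a2 = 10, 3, 3          # toy problem dimensions

# One optimistically initialized Q-table per agent, indexed by (state, a1, a2).
Q = [np.full((n_states, n_a1, n_a2), 1.0 / (1.0 - GAMMA)) for _ in range(2)]
buffer = {}                              # (state, a1, a2) -> list of sampled targets


def stage_nash_value(s):
    """Per-agent value of the stage game (Q[0][s], Q[1][s]).
    Placeholder: return a pure-strategy Nash payoff if one exists,
    otherwise fall back to each agent's maximin value; a full algorithm
    would solve the bimatrix game exactly (e.g., with a Lemke-Howson solver)."""
    A, B = Q[0][s], Q[1][s]
    for i in range(n_a1):
        for j in range(n_a2):
            if A[i, j] >= A[:, j].max() and B[i, j] >= B[i, :].max():
                return A[i, j], B[i, j]
    return A.min(axis=1).max(), B.min(axis=0).max()


def observe(s, a1, a2, r1, r2, s_next):
    """Accumulate M sampled targets for each (state, joint action) pair and,
    in the spirit of delayed Q-learning, only lower Q when the batch average
    falls below the current optimistic estimate by a margin."""
    v1, v2 = stage_nash_value(s_next)
    buffer.setdefault((s, a1, a2), []).append((r1 + GAMMA * v1, r2 + GAMMA * v2))
    samples = buffer[(s, a1, a2)]
    if len(samples) == M:
        targets = np.mean(samples, axis=0)            # batch-averaged targets, one per agent
        for i in range(2):
            if Q[i][s, a1, a2] - targets[i] >= 2 * EPS:
                Q[i][s, a1, a2] = targets[i] + EPS    # successful attempted update
        buffer[(s, a1, a2)] = []                      # start a new batch
```

In such a sketch, a learner would call observe after every transition while selecting actions from the stage-game equilibrium at the current state; the batching and the EPS margin are the delayed-Q-learning ingredients that bound the number of value updates, which is the kind of property a PAC analysis builds on.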
Related papers
- On the Design and Analysis of LLM-Based Algorithms [74.7126776018275]
Large language models (LLMs) are used as sub-routines in algorithms and have achieved remarkable empirical success.
Our proposed framework holds promise for advancing LLM-based algorithms.
arXiv Detail & Related papers (2024-07-20T07:39:07Z) - Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning [50.92957910121088]
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS).
For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibrium.
We extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either the Nash equilibrium or coarse correlated equilibrium in a sample-efficient manner.
arXiv Detail & Related papers (2024-04-30T06:48:56Z) - Provably Efficient Representation Learning with Tractable Planning in
Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov Decision Processes (POMDPs).
We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU).
We then show how to adapt this algorithm to also work in the broader class of $\gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z) - A Tutorial Introduction to Reinforcement Learning [1.9544213396776275]
We present a brief survey of Reinforcement Learning (RL), with particular emphasis on Stochastic Approximation (SA) as a unifying theme.
The scope of the paper includes Markov Reward Processes, Markov Decision Processes, Stochastic Approximation algorithms, and widely used algorithms such as Temporal Difference Learning and $Q$-learning.
arXiv Detail & Related papers (2023-04-03T08:50:58Z) - Minimizing the Outage Probability in a Markov Decision Process [0.0]
We propose an algorithm that optimizes an alternative objective: the probability that the gain is greater than a given value.
The proposed algorithm can be seen as an extension of the value iteration algorithm.
arXiv Detail & Related papers (2023-02-28T16:26:23Z) - Markov Abstractions for PAC Reinforcement Learning in Non-Markov
Decision Processes [90.53326983143644]
We show that Markov abstractions can be learned during reinforcement learning.
We show that our approach has PAC guarantees when the employed algorithms have PAC guarantees.
arXiv Detail & Related papers (2022-04-29T16:53:00Z) - Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z) - A Sharp Analysis of Model-based Reinforcement Learning with Self-Play [49.88233710867315]
We present a sharp analysis of model-based self-play algorithms for multi-agent Markov games.
We design an algorithm -- Optimistic Nash Value Iteration (Nash-VI) for two-player zero-sum Markov games.
We further adapt our analysis to designing a provably efficient task-agnostic algorithm for zero-sum Markov games.
arXiv Detail & Related papers (2020-10-04T15:27:39Z) - A Hybrid PAC Reinforcement Learning Algorithm [5.279475826661642]
This paper offers a new hybrid probably approximately correct (PAC) reinforcement learning (RL) algorithm for Markov decision processes (MDPs).
The designed algorithm, referred to as the Dyna-Delayed Q-learning (DDQ) algorithm, combines model-free and model-based learning approaches while outperforming both in most cases.
arXiv Detail & Related papers (2020-09-05T21:32:42Z) - Learning LWF Chain Graphs: A Markov Blanket Discovery Approach [2.3333090554192615]
This paper provides a graphical characterization of Markov blankets in chain graphs (CGs) under the Lauritzen-Wermuth-Frydenberg (LWF) interpretation.
We provide a novel scalable and sound algorithm for Markov blanket discovery in LWF CGs.
arXiv Detail & Related papers (2020-05-29T16:44:25Z)