PAC Reinforcement Learning Algorithm for General-Sum Markov Games
- URL: http://arxiv.org/abs/2009.02605v1
- Date: Sat, 5 Sep 2020 21:54:27 GMT
- Title: PAC Reinforcement Learning Algorithm for General-Sum Markov Games
- Authors: Ashkan Zehfroosh and Herbert G. Tanner
- Abstract summary: The paper offers an extension to the well-known Nash Q-learning algorithm, using the idea of delayed Q-learning, in order to build a new PAC MARL algorithm for general-sum Markov games.
In addition to guiding the design of a provably PAC MARL algorithm, the framework enables checking whether an arbitrary MARL algorithm is PAC.
- Score: 5.279475826661642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a theoretical framework for probably approximately
correct (PAC) multi-agent reinforcement learning (MARL) algorithms for Markov
games. The paper offers an extension to the well-known Nash Q-learning
algorithm, using the idea of delayed Q-learning, in order to build a new PAC
MARL algorithm for general-sum Markov games. In addition to guiding the design
of a provably PAC MARL algorithm, the framework enables checking whether an
arbitrary MARL algorithm is PAC. Comparative numerical results demonstrate
performance and robustness.
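To make the combination concrete, the following is a minimal, hypothetical sketch of how a delayed-Q-learning batch update might be grafted onto a Nash Q-learning rule for a two-player general-sum Markov game. All names, constants (GAMMA, EPS, M), and the simplified stage-game solver are illustrative assumptions; the paper's actual algorithm and its PAC guarantees are not reproduced here.

```python
# Hypothetical sketch only: a delayed-Q-learning style batch update applied to
# Nash Q-learning for a two-player general-sum Markov game. Constants, names,
# and the simplified stage-game solver are assumptions, not the paper's method.
import numpy as np

GAMMA, EPS, M = 0.95, 0.1, 20            # discount factor, update margin, batch size
n_states, n_a1, n_a2 = 10, 3, 3          # toy problem dimensions

# One optimistically initialized Q-table per agent, indexed by (state, a1, a2).
Q = [np.full((n_states, n_a1, n_a2), 1.0 / (1.0 - GAMMA)) for _ in range(2)]
buffer = {}                              # (state, a1, a2) -> list of sampled targets


def stage_nash_value(s):
    """Per-agent value of the stage game (Q[0][s], Q[1][s]).
    Placeholder: return a pure-strategy Nash payoff if one exists,
    otherwise fall back to each agent's maximin value; a full algorithm
    would solve the bimatrix game exactly (e.g., with a Lemke-Howson solver)."""
    A, B = Q[0][s], Q[1][s]
    for i in range(n_a1):
        for j in range(n_a2):
            if A[i, j] >= A[:, j].max() and B[i, j] >= B[i, :].max():
                return A[i, j], B[i, j]
    return A.min(axis=1).max(), B.min(axis=0).max()


def observe(s, a1, a2, r1, r2, s_next):
    """Accumulate M sampled targets for each (state, joint action) pair and,
    in the spirit of delayed Q-learning, only lower Q when the batch average
    falls below the current optimistic estimate by a margin."""
    v1, v2 = stage_nash_value(s_next)
    buffer.setdefault((s, a1, a2), []).append((r1 + GAMMA * v1, r2 + GAMMA * v2))
    samples = buffer[(s, a1, a2)]
    if len(samples) == M:
        targets = np.mean(samples, axis=0)            # batch-averaged targets, one per agent
        for i in range(2):
            if Q[i][s, a1, a2] - targets[i] >= 2 * EPS:
                Q[i][s, a1, a2] = targets[i] + EPS    # successful attempted update
        buffer[(s, a1, a2)] = []                      # start a new batch
```

In such a sketch, a learner would call observe after every transition while selecting actions from the stage-game equilibrium at the current state; the batching and the EPS margin are the delayed-Q-learning ingredients that bound the number of value updates, which is the kind of property a PAC analysis builds on.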
Related papers
- On the Design and Analysis of LLM-Based Algorithms [74.7126776018275]
Large language models (LLMs) are used as sub-routines in algorithms and have achieved remarkable empirical success.
Our proposed framework holds promise for advancing LLM-based algorithms.
arXiv Detail & Related papers (2024-07-20T07:39:07Z) - Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning [50.92957910121088]
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS).
For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibrium.
We extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either the Nash equilibrium or coarse correlated equilibrium in a sample-efficient manner.
arXiv Detail & Related papers (2024-04-30T06:48:56Z) - Provably Efficient Representation Learning with Tractable Planning in
Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov Decision Processes (POMDPs).
We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU).
We then show how to adapt this algorithm to also work in the broader class of $\gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z) - A Tutorial Introduction to Reinforcement Learning [1.9544213396776275]
We present a brief survey of Reinforcement Learning (RL), with particular emphasis on Stochastic Approximation (SA) as a unifying theme.
The scope of the paper includes Markov Reward Processes, Markov Decision Processes, Stochastic Approximation algorithms, and widely used algorithms such as Temporal Difference Learning and $Q$-learning.
arXiv Detail & Related papers (2023-04-03T08:50:58Z) - Minimizing the Outage Probability in a Markov Decision Process [0.0]
We propose an algorithm that optimizes an alternative objective: the probability that the gain is greater than a given value.
The proposed algorithm can be seen as an extension of the value iteration algorithm.
arXiv Detail & Related papers (2023-02-28T16:26:23Z) - Markov Abstractions for PAC Reinforcement Learning in Non-Markov
Decision Processes [90.53326983143644]
We show that Markov abstractions can be learned during reinforcement learning.
We show that our approach has PAC guarantees when the employed algorithms have PAC guarantees.
arXiv Detail & Related papers (2022-04-29T16:53:00Z) - Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z) - A Sharp Analysis of Model-based Reinforcement Learning with Self-Play [49.88233710867315]
We present a sharp analysis of model-based self-play algorithms for multi-agent Markov games.
We design an algorithm -- Optimistic Nash Value Iteration (Nash-VI) for two-player zero-sum Markov games.
We further adapt our analysis to designing a provably efficient task-agnostic algorithm for zero-sum Markov games.
arXiv Detail & Related papers (2020-10-04T15:27:39Z) - A Hybrid PAC Reinforcement Learning Algorithm [5.279475826661642]
This paper offers a new hybrid probably approximately correct (PAC) reinforcement learning (RL) algorithm for Markov decision processes (MDPs).
The designed algorithm, referred to as the Dyna-Delayed Q-learning (DDQ) algorithm, combines model-free and model-based learning approaches while outperforming both in most cases.
arXiv Detail & Related papers (2020-09-05T21:32:42Z) - Learning LWF Chain Graphs: A Markov Blanket Discovery Approach [2.3333090554192615]
This paper provides a graphical characterization of Markov blankets in chain graphs (CGs) under the Lauritzen-Wermuth-Frydenberg (LWF) interpretation.
We provide a novel scalable and sound algorithm for Markov blanket discovery in LWF CGs.
arXiv Detail & Related papers (2020-05-29T16:44:25Z)