Asynchronous Decentralized Q-Learning: Two Timescale Analysis By Persistence
- URL: http://arxiv.org/abs/2308.03239v1
- Date: Mon, 7 Aug 2023 01:32:09 GMT
- Title: Asynchronous Decentralized Q-Learning: Two Timescale Analysis By Persistence
- Authors: Bora Yongacoglu, Gürdal Arslan, and Serdar Yüksel
- Abstract summary: Non-stationarity is a fundamental challenge in multi-agent reinforcement learning (MARL), where agents update their behaviour as they learn.
Many theoretical advances in MARL avoid the challenge of non-stationarity by coordinating the policy updates of agents in various ways.
Synchronization enables analysis of many MARL algorithms via multi-timescale methods, but such synchrony is infeasible in many decentralized applications.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Non-stationarity is a fundamental challenge in multi-agent reinforcement
learning (MARL), where agents update their behaviour as they learn. Many
theoretical advances in MARL avoid the challenge of non-stationarity by
coordinating the policy updates of agents in various ways, including
synchronizing times at which agents are allowed to revise their policies.
Synchronization enables analysis of many MARL algorithms via multi-timescale
methods, but such synchrony is infeasible in many decentralized applications.
In this paper, we study an asynchronous variant of the decentralized Q-learning
algorithm, a recent MARL algorithm for stochastic games. We provide sufficient
conditions under which the asynchronous algorithm drives play to equilibrium
with high probability. Our solution utilizes constant learning rates in the
Q-factor update, which we show to be critical for relaxing the synchrony
assumptions of earlier work. Our analysis also applies to asynchronous
generalizations of a number of other algorithms from the regret testing
tradition, whose performance is analyzed by multi-timescale methods that study
Markov chains obtained via policy update dynamics. This work extends the
applicability of the decentralized Q-learning algorithm and its relatives to
settings in which parameters are selected in an independent manner, and tames
non-stationarity without imposing the coordination assumptions of prior work.
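The abstract does not reproduce the algorithm itself, but the two ingredients it stresses (constant-step-size Q-factor updates and policy revisions timed independently by each agent) can be pictured with a minimal, hypothetical sketch; all names, constants, and the simplified argmax revision rule below are assumptions rather than the paper's exact update rules.

```python
import numpy as np

# Hypothetical sketch of one agent's loop in asynchronous decentralized
# Q-learning: the agent sees only its own actions and rewards, uses a
# CONSTANT learning rate for the Q-factor update, and revises its held
# policy at independently chosen, infrequent times.
class AsyncDecentralizedQAgent:
    def __init__(self, n_states, n_actions, alpha=0.05, gamma=0.95,
                 explore=0.1, revise_prob=0.001, rng=None):
        self.Q = np.zeros((n_states, n_actions))
        self.alpha = alpha              # constant step size (not decaying)
        self.gamma = gamma
        self.explore = explore          # exploration rate while a policy is held fixed
        self.revise_prob = revise_prob  # chance of revising the policy at any step
        self.policy = np.zeros(n_states, dtype=int)  # deterministic baseline policy
        self.rng = rng or np.random.default_rng()

    def act(self, s):
        # Follow the currently held policy, with occasional exploration.
        if self.rng.random() < self.explore:
            return int(self.rng.integers(self.Q.shape[1]))
        return int(self.policy[s])

    def learn(self, s, a, r, s_next):
        # Constant-step-size Q-factor update on the agent's local Q-factors.
        target = r + self.gamma * self.Q[s_next].max()
        self.Q[s, a] += self.alpha * (target - self.Q[s, a])

    def maybe_revise(self):
        # Asynchronous policy revision: each agent flips its own coin, so
        # revision times are not synchronized across agents.
        if self.rng.random() < self.revise_prob:
            self.policy = self.Q.argmax(axis=1)
```

Keeping `alpha` constant rather than decaying it reflects the abstract's claim that constant learning rates in the Q-factor update are what allow the synchrony assumptions of earlier work to be relaxed.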
Related papers
- Online Statistical Inference for Time-varying Sample-averaged Q-learning [2.2374171443798034]
This paper introduces a time-varying batch-averaged Q-learning method, termed sample-averaged Q-learning.
We develop a novel framework that provides insights into the normality of the sample-averaged algorithm under mild conditions.
Numerical experiments conducted on classic OpenAI Gym environments show that the time-varying sample-averaged Q-learning method consistently outperforms both single-sample and constant-batch Q-learning.
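As a rough illustration of the batch-averaging idea in the summary above, the hypothetical snippet below averages several sampled TD targets for one state-action pair before applying a single update; the batch size would be time-varying in the paper's scheme, and all names and constants here are assumptions.

```python
import numpy as np

def sample_averaged_q_update(Q, s, a, samples, alpha=0.1, gamma=0.99):
    """Average TD targets over a batch of sampled (reward, next_state)
    outcomes for the pair (s, a), then apply one Q-learning update."""
    targets = [r + gamma * Q[s_next].max() for r, s_next in samples]
    Q[s, a] += alpha * (np.mean(targets) - Q[s, a])
    return Q
```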
arXiv Detail & Related papers (2024-10-14T17:17:19Z)
- Solving Continuous Control via Q-learning [54.05120662838286]
We show that a simple modification of deep Q-learning largely alleviates issues with actor-critic methods.
By combining bang-bang action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches the performance of state-of-the-art continuous actor-critic methods.
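A minimal sketch of the two ideas named above, bang-bang discretization and value decomposition, might look as follows; the linear sum decomposition and all names are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

def decoupled_bangbang_action(q_per_dim, low, high):
    """Illustrative sketch: each continuous action dimension is discretized
    to its two extremes ("bang-bang") and treated as one cooperative agent.
    q_per_dim has shape (n_dims, 2) with that dimension's two Q-values; the
    joint Q is taken as the sum of per-dimension values (value decomposition)."""
    choices = np.argmax(q_per_dim, axis=1)        # 0 -> low, 1 -> high per dimension
    action = np.where(choices == 0, low, high)    # assemble the continuous action
    joint_q = q_per_dim[np.arange(len(choices)), choices].sum()
    return action, joint_q
```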
arXiv Detail & Related papers (2022-10-22T22:55:50Z)
- Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games.
We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method.
Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
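The entropy-regularized optimistic multiplicative weights update mentioned above can be sketched for a single state as below; the shrink-toward-uniform exponent and the use of a predicted payoff vector as the optimistic term follow the usual OMWU template, but the single-step simplification, step sizes, and names are assumptions rather than the paper's exact method.

```python
import numpy as np

def omwu_step(policy, q_predicted, eta=0.1, tau=0.01):
    """Simplified, single-state sketch of an entropy-regularized optimistic
    multiplicative-weights update. `policy` must be fully mixed (all entries
    positive); `q_predicted` plays the role of the optimistic prediction of
    the next payoff vector; `tau` is the entropy-regularization strength."""
    # Shrink the old policy toward uniform via the exponent (1 - eta * tau),
    # then tilt it toward the predicted payoffs and renormalize.
    logits = (1.0 - eta * tau) * np.log(policy) + eta * q_predicted
    new_policy = np.exp(logits - logits.max())
    return new_policy / new_policy.sum()
```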
arXiv Detail & Related papers (2022-10-03T16:05:43Z)
- Macro-Action-Based Multi-Agent/Robot Deep Reinforcement Learning under Partial Observability [4.111899441919164]
State-of-the-art multi-agent reinforcement learning (MARL) methods have provided promising solutions to a variety of complex problems.
We first propose a group of value-based RL approaches for MacDec-POMDPs.
We formulate a set of macro-action-based policy gradient algorithms under three standard training paradigms.
arXiv Detail & Related papers (2022-09-20T21:13:51Z)
- Asynchronous Actor-Critic for Multi-Agent Reinforcement Learning [19.540926205375857]
Synchronizing decisions across multiple agents in realistic settings is problematic since it requires agents to wait for other agents to terminate and communicate about termination reliably.
We formulate a set of asynchronous multi-agent actor-critic methods that allow agents to directly optimize asynchronous policies in three standard training paradigms.
arXiv Detail & Related papers (2022-09-20T16:36:23Z)
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- Finite-Sample Analysis of Decentralized Q-Learning for Stochastic Games [3.441021278275805]
Learning in games is arguably the most standard and fundamental setting in multi-agent reinforcement learning (MARL).
We establish the finite-sample complexity of fully decentralized Q-learning algorithms in a significant class of general-sum stochastic games (SGs).
We focus on the practical yet challenging setting of fully decentralized MARL, where each agent can observe neither the rewards nor the actions of the other agents.
arXiv Detail & Related papers (2021-12-15T03:33:39Z)
- Coding for Distributed Multi-Agent Reinforcement Learning [12.366967700730449]
Stragglers arise frequently in distributed learning systems due to various system disturbances.
We propose a coded distributed learning framework, which speeds up the training of MARL algorithms in the presence of stragglers.
Different coding schemes, including maximum distance separable (MDS) codes, random sparse codes, replication-based codes, and regular low-density parity-check (LDPC) codes, are also investigated.
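As a toy illustration of the simplest coding idea listed above (replication), the snippet below assigns each learning task to several workers so the aggregator can proceed once any copy of every task returns; the MDS, sparse, and LDPC schemes are more involved and are not reproduced here, and all names are illustrative assumptions.

```python
def replication_assignment(num_tasks, num_workers, replicas=2):
    """Replication-based straggler mitigation sketch: each task is assigned
    to `replicas` distinct workers in a round-robin pattern, so results can
    be recovered as soon as one copy of every task has returned."""
    assignment = {w: [] for w in range(num_workers)}
    for t in range(num_tasks):
        for r in range(replicas):
            assignment[(t + r) % num_workers].append(t)
    return assignment

# Example: 6 tasks spread over 4 workers with 2 copies each.
print(replication_assignment(6, 4))
```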
arXiv Detail & Related papers (2021-01-07T00:22:34Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
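The two ingredients in the summary above can be sketched roughly as follows; the sigmoid weighting, the temperature, and the upper-confidence coefficient are illustrative assumptions rather than the paper's exact constants.

```python
import numpy as np

def weighted_bellman_targets(target_q_ensemble, rewards, gamma=0.99, temperature=10.0):
    """(a) Weighted Bellman backup sketch: target_q_ensemble has shape
    (ensemble_size, batch). Targets the ensemble disagrees on are
    down-weighted via a sigmoid of the negative ensemble std."""
    mean_q = target_q_ensemble.mean(axis=0)
    std_q = target_q_ensemble.std(axis=0)
    weights = 1.0 / (1.0 + np.exp(std_q * temperature))  # sigmoid(-std * T)
    targets = rewards + gamma * mean_q
    return targets, weights  # use `weights` to scale each sample's TD loss

def ucb_action(q_ensemble_per_action, beta=1.0):
    """(b) Optimistic action selection sketch: pick the action with the
    highest mean-plus-uncertainty estimate across the Q-ensemble."""
    mean_q = q_ensemble_per_action.mean(axis=0)
    std_q = q_ensemble_per_action.std(axis=0)
    return int(np.argmax(mean_q + beta * std_q))
```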
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework, which can handle large-scale general cooperative multi-agent settings.
Our framework can achieve scalability and stability in large-scale environments and reduce information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
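A toy sketch of one round of the scheme described above, with a least-squares local loss standing in for each agent's data term, could look like this; the loss choice, the participation model, and all names are assumptions for illustration.

```python
import numpy as np

def federated_round(w_global, agent_data, mu=0.05, participation=0.3, rng=None):
    """One round: a random subset of agents performs a local gradient step
    from the current global model, and the server averages the updates of
    the participating agents. agent_data is a list of (X, y) pairs."""
    rng = rng or np.random.default_rng()
    updates = []
    for X, y in agent_data:
        if rng.random() > participation:
            continue  # this agent does not participate in this round
        grad = X.T @ (X @ w_global - y) / len(y)  # local least-squares gradient
        updates.append(w_global - mu * grad)      # local update with step size mu
    return np.mean(updates, axis=0) if updates else w_global
```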
arXiv Detail & Related papers (2020-02-20T15:00:54Z)