Distributed Q-Learning with State Tracking for Multi-agent Networked
Control
- URL: http://arxiv.org/abs/2012.12383v1
- Date: Tue, 22 Dec 2020 22:03:49 GMT
- Title: Distributed Q-Learning with State Tracking for Multi-agent Networked
Control
- Authors: Hang Wang, Sen Lin, Hamid Jafarkhani, Junshan Zhang
- Abstract summary: This paper studies distributed Q-learning for Linear Quadratic Regulator (LQR) in a multi-agent network.
We devise a state tracking (ST) based Q-learning algorithm to design optimal controllers for agents.
- Score: 61.63442612938345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies distributed Q-learning for Linear Quadratic Regulator
(LQR) in a multi-agent network. The existing results often assume that agents
can observe the global system state, which may be infeasible in large-scale
systems due to privacy concerns or communication constraints. In this work, we
consider a setting with unknown system models and no centralized coordinator.
We devise a state tracking (ST) based Q-learning algorithm to design optimal
controllers for agents. Specifically, we assume that agents maintain local
estimates of the global state based on their local information and
communications with neighbors. At each step, every agent updates its local
estimate of the global state, based on which it solves an approximate Q-factor
locally through policy iteration. Assuming decaying injected excitation noise
during policy evaluation, we prove that the local estimates converge to the
true global state, and we establish the convergence of the proposed
distributed ST-based Q-learning algorithm. Experimental studies corroborate
our theoretical results, showing that the proposed method achieves
performance comparable to the centralized case.
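To make the state-tracking (ST) step concrete, here is a minimal Python sketch. The ring communication graph, the doubly stochastic mixing matrix W, the identity input matrix, and the fixed gain K are illustrative assumptions; in particular, the fixed gain stands in for the gain the paper learns through Q-factor policy iteration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                                      # agents; agent i owns state coordinate i
A = 0.9 * np.eye(n) + 0.05 * rng.standard_normal((n, n))
K = A.copy()                               # illustrative deadbeat gain (with B = I);
                                           # the paper learns K via policy iteration
W = (np.eye(n) + np.roll(np.eye(n), 1, axis=0)
     + np.roll(np.eye(n), -1, axis=0)) / 3.0   # doubly stochastic ring mixing matrix

x = rng.standard_normal(n)                 # true global state
Z = np.zeros((n, n))                       # Z[i] = agent i's estimate of x

for t in range(30):
    sigma = 0.3 * 0.9 ** t                 # decaying injected excitation noise
    # each agent controls its own coordinate from its local global-state estimate
    u = np.array([-(K @ Z[i])[i] for i in range(n)]) + sigma * rng.standard_normal(n)
    x = A @ x + u                          # B = I: agent i's input drives coordinate i
    Z = W @ Z                              # state tracking: mix neighbors' estimates
    Z[np.arange(n), np.arange(n)] = x      # refresh the locally observed coordinate
    if t % 5 == 0:
        print(f"t={t:2d}  worst estimation error = {np.abs(Z - x).max():.4f}")
```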
Related papers
- The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup
and Beyond [44.43850105124659]
We consider federated Q-learning, which aims to learn an optimal Q-function by periodically aggregating local Q-estimates trained on local data alone.
We provide sample complexity guarantees for both the synchronous and asynchronous variants of federated Q-learning.
We propose a novel federated Q-learning algorithm with importance averaging, giving larger weights to more frequently visited state-action pairs.
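A rough sketch of what such importance averaging could look like in a tabular setting; the shapes, visit counts, and uniform fallback for unvisited pairs are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(1)
S, A, K = 5, 3, 4                           # states, actions, agents
Q = rng.standard_normal((K, S, A))          # local Q-estimates after local training
N = rng.integers(0, 20, size=(K, S, A))     # local visit counts per (s, a)

# Importance averaging: weight each agent's Q(s, a) by its share of the total
# visits to (s, a), falling back to a uniform average where no agent visited.
total = N.sum(axis=0, keepdims=True)                   # (1, S, A)
w = np.where(total > 0, N / np.maximum(total, 1), 1.0 / K)
Q_global = (w * Q).sum(axis=0)                         # aggregated (S, A) estimate
print(Q_global.shape)
```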
arXiv Detail & Related papers (2023-05-18T04:18:59Z)
- Safe Multi-agent Learning via Trapping Regions [89.24858306636816]
We apply the concept of trapping regions, known from the qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning.
We propose a binary partitioning algorithm for verifying that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where the learning dynamics are not known.
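The verification step can be pictured as checking that the learning dynamics never point outward on the boundary of a candidate set. Below is a toy Python check for a box-shaped candidate under an assumed (and here known, linear) vector field; the paper's binary partitioning and sampling procedures are more elaborate.

```python
import numpy as np

rng = np.random.default_rng(3)
A = np.array([[-1.0, 0.4], [0.2, -0.8]])   # illustrative stable learning dynamics
f = lambda x: A @ x                        # vector field on the joint strategy space

def sample_boundary(d=2):
    """Sample a point on the boundary of the box [-1, 1]^d with its outward normal."""
    x = rng.uniform(-1, 1, size=d)
    face, sign = rng.integers(d), rng.choice([-1.0, 1.0])
    x[face] = sign                         # project onto a random face of the box
    n = np.zeros(d)
    n[face] = sign                         # outward normal of that face
    return x, n

# Trapping-region test: the field must not point outward anywhere on the boundary.
ok = all(f(x) @ n <= 0 for x, n in (sample_boundary() for _ in range(10_000)))
print("candidate box passes the sampled trapping-region test:", ok)
```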
arXiv Detail & Related papers (2023-02-27T14:47:52Z)
- Policy Evaluation in Decentralized POMDPs with Belief Sharing [39.550233049869036]
We consider a cooperative policy evaluation task in which agents are not assumed to observe the environment state directly.
We propose a fully decentralized belief forming strategy that relies on individual updates and on localized interactions over a communication network.
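One way to picture such a belief-forming strategy is a diffusion-style HMM filter: each agent runs a local Bayesian update on its own observation and then pools its neighbors' beliefs log-linearly over the network. The transition and observation models, the ring topology, and the pooling weights below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)
S, n = 3, 4                                    # hidden environment states, agents
P = 0.1 * np.ones((S, S)) + 0.7 * np.eye(S)    # state transition matrix
Lik = 0.2 * np.ones((S, S)) + 0.4 * np.eye(S)  # Lik[s, o] = P(obs = o | state = s)
W = (np.eye(n) + np.roll(np.eye(n), 1, axis=0)
     + np.roll(np.eye(n), -1, axis=0)) / 3.0   # mixing weights on a ring graph

s = 0                                          # true hidden state
B = np.full((n, S), 1.0 / S)                   # B[i] = agent i's belief over states

for t in range(25):
    s = rng.choice(S, p=P[s])                  # environment evolves
    obs = np.array([rng.choice(S, p=Lik[s]) for _ in range(n)])
    B = B @ P                                  # predict step (time update)
    B *= Lik[:, obs].T                         # local Bayesian correction per agent
    B /= B.sum(axis=1, keepdims=True)
    B = np.exp(W @ np.log(B + 1e-12))          # log-linear pooling with neighbors
    B /= B.sum(axis=1, keepdims=True)

print("true state:", s, "\nbeliefs:\n", B.round(3))
```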
arXiv Detail & Related papers (2023-02-08T15:54:15Z)
- Distributed-Training-and-Execution Multi-Agent Reinforcement Learning
for Power Control in HetNet [48.96004919910818]
We propose a multi-agent deep reinforcement learning (MADRL) based power control scheme for the HetNet.
To promote cooperation among agents, we develop a penalty-based Q learning (PQL) algorithm for MADRL systems.
In this way, an agent's policy can be learned by other agents more easily, resulting in a more efficient collaboration process.
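The abstract does not spell out the penalty, so the tabular sketch below uses a purely hypothetical stand-in: the Q update is shaped by a small penalty whenever an agent deviates from its current greedy action, discouraging policy churn so that teammates can model the agent more easily.

```python
import numpy as np

S, A = 4, 3                          # states, actions (illustrative sizes)
Q = np.zeros((S, A))
alpha, gamma, lam = 0.1, 0.9, 0.05   # step size, discount, penalty weight

def pql_update(s, a, r, s_next):
    """Hypothetical penalty-shaped Q update, not the paper's exact PQL rule."""
    churn = float(a != np.argmax(Q[s]))          # 1 if a deviates from greedy
    target = (r - lam * churn) + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

pql_update(s=0, a=1, r=1.0, s_next=2)            # example usage
```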
arXiv Detail & Related papers (2022-12-15T17:01:56Z)
- Reinforcement Learning with Heterogeneous Data: Estimation and Inference [84.72174994749305]
We introduce the K-Heterogeneous Markov Decision Process (K-Hetero MDP) to address sequential decision problems with population heterogeneity.
We propose the Auto-Clustered Policy Evaluation (ACPE) for estimating the value of a given policy, and the Auto-Clustered Policy Iteration (ACPI) for estimating the optimal policy in a given policy class.
We present simulations to support our theoretical findings, and we conduct an empirical study on the standard MIMIC-III dataset.
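A toy sketch of the clustered-evaluation idea under population heterogeneity: trajectories from two latent sub-populations are clustered by their returns, and the policy value is then estimated within each cluster. The 1-d 2-means step and the Monte Carlo returns are illustrative stand-ins for the paper's ACPE machinery.

```python
import numpy as np

rng = np.random.default_rng(4)
n_traj = 40
# Monte Carlo returns from two latent sub-populations (the heterogeneity)
G = np.concatenate([rng.normal(1.0, 0.3, n_traj // 2),
                    rng.normal(3.0, 0.3, n_traj // 2)])

# crude 1-d 2-means to recover the clusters from the data alone
c = np.array([G.min(), G.max()])
for _ in range(20):
    assign = np.abs(G[:, None] - c[None, :]).argmin(axis=1)
    c = np.array([G[assign == k].mean() for k in range(2)])

for k in range(2):
    print(f"cluster {k}: estimated policy value = {G[assign == k].mean():.2f}")
```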
arXiv Detail & Related papers (2022-01-31T20:58:47Z)
- Centralizing State-Values in Dueling Networks for Multi-Robot
Reinforcement Learning Mapless Navigation [87.85646257351212]
We study the problem of multi-robot mapless navigation in the popular Centralized Training and Decentralized Execution (CTDE) paradigm.
This problem is challenging when each robot considers its path without explicitly sharing observations with other robots.
We propose a novel architecture for CTDE that uses a centralized state-value network to compute a joint state-value.
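A minimal PyTorch sketch of the idea as summarized: one value head reads the concatenated joint observation to produce a joint state-value, each robot keeps its own advantage head, and the two are combined with the usual dueling mean-subtraction. Sizes and layer choices are illustrative; the paper's architecture may differ.

```python
import torch
import torch.nn as nn

class CentralizedDuelingQ(nn.Module):
    """Q_i(s, a) = V(s_joint) + A_i(o_i, a) - mean_a A_i(o_i, a)."""

    def __init__(self, obs_dim, n_agents, n_actions, hidden=64):
        super().__init__()
        self.value = nn.Sequential(              # centralized joint state-value
            nn.Linear(obs_dim * n_agents, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.adv = nn.ModuleList([               # per-robot advantage heads
            nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, n_actions))
            for _ in range(n_agents)])

    def forward(self, obs):                      # obs: (batch, n_agents, obs_dim)
        v = self.value(obs.flatten(1))           # (batch, 1)
        qs = [v + a - a.mean(dim=1, keepdim=True)
              for a in (head(obs[:, i]) for i, head in enumerate(self.adv))]
        return torch.stack(qs, dim=1)            # (batch, n_agents, n_actions)

q = CentralizedDuelingQ(obs_dim=8, n_agents=3, n_actions=5)
print(q(torch.randn(2, 3, 8)).shape)             # torch.Size([2, 3, 5])
```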
arXiv Detail & Related papers (2021-12-16T16:47:00Z)
- Dimension-Free Rates for Natural Policy Gradient in Multi-Agent
Reinforcement Learning [22.310861786709538]
We propose a scalable algorithm for cooperative multi-agent reinforcement learning.
We show that our algorithm converges to the globally optimal policy with a dimension-free statistical and computational complexity.
arXiv Detail & Related papers (2021-09-23T23:38:15Z)
- Mean-Field Multi-Agent Reinforcement Learning: A Decentralized Network
Approach [6.802025156985356]
This paper proposes a framework called localized training and decentralized execution to study MARL with a network of states.
The key idea is to utilize the homogeneity of agents and regroup them according to their states, thus formulating a networked Markov decision process.
arXiv Detail & Related papers (2021-08-05T16:52:36Z)
- Multi-Agent Reinforcement Learning in Stochastic Networked Systems [30.78949372661673]
We study multi-agent reinforcement learning (MARL) in a network of agents.
The objective is to find localized policies that maximize the (discounted) global reward.
arXiv Detail & Related papers (2020-06-11T16:08:16Z)
- Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search.
We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.