Efficient Reinforcement Learning for Global Decision Making in the Presence of Local Agents at Scale
- URL: http://arxiv.org/abs/2403.00222v3
- Date: Tue, 22 Oct 2024 19:16:39 GMT
- Title: Efficient Reinforcement Learning for Global Decision Making in the Presence of Local Agents at Scale
- Authors: Emile Anand, Guannan Qu
- Abstract summary: We study reinforcement learning for global decision-making in the presence of local agents.
In this setting, scalability has been a long-standing challenge due to the size of the state space.
We show that this learned policy converges to the optimal policy at a rate of $\tilde{O}(1/\sqrt{k}+\epsilon_{k,m})$ as the number of sub-sampled agents $k$ increases.
- Score: 5.3526997662068085
- Abstract: We study reinforcement learning for global decision-making in the presence of local agents, where the global decision-maker makes decisions affecting all local agents, and the objective is to learn a policy that maximizes the joint rewards of all the agents. Such problems find many applications, e.g., demand response, EV charging, and queueing. In this setting, scalability has been a long-standing challenge due to the size of the state space, which can be exponential in the number of agents. This work proposes the \texttt{SUBSAMPLE-Q} algorithm, where the global agent subsamples $k\leq n$ local agents to compute a policy in time that is polynomial in $k$. We show that this learned policy converges to the optimal policy at a rate of $\tilde{O}(1/\sqrt{k}+\epsilon_{k,m})$ as the number of sub-sampled agents $k$ increases, where $\epsilon_{k,m}$ is the Bellman noise. Finally, we validate the theory through numerical simulations in a demand-response setting and a queueing setting.
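To make the subsampling mechanism concrete, here is a minimal tabular sketch of the idea under toy dynamics (our illustration, not the authors' implementation; the environment, reward, and constants are hypothetical placeholders):

```python
import random
from collections import defaultdict

# Hypothetical problem sizes: n local agents with small local state spaces;
# the global agent keys its Q-table on a random k-subset of them.
N_AGENTS, K, N_LOCAL_STATES, N_ACTIONS = 20, 3, 4, 2
ALPHA, GAMMA, EPS, STEPS = 0.1, 0.9, 0.1, 10_000

Q = defaultdict(float)  # Q[(subsampled_state, action)] -> value estimate

def subsample(local_states):
    """Sort the sampled k-subset so the key is order-invariant: the table
    then grows with the k-subset space, not the full joint state space."""
    return tuple(sorted(random.sample(local_states, K)))

def step(local_states, action):
    """Placeholder dynamics and joint reward; a real instance would be a
    demand-response or queueing model as in the paper's experiments."""
    nxt = [min(s + action, N_LOCAL_STATES - 1) if random.random() < 0.5
           else max(s - 1, 0) for s in local_states]
    return nxt, -sum(nxt) / N_AGENTS  # reward averaged over all agents

local = [random.randrange(N_LOCAL_STATES) for _ in range(N_AGENTS)]
for _ in range(STEPS):
    s_k = subsample(local)
    a = (random.randrange(N_ACTIONS) if random.random() < EPS
         else max(range(N_ACTIONS), key=lambda b: Q[(s_k, b)]))
    local, r = step(local, a)
    s_next = subsample(local)
    target = r + GAMMA * max(Q[(s_next, b)] for b in range(N_ACTIONS))
    Q[(s_k, a)] += ALPHA * (target - Q[(s_k, a)])
```

Keying the table on a sorted $k$-subset makes lookups order-invariant, which is what keeps the learned object polynomial in $k$ rather than exponential in $n$.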
Related papers
- Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning [46.28771270378047]
Federated reinforcement learning (RL) enables collaborative decision making of multiple distributed agents without sharing local data trajectories.
In this work, we consider a multi-task setting, in which each agent has its own private reward function corresponding to different tasks, while sharing the same transition kernel of the environment.
We learn a globally optimal policy that maximizes the sum of the discounted total rewards of all the agents in a decentralized manner.
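As a rough sketch of that decentralized objective (an assumed gradient-averaging scheme with toy quadratic rewards, not the paper's natural-policy-gradient method): each agent differentiates its private reward at the shared policy, and averaging the gradients ascends the sum of rewards.

```python
import numpy as np

N_AGENTS, DIM, LR, ROUNDS = 5, 8, 0.05, 200
rng = np.random.default_rng(0)

# Each agent's private "task": a quadratic reward peaked at its own target.
# This is a toy stand-in for the per-task reward functions in the paper.
targets = rng.standard_normal((N_AGENTS, DIM))
theta = np.zeros(DIM)  # globally shared policy parameters

for _ in range(ROUNDS):
    # Each agent computes the gradient of ITS reward at the shared policy...
    grads = targets - theta  # grad of -0.5 * ||theta - target_i||^2
    # ...and the server averages them, ascending the SUM of all rewards.
    theta += LR * grads.mean(axis=0)

# The average of the targets maximizes the summed quadratic rewards.
assert np.allclose(theta, targets.mean(axis=0), atol=1e-2)
```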
arXiv Detail & Related papers (2023-11-01T00:15:18Z)
- Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with General Utilities [12.104551746465932]
We investigate safe multi-agent reinforcement learning, where agents seek to collectively maximize an aggregate sum of local objectives while satisfying their own safety constraints.
Our algorithm converges to a first-order stationary point (FOSP) at the rate of $\mathcal{O}\left(T^{-2/3}\right)$.
In the sample-based setting, we demonstrate that, with high probability, our algorithm requires $\widetilde{\mathcal{O}}\left(\epsilon^{-3.5}\right)$ samples to achieve an $\epsilon$-FOSP.
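A one-variable sketch of the primal-dual mechanism behind such guarantees (generic Lagrangian dynamics, not the paper's actor-critic): the primal variable ascends the Lagrangian while a dual variable prices the safety constraint.

```python
# Toy primal-dual loop: maximize the reward r(x) = -(x - 2)^2 subject to
# the safety constraint g(x) = x - 1 <= 0. The Lagrangian is r(x) - lam*g(x);
# the primal variable ascends it while the dual variable ascends the
# constraint violation. Purely illustrative of the primal-dual mechanism.
LR_X, LR_LAM, STEPS = 0.05, 0.05, 2000
x, lam = 0.0, 0.0

for _ in range(STEPS):
    grad_x = -2.0 * (x - 2.0) - lam           # d/dx [r(x) - lam * g(x)]
    x += LR_X * grad_x                        # primal ascent on the Lagrangian
    lam = max(0.0, lam + LR_LAM * (x - 1.0))  # dual ascent, projected to lam >= 0

print(f"x = {x:.3f} (constrained optimum is x = 1), lambda = {lam:.3f}")
```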
arXiv Detail & Related papers (2023-05-27T20:08:35Z)
- Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that with standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
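The per-agent update referred to here is the standard PPO clipped surrogate; a minimal NumPy sketch of that objective for a single agent (assumed array shapes, not the paper's code):

```python
import numpy as np

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Vanilla PPO clipped surrogate for one agent's local policy.
    new_logp/old_logp: log-probs of the taken actions under the new and
    behavior policies; advantages: estimated advantages for those actions."""
    ratio = np.exp(new_logp - old_logp)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # PPO maximizes the pessimistic surrogate, so the loss is its negation.
    return -np.mean(np.minimum(ratio * advantages, clipped * advantages))

# In the multi-agent scheme, each agent runs this update on its own local
# policy while the other agents' behavior is held fixed in the batch.
logp_old = np.log(np.array([0.5, 0.4, 0.7]))
logp_new = np.log(np.array([0.55, 0.35, 0.75]))
adv = np.array([1.0, -0.5, 0.3])
print(ppo_clip_loss(logp_new, logp_old, adv))
```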
arXiv Detail & Related papers (2023-05-08T16:20:03Z)
- Scalable Multi-Agent Reinforcement Learning with General Utilities [30.960413388976438]
We study scalable multi-agent reinforcement learning (MARL) with general utilities.
The objective is to find a localized policy that maximizes the average of the team's local utility functions without the full observability of each agent in the team.
This is the first result in the literature on multi-agent RL with general utilities that does not require full observability.
arXiv Detail & Related papers (2023-02-15T20:47:43Z)
- Distributed-Training-and-Execution Multi-Agent Reinforcement Learning for Power Control in HetNet [48.96004919910818]
We propose a multi-agent deep reinforcement learning (MADRL) based power control scheme for heterogeneous networks (HetNets).
To promote cooperation among agents, we develop a penalty-based Q-learning (PQL) algorithm for MADRL systems.
In this way, an agent's policy can be learned by other agents more easily, resulting in a more efficient collaboration process.
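The summary does not spell out the penalty term, so the following is one plausible reading (a hedged sketch, not the paper's exact rule): a tabular Q-learning step whose reward is shaped by a penalty for uncooperative behavior.

```python
from collections import defaultdict

ALPHA, GAMMA, BETA = 0.1, 0.9, 0.5  # BETA scales the hypothetical penalty
Q = defaultdict(float)

def pql_update(s, a, r, s_next, penalty, n_actions):
    """One Q-learning step with a penalty subtracted from the reward.
    `penalty` stands in for whatever cooperation-shaping term PQL uses
    (e.g., interference caused to neighboring agents); the summary does
    not specify its exact form."""
    best_next = max(Q[(s_next, b)] for b in range(n_actions))
    target = (r - BETA * penalty) + GAMMA * best_next
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

pql_update(s=0, a=1, r=1.0, s_next=2, penalty=0.3, n_actions=2)
print(Q[(0, 1)])  # 0.1 * (1.0 - 0.15) = 0.085
```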
arXiv Detail & Related papers (2022-12-15T17:01:56Z)
- Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning [25.747559058350557]
We study a multi-agent reinforcement learning (MARL) problem where the agents interact over a given network.
The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards.
To overcome the curse of dimensionality and to reduce communication, we propose Localized Policy Iteration (LPI), which provably learns a near-globally-optimal policy using only local information.
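"Local information" here means each agent conditions on states within a small graph neighborhood; a sketch of that truncation on a hypothetical line graph (the radius kappa and the decision rule are placeholders):

```python
from collections import deque

# Hypothetical line graph over 6 agents; LPI-style policies only look at
# states within a kappa-hop neighborhood of each agent.
ADJ = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
KAPPA = 1

def k_hop(agent, adj, kappa):
    """BFS out to `kappa` hops; the agent's policy conditions on these."""
    seen, frontier = {agent}, deque([(agent, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == kappa:
            continue
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, d + 1))
    return sorted(seen)

def local_policy(agent, global_state):
    """A localized policy sees only the kappa-hop slice of the state,
    which is what keeps per-agent communication and computation small."""
    visible = {i: global_state[i] for i in k_hop(agent, ADJ, KAPPA)}
    return max(visible.values())  # placeholder decision rule

print(local_policy(2, global_state=[0, 3, 1, 4, 2, 0]))  # sees agents 1, 2, 3
```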
arXiv Detail & Related papers (2022-11-30T15:58:00Z)
- Federated Stochastic Approximation under Markov Noise and Heterogeneity: Applications in Reinforcement Learning [24.567125948995834]
Federated reinforcement learning is a framework in which $N$ agents collaboratively learn a global model.
We show that, through careful collaboration among the agents in solving this joint fixed-point problem, the global model can be found $N$ times faster.
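A toy version of that joint fixed-point computation (an assumed local-steps-then-average protocol, not the paper's exact scheme): each agent runs noisy fixed-point iterations from the shared model, and the server averages the iterates so the noise shrinks with the number of agents.

```python
import numpy as np

N_AGENTS, DIM, LR, ROUNDS, LOCAL_STEPS = 10, 4, 0.1, 50, 5
rng = np.random.default_rng(1)

# Common fixed point x* of the contraction x -> A x + b (spectral radius < 1).
A = 0.5 * np.eye(DIM)
b = rng.standard_normal(DIM)
x_star = np.linalg.solve(np.eye(DIM) - A, b)

x = np.zeros(DIM)  # shared global model
for _ in range(ROUNDS):
    iterates = []
    for _ in range(N_AGENTS):
        xi = x.copy()
        for _ in range(LOCAL_STEPS):
            noise = rng.standard_normal(DIM)      # stand-in for Markov noise
            xi += LR * (A @ xi + b - xi + noise)  # stochastic fixed-point step
        iterates.append(xi)
    x = np.mean(iterates, axis=0)  # server averages: variance shrinks ~1/N

print(np.linalg.norm(x - x_star))  # small residual after averaging rounds
```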
arXiv Detail & Related papers (2022-06-21T08:39:12Z)
- Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games [95.10091348976779]
We study decentralized policy learning in Markov games where we control a single agent to play with nonstationary and possibly adversarial opponents.
We propose a new algorithm, \underline{D}ecentralized \underline{O}ptimistic hype\underline{R}policy m\underline{I}rror de\underline{S}cent (DORIS).
DORIS achieves $\sqrt{K}$-regret in the context of general function approximation, where $K$ is the number of episodes.
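As a very loose illustration of a mirror-descent step over a hyperpolicy, i.e., a distribution over base policies (a generic exponential-weights sketch that omits DORIS's optimism bonus and function approximation):

```python
import numpy as np

ETA = 0.5
weights = np.ones(4) / 4  # hyperpolicy: distribution over 4 base policies

def mirror_descent_step(weights, value_estimates, eta=ETA):
    """Entropy-regularized mirror descent = multiplicative weights:
    shift mass toward base policies with higher estimated value."""
    w = weights * np.exp(eta * value_estimates)
    return w / w.sum()

weights = mirror_descent_step(weights, np.array([0.2, 0.9, 0.4, 0.1]))
print(weights)  # mass moves toward the second base policy
```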
arXiv Detail & Related papers (2022-06-03T14:18:05Z)
- Asymptotic Convergence of Deep Multi-Agent Actor-Critic Algorithms [0.6961253535504979]
We present sufficient conditions that ensure convergence of the multi-agent Deep Deterministic Policy Gradient (DDPG) algorithm.
DDPG is an instance of the actor-critic paradigm, one of the most popular approaches in deep reinforcement learning (DeepRL) for tackling continuous action spaces.
arXiv Detail & Related papers (2022-01-03T10:33:52Z) - Online Sub-Sampling for Reinforcement Learning with General Function
Approximation [111.01990889581243]
In this paper, we establish an efficient online sub-sampling framework that measures the information gain of data points collected by an RL algorithm.
For a value-based method with a complexity-bounded function class, we show that the policy needs to be updated only $\propto\operatorname{polylog}(K)$ times.
In contrast to existing approaches that update the policy at least $\Omega(K)$ times, our approach drastically reduces the number of optimization calls in solving for a policy.
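The $\operatorname{polylog}(K)$ update count comes from lazy policy switching: re-solve for a policy only when the measured information of the collected data has grown by a constant factor. A minimal doubling-trick sketch with a stand-in information measure (not the paper's actual sensitivity-based quantity):

```python
def run_episodes(K):
    """Re-solve for the policy only when the (stand-in) information
    measure of the collected data doubles: O(log K) updates, not K."""
    info, threshold, updates = 0.0, 1.0, 0
    for _ in range(K):
        info += 1.0  # placeholder for the information gain of one episode
        if info >= 2.0 * threshold:  # doubling test triggers a re-solve
            threshold = info
            updates += 1             # here: re-fit the value function/policy
    return updates

print(run_episodes(100_000))  # ~log2(K) = 16 updates instead of 100,000
```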
arXiv Detail & Related papers (2021-06-14T07:36:25Z)
- Distributed Q-Learning with State Tracking for Multi-agent Networked Control [61.63442612938345]
This paper studies distributed Q-learning for Linear Quadratic Regulator (LQR) in a multi-agent network.
We devise a state-tracking (ST) based Q-learning algorithm to design optimal controllers for the agents.
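For LQR, the Q-function is a quadratic form in the state-action pair, and the greedy controller falls out of its blocks; the sketch below shows that extraction step for a made-up estimate (a generic LQR fact, not the paper's ST-based estimator):

```python
import numpy as np

# For LQR the Q-function is quadratic: Q(x, u) = [x; u]^T H [x; u].
# Given an estimate of H (however obtained -- the paper estimates it with
# state-tracking-based distributed Q-learning), the greedy controller is
# u = -H_uu^{-1} H_ux x. Below, a made-up H for a 2-state, 1-input system.
n, m = 2, 1
H = np.array([[2.0, 0.2, 0.5],
              [0.2, 1.5, 0.3],
              [0.5, 0.3, 1.0]])  # symmetric positive definite
H_ux = H[n:, :n]
H_uu = H[n:, n:]
K = -np.linalg.solve(H_uu, H_ux)  # greedy linear gain: u = K x

x = np.array([1.0, -2.0])
print("control input:", K @ x)
```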
arXiv Detail & Related papers (2020-12-22T22:03:49Z)