Oracle-free Reinforcement Learning in Mean-Field Games along a Single
Sample Path
- URL: http://arxiv.org/abs/2208.11639v3
- Date: Tue, 11 Apr 2023 17:22:47 GMT
- Title: Oracle-free Reinforcement Learning in Mean-Field Games along a Single
Sample Path
- Authors: Muhammad Aneeq uz Zaman, Alec Koppel, Sujay Bhatt, Tamer Ba\c{s}ar
- Abstract summary: We consider online reinforcement learning in Mean-Field Games (MFGs)
We develop an algorithm that approximates the Mean-Field Equilibrium (MFE) using the single sample path of the generic agent.
We empirically demonstrate the effectiveness of the sandbox learning algorithm in diverse scenarios.
- Score: 5.926203312586109
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider online reinforcement learning in Mean-Field Games (MFGs). Unlike
traditional approaches, we alleviate the need for a mean-field oracle by
developing an algorithm that approximates the Mean-Field Equilibrium (MFE)
using the single sample path of the generic agent. We call this {\it Sandbox
Learning}, as it can be used as a warm-start for any agent learning in a
multi-agent non-cooperative setting. We adopt a two time-scale approach in
which an online fixed-point recursion for the mean-field operates on a slower
time-scale, in tandem with a control policy update on a faster time-scale for
the generic agent. Given that the underlying Markov Decision Process (MDP) of
the agent is communicating, we provide finite sample convergence guarantees in
terms of convergence of the mean-field and control policy to the mean-field
equilibrium. The sample complexity of the Sandbox learning algorithm is
$\tilde{\mathcal{O}}(\epsilon^{-4})$ where $\epsilon$ is the MFE approximation
error. This is similar to works which assume access to oracle. Finally, we
empirically demonstrate the effectiveness of the sandbox learning algorithm in
diverse scenarios, including those where the MDP does not necessarily have a
single communicating class.
Related papers
- Finite-Time Analysis for Conflict-Avoidant Multi-Task Reinforcement Learning [21.288881065839007]
We develop a novel dynamic weighting multi-task actor-critic algorithm (MTAC) under two options of sub-procedures named as CA and FC.
MTAC-CA aims to find a conflict-avoidant (CA) update direction that maximizes the minimum value improvement among tasks, and MTAC-FC targets at a much faster convergence rate.
Our experiments on MT10 demonstrate the improved performance of our algorithms over existing MTRL methods with fixed preference.
arXiv Detail & Related papers (2024-05-25T05:57:46Z) - A Single Online Agent Can Efficiently Learn Mean Field Games [16.00164239349632]
Mean field games (MFGs) are a promising framework for modeling the behavior of large-population systems.
This paper introduces a novel online single-agent model-free learning scheme, which enables a single agent to learn MFNE using online samples.
arXiv Detail & Related papers (2024-05-05T16:38:04Z) - DASA: Delay-Adaptive Multi-Agent Stochastic Approximation [64.32538247395627]
We consider a setting in which $N$ agents aim to speedup a common Approximation problem by acting in parallel and communicating with a central server.
To mitigate the effect of delays and stragglers, we propose textttDASA, a Delay-Adaptive algorithm for multi-agent Approximation.
arXiv Detail & Related papers (2024-03-25T22:49:56Z) - Regularization of the policy updates for stabilizing Mean Field Games [0.2348805691644085]
This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL)
MARL where multiple agents interact in the same environment and whose goal is to maximize the individual returns.
We name our algorithm Mean Field Proximal Policy Optimization (MF-PPO), and we empirically show the effectiveness of our method in the OpenSpiel framework.
arXiv Detail & Related papers (2023-04-04T05:45:42Z) - Beyond ADMM: A Unified Client-variance-reduced Adaptive Federated
Learning Framework [82.36466358313025]
We propose a primal-dual FL algorithm, termed FedVRA, that allows one to adaptively control the variance-reduction level and biasness of the global model.
Experiments based on (semi-supervised) image classification tasks demonstrate superiority of FedVRA over the existing schemes.
arXiv Detail & Related papers (2022-12-03T03:27:51Z) - Online Sub-Sampling for Reinforcement Learning with General Function
Approximation [111.01990889581243]
In this paper, we establish an efficient online sub-sampling framework that measures the information gain of data points collected by an RL algorithm.
For a value-based method with complexity-bounded function class, we show that the policy only needs to be updated for $proptooperatornamepolylog(K)$ times.
In contrast to existing approaches that update the policy for at least $Omega(K)$ times, our approach drastically reduces the number of optimization calls in solving for a policy.
arXiv Detail & Related papers (2021-06-14T07:36:25Z) - Multi-Agent Off-Policy TD Learning: Finite-Time Analysis with
Near-Optimal Sample Complexity and Communication Complexity [13.100926925535578]
We develop two decentralized TD with correction (TDC) algorithms for multi-agent off-policy TD learning.
Our algorithms preserve full privacy of the actions, policies and rewards of the agents, and adopt mini-batch sampling to reduce the sampling variance and communication frequency.
arXiv Detail & Related papers (2021-03-24T12:48:08Z) - Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge
Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z) - Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field
Control/Game in Continuous Time [109.06623773924737]
We study the policy gradient method for the linear-quadratic mean-field control and game.
We show that it converges to the optimal solution at a linear rate, which is verified by a synthetic simulation.
arXiv Detail & Related papers (2020-08-16T06:34:11Z) - Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal
Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $tilde O(|S||B|(gamma)-3epsilon-2)$ for finding the Nash equilibrium (NE) value up to some $epsilon$ error.
We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.