Cooperative Online Learning in Stochastic and Adversarial MDPs
- URL: http://arxiv.org/abs/2201.13170v1
- Date: Mon, 31 Jan 2022 12:32:11 GMT
- Title: Cooperative Online Learning in Stochastic and Adversarial MDPs
- Authors: Tal Lancewicki and Aviv Rosenberg and Yishay Mansour
- Abstract summary: We study cooperative online learning in stochastic and adversarial Markov decision processes (MDPs).
In each episode, $m$ agents interact with an MDP simultaneously and share information in order to minimize their individual regret.
We are the first to consider cooperative reinforcement learning (RL) with either non-fresh randomness or in adversarial MDPs.
- Score: 50.62439652257712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study cooperative online learning in stochastic and adversarial Markov
decision processes (MDPs). That is, in each episode, $m$ agents interact with an
MDP simultaneously and share information in order to minimize their individual
regret. We consider environments with two types of randomness: \emph{fresh} --
where each agent's trajectory is sampled i.i.d., and \emph{non-fresh} -- where
the realization is shared by all agents (but each agent's trajectory is also
affected by its own actions). More precisely, with non-fresh randomness the
realization of every cost and transition is fixed at the start of each episode,
and agents that take the same action in the same state at the same time observe
the same cost and next state. We thoroughly analyze all relevant settings,
highlight the challenges and differences between the models, and prove
nearly-matching regret lower and upper bounds. To our knowledge, we are the
first to consider cooperative reinforcement learning (RL) with either non-fresh
randomness or in adversarial MDPs.
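To make the fresh/non-fresh distinction concrete, below is a minimal sketch (not from the paper) of the two sampling models for $m$ agents in a tabular episodic MDP; the state/action sizes, Bernoulli costs, and seeding are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (not from the paper): m agents in a tabular episodic MDP.
# "Fresh": each agent's costs/transitions are drawn i.i.d. when it acts.
# "Non-fresh": the realization of every cost and transition is fixed at the start
# of the episode, so agents taking the same action in the same state at the same
# time observe the same cost and next state.

S, A, H, m = 5, 3, 4, 2                         # assumed sizes: states, actions, horizon, agents
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(H, S, A))   # transition kernel P[h, s, a] -> distribution over S
C = rng.uniform(size=(H, S, A))                 # mean costs in [0, 1]

def run_episode(policies, non_fresh):
    """policies[i][h, s] is agent i's action; returns each agent's total cost."""
    if non_fresh:
        # One shared realization per (h, s, a), drawn before the episode starts.
        realized_next = np.array([[[rng.choice(S, p=P[h, s, a]) for a in range(A)]
                                   for s in range(S)] for h in range(H)])
        realized_cost = (rng.uniform(size=(H, S, A)) < C).astype(float)  # Bernoulli costs
    states, total = np.zeros(m, dtype=int), np.zeros(m)
    for h in range(H):
        for i in range(m):
            s, a = states[i], policies[i][h, states[i]]
            if non_fresh:
                cost, s_next = realized_cost[h, s, a], realized_next[h, s, a]
            else:
                cost = float(rng.uniform() < C[h, s, a])
                s_next = rng.choice(S, p=P[h, s, a])
            total[i] += cost
            states[i] = s_next
    return total

policies = [rng.integers(A, size=(H, S)) for _ in range(m)]
print("fresh:    ", run_episode(policies, non_fresh=False))
print("non-fresh:", run_episode(policies, non_fresh=True))
```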
Related papers
- Causal Coordinated Concurrent Reinforcement Learning [8.654978787096807]
We propose a novel algorithmic framework for data sharing and coordinated exploration for the purpose of learning more data-efficient and better performing policies under a concurrent reinforcement learning setting.
Our algorithm leverages a causal inference algorithm in the form of Additive Noise Model - Mixture Model (ANM-MM) in extracting model parameters governing individual differentials via independence enforcement.
We propose a new data sharing scheme based on a similarity measure of the extracted model parameters and demonstrate superior learning speeds on a set of autoregressive, pendulum and cart-pole swing-up tasks.
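As a rough illustration of the data-sharing idea (not the paper's ANM-MM machinery), the sketch below shares transitions only between agents whose extracted model parameters are close; the distance threshold and the given parameters are placeholders.

```python
import numpy as np

# Hypothetical sketch of similarity-based data sharing between concurrent agents.
# Each agent i has an extracted model parameter theta[i] (the paper uses ANM-MM
# for that step; here the parameters are simply given) and a local replay buffer.

def shared_buffers(thetas, buffers, tau=0.1):
    """Return, for each agent, its own data plus data from agents whose
    parameters are within distance tau (tau is an assumed threshold)."""
    m = len(thetas)
    merged = []
    for i in range(m):
        pool = list(buffers[i])
        for j in range(m):
            if j != i and abs(thetas[i] - thetas[j]) <= tau:
                pool.extend(buffers[j])       # share data only with "similar" agents
        merged.append(pool)
    return merged

# Toy usage: three agents, two of them governed by nearly the same parameter.
thetas = [0.50, 0.52, 0.95]
buffers = [[("s0", "a0", 0.1)], [("s1", "a1", 0.3)], [("s2", "a2", 0.7)]]
for i, pool in enumerate(shared_buffers(thetas, buffers)):
    print(f"agent {i} trains on {len(pool)} transitions")
```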
arXiv Detail & Related papers (2024-01-31T17:20:28Z)
- Stochastic Principal-Agent Problems: Efficient Computation and Learning [25.637633553882985]
A principal and an agent interact in a stochastic environment, each privy to observations about the state not available to the other.
The model encompasses as special cases extensive-form games (EFGs) and partially observable Markov decision processes (POMDPs).
We show an efficient algorithm for an episodic reinforcement learning setting where transition probabilities are unknown.
arXiv Detail & Related papers (2023-06-06T16:20:44Z)
- Multiagent Inverse Reinforcement Learning via Theory of Mind Reasoning [0.0]
We propose a novel approach to Multiagent Inverse Reinforcement Learning (MIRL).
MIRL aims to infer the reward functions guiding the behavior of each individual given trajectories of a team's behavior during task performance.
We evaluate our approach in a simulated 2-player search-and-rescue operation.
arXiv Detail & Related papers (2023-02-20T19:07:42Z)
- Federated Temporal Difference Learning with Linear Function Approximation under Environmental Heterogeneity [44.2308932471393]
We show that, in a low-heterogeneity regime, exchanging model estimates leads to linear convergence speedups in the number of agents.
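For context, a minimal sketch of federated TD(0) with linear function approximation: each agent runs local TD updates on its own environment and the agents periodically average their parameter estimates; the environments, features, and averaging period are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

# Sketch (assumed setup): N agents each run TD(0) with a linear value estimate
# V(s) = phi(s) @ w on their own chain, and every K steps they average their
# weight vectors -- the "exchange of model estimates" referred to above.

N, S, d, gamma, alpha, K, T = 4, 6, 3, 0.9, 0.05, 10, 500
rng = np.random.default_rng(1)
phi = rng.normal(size=(S, d))                    # fixed feature map
# Heterogeneous environments: each agent gets its own random chain and rewards.
P = [rng.dirichlet(np.ones(S), size=S) for _ in range(N)]
r = [rng.uniform(size=S) + 0.05 * rng.normal(size=S) for _ in range(N)]

w = [np.zeros(d) for _ in range(N)]
s = [0] * N
for t in range(T):
    for i in range(N):
        s_next = rng.choice(S, p=P[i][s[i]])
        # TD(0) update with linear function approximation.
        delta = r[i][s[i]] + gamma * phi[s_next] @ w[i] - phi[s[i]] @ w[i]
        w[i] = w[i] + alpha * delta * phi[s[i]]
        s[i] = s_next
    if (t + 1) % K == 0:                         # periodic model averaging
        w_avg = np.mean(w, axis=0)
        w = [w_avg.copy() for _ in range(N)]

print("averaged weights after training:", np.round(np.mean(w, axis=0), 3))
```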
arXiv Detail & Related papers (2023-02-04T17:53:55Z)
- Expeditious Saliency-guided Mix-up through Random Gradient Thresholding [89.59134648542042]
Mix-up training approaches have proven to be effective in improving the generalization ability of Deep Neural Networks.
In this paper, inspired by the complementary strengths of the two directions, we introduce a novel method that lies at their junction.
We name our method R-Mix, following the concept of "Random Mix-up".
In order to address the question of whether there exists a better decision protocol, we train a Reinforcement Learning agent that decides the mix-up policies.
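For reference, a minimal sketch of plain input mix-up (the common baseline that R-Mix builds on, not the paper's saliency-guided or RL-driven variant): each training pair is a convex combination of two examples with a Beta-distributed coefficient.

```python
import numpy as np

# Plain mix-up sketch (the standard baseline, not R-Mix itself): blend two random
# examples and their one-hot labels with a coefficient lam ~ Beta(alpha, alpha).

def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)                 # mixing coefficient
    perm = rng.permutation(len(x))               # random partner for each example
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam

# Toy usage: a batch of 4 "images" with 3 classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))
y = np.eye(3)[rng.integers(3, size=4)]
x_mix, y_mix, lam = mixup_batch(x, y, rng=rng)
print("lambda =", round(float(lam), 3), "mixed labels:\n", np.round(y_mix, 2))
```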
arXiv Detail & Related papers (2022-12-09T14:29:57Z)
- RPM: Generalizable Behaviors for Multi-Agent Reinforcement Learning [90.43925357575543]
We propose ranked policy memory (RPM) to collect diverse multi-agent trajectories for training MARL policies with good generalizability.
RPM enables MARL agents to interact with unseen agents in multi-agent generalization evaluation scenarios and complete given tasks, and it significantly boosts the performance up to 402% on average.
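A rough sketch of the ranked-policy-memory idea (the number of rank buckets and the sampling rule are assumptions, not the paper's exact design): policies saved during training are bucketed by their training-episode return, and partner agents are drawn across buckets so the learner sees diverse behaviors.

```python
import random
from collections import defaultdict

# Hypothetical ranked policy memory: bucket saved policies by training return,
# then sample partners across buckets to expose the learner to diverse behaviors.

class RankedPolicyMemory:
    def __init__(self, num_buckets=10, lo=0.0, hi=1.0):
        self.num_buckets, self.lo, self.hi = num_buckets, lo, hi
        self.buckets = defaultdict(list)

    def add(self, policy, episode_return):
        frac = (episode_return - self.lo) / (self.hi - self.lo)
        k = min(self.num_buckets - 1, max(0, int(frac * self.num_buckets)))
        self.buckets[k].append(policy)           # rank bucket indexed by return

    def sample_partners(self, n):
        keys = list(self.buckets)
        return [random.choice(self.buckets[random.choice(keys)]) for _ in range(n)]

# Toy usage: store checkpoints (here just strings) and draw 2 partner policies.
mem = RankedPolicyMemory()
for ckpt, ret in [("ckpt_1", 0.1), ("ckpt_2", 0.4), ("ckpt_3", 0.9)]:
    mem.add(ckpt, ret)
print(mem.sample_partners(2))
```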
arXiv Detail & Related papers (2022-10-18T07:32:43Z)
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for Cooperative Multi-Agent Reinforcement Learning [15.972363414919279]
MMD-MIX is a method that combines distributional reinforcement learning and value decomposition.
The experiments demonstrate that MMD-MIX outperforms prior baselines in the StarCraft Multi-Agent Challenge (SMAC) environment.
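To clarify the "maximum mean discrepancy" ingredient, here is a standard (biased) MMD estimator with a Gaussian kernel between two sample sets; how MMD-MIX plugs this into distributional value factorisation is not shown here.

```python
import numpy as np

# Standard biased MMD^2 estimator with a Gaussian kernel between sample sets
# X (shape [n, d]) and Y (shape [m, d]).  This is the generic quantity only.

def gaussian_kernel(a, b, sigma=1.0):
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    kxx = gaussian_kernel(X, X, sigma).mean()
    kyy = gaussian_kernel(Y, Y, sigma).mean()
    kxy = gaussian_kernel(X, Y, sigma).mean()
    return kxx + kyy - 2.0 * kxy

rng = np.random.default_rng(0)
same = mmd2(rng.normal(size=(100, 2)), rng.normal(size=(100, 2)))
diff = mmd2(rng.normal(size=(100, 2)), rng.normal(loc=2.0, size=(100, 2)))
print(f"MMD^2 same distribution: {same:.4f}, shifted distribution: {diff:.4f}")
```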
arXiv Detail & Related papers (2021-06-22T10:21:00Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
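A minimal sketch of the iteration described above (the quadratic local losses, participation probability, and step size are illustrative assumptions): at every iteration a random subset of agents performs a local gradient step on its own data and the server averages the participating updates.

```python
import numpy as np

# Sketch of federated iterations as described above: a random subset of agents
# performs local updates on its own data, then the server aggregates.  The
# least-squares local losses and participation probability q are assumptions.

d, num_agents, q, lr, rounds = 5, 10, 0.4, 0.1, 200
rng = np.random.default_rng(0)
# Each agent k has local data (A_k, b_k) defining a quadratic loss.
A = [rng.normal(size=(20, d)) for _ in range(num_agents)]
b = [A[k] @ rng.normal(size=d) + 0.1 * rng.normal(size=20) for k in range(num_agents)]

w = np.zeros(d)                                  # global model held by the server
for t in range(rounds):
    active = [k for k in range(num_agents) if rng.uniform() < q]
    if not active:
        continue
    local = []
    for k in active:
        grad = A[k].T @ (A[k] @ w - b[k]) / len(b[k])
        local.append(w - lr * grad)              # one local gradient step
    w = np.mean(local, axis=0)                   # server averages participating agents

print("global model after training:", np.round(w, 3))
```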
arXiv Detail & Related papers (2020-02-20T15:00:54Z)