Communication-Efficient Collaborative Best Arm Identification
- URL: http://arxiv.org/abs/2208.09029v1
- Date: Thu, 18 Aug 2022 19:02:29 GMT
- Title: Communication-Efficient Collaborative Best Arm Identification
- Authors: Nikolai Karpov and Qin Zhang
- Abstract summary: We investigate top-$m$ arm identification, a basic problem in bandit theory, in a multi-agent learning model in which agents collaborate to learn an objective function.
We are interested in designing collaborative learning algorithms that achieve maximum speedup.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate top-$m$ arm identification, a basic problem in bandit theory,
in a multi-agent learning model in which agents collaborate to learn an
objective function. We are interested in designing collaborative learning
algorithms that achieve maximum speedup (compared to single-agent learning
algorithms) using minimum communication cost, as communication is frequently
the bottleneck in multi-agent learning. We give both algorithmic and
impossibility results, and conduct a set of experiments to demonstrate the
effectiveness of our algorithms.
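The abstract does not spell out the algorithms themselves; purely as a rough illustration of the collaborative top-$m$ setting it describes, below is a minimal round-based elimination sketch in Python. The function name `collaborative_top_m`, the number of rounds, the per-round pull budget, and the confidence radius are illustrative assumptions, not the paper's method. Agents pull surviving arms in parallel within a round and only merge statistics at round boundaries, which is where the speedup-versus-communication tension arises.

```python
import math
import random


def collaborative_top_m(arm_means, m, n_agents, rounds=5, pulls_per_round=200, delta=0.05):
    """Illustrative round-based elimination for top-m arm identification.

    In each of `rounds` communication rounds, every agent pulls each surviving
    arm locally; agents then merge empirical means once per round.  An arm is
    eliminated when its upper confidence bound falls below the m-th largest
    lower confidence bound.  This is a generic sketch, not the algorithm from
    the paper.
    """
    K = len(arm_means)
    surviving = list(range(K))
    sums = [0.0] * K    # total reward observed for each arm (all agents)
    counts = [0] * K    # total number of pulls for each arm (all agents)

    for _ in range(rounds):
        # Parallel phase: each agent pulls every surviving arm locally.
        for _agent in range(n_agents):
            for a in surviving:
                for _ in range(pulls_per_round):
                    reward = random.gauss(arm_means[a], 1.0)  # simulated pull
                    sums[a] += reward
                    counts[a] += 1

        # Communication phase: merged statistics yield confidence bounds.
        def radius(a):
            return math.sqrt(2.0 * math.log(2.0 * K * rounds / delta) / counts[a])

        mean = {a: sums[a] / counts[a] for a in surviving}
        lcbs = sorted((mean[a] - radius(a) for a in surviving), reverse=True)
        if len(lcbs) <= m:
            break
        threshold = lcbs[m - 1]  # m-th largest lower confidence bound
        surviving = [a for a in surviving if mean[a] + radius(a) >= threshold]
        if len(surviving) <= m:
            break

    # Return the m empirically best arms among the survivors.
    return sorted(surviving, key=lambda a: sums[a] / counts[a], reverse=True)[:m]


if __name__ == "__main__":
    # Example: 10 Gaussian arms, 4 agents collaborate to find the top 3.
    means = [0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    print(collaborative_top_m(means, m=3, n_agents=4))
```

In this toy version the communication cost scales with the number of rounds rather than the total number of pulls, which mirrors the tradeoff between speedup and communication that the abstract highlights.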
Related papers
- Multi-Agent Best Arm Identification in Stochastic Linear Bandits [0.7673339435080443]
We study the problem of collaborative best-arm identification in linear bandits under a fixed-budget scenario.
In our learning model, we consider multiple agents connected through a star network or a generic network, interacting with a linear bandit instance in parallel.
We devise the algorithms MaLinBAI-Star and MaLinBAI-Gen for star networks and generic networks, respectively.
arXiv Detail & Related papers (2024-11-20T20:09:44Z) - Multi-agent cooperation through learning-aware policy gradients [53.63948041506278]
Self-interested individuals often fail to cooperate, posing a fundamental challenge for multi-agent learning.
We present the first unbiased, higher-derivative-free policy gradient algorithm for learning-aware reinforcement learning.
We derive from the iterated prisoner's dilemma a novel explanation for how and when cooperation arises among self-interested learning-aware agents.
arXiv Detail & Related papers (2024-10-24T10:48:42Z) - Scaling Large-Language-Model-based Multi-Agent Collaboration [75.5241464256688]
Pioneering advancements in large language model-powered agents have underscored the design pattern of multi-agent collaboration.
Inspired by the neural scaling law, this study investigates whether a similar principle applies to increasing agents in multi-agent collaboration.
arXiv Detail & Related papers (2024-06-11T11:02:04Z) - Decentralized and Lifelong-Adaptive Multi-Agent Collaborative Learning [57.652899266553035]
Decentralized and lifelong-adaptive multi-agent collaborative learning aims to enhance collaboration among multiple agents without a central server.
We propose DeLAMA, a decentralized multi-agent lifelong collaborative learning algorithm with dynamic collaboration graphs.
arXiv Detail & Related papers (2024-03-11T09:21:11Z) - Pure Exploration in Asynchronous Federated Bandits [57.02106627533004]
We study the federated pure exploration problem of multi-armed bandits and linear bandits, where $M$ agents cooperatively identify the best arm by communicating with a central server.
We propose the first asynchronous multi-armed bandit and linear bandit algorithms for pure exploration with fixed confidence.
arXiv Detail & Related papers (2023-10-17T06:04:00Z) - Inverse Factorized Q-Learning for Cooperative Multi-agent Imitation
Learning [13.060023718506917]
Imitation learning (IL) is the problem of learning to mimic expert behaviors from demonstrations in cooperative multi-agent systems.
We introduce a novel multi-agent IL algorithm designed to address these challenges.
Our approach enables centralized learning by leveraging mixing networks to aggregate decentralized $Q$-functions.
arXiv Detail & Related papers (2023-10-10T17:11:20Z) - Collaborative Learning in General Graphs with Limited Memorization:
Complexity, Learnability, and Reliability [30.432136485068572]
We consider a $K$-armed bandit problem in general graphs where agents are arbitrarily connected.
The goal is to let each of the agents eventually learn the best arm.
We propose a three-staged collaborative learning algorithm.
arXiv Detail & Related papers (2022-01-29T02:42:25Z) - Softmax with Regularization: Better Value Estimation in Multi-Agent
Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z) - Provably Efficient Cooperative Multi-Agent Reinforcement Learning with
Function Approximation [15.411902255359074]
We show that it is possible to achieve near-optimal no-regret learning even with a fixed constant communication budget.
Our work generalizes several ideas from the multi-agent contextual and multi-armed bandit literature to MDPs and reinforcement learning.
arXiv Detail & Related papers (2021-03-08T18:51:00Z) - UneVEn: Universal Value Exploration for Multi-Agent Reinforcement
Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.