Kernel Methods for Cooperative Multi-Agent Contextual Bandits
- URL: http://arxiv.org/abs/2008.06220v1
- Date: Fri, 14 Aug 2020 07:37:44 GMT
- Title: Kernel Methods for Cooperative Multi-Agent Contextual Bandits
- Authors: Abhimanyu Dubey and Alex Pentland
- Abstract summary: Cooperative multi-agent decision making involves a group of agents cooperatively solving learning problems while communicating over a network with delays.
We consider the kernelised contextual bandit problem, where the reward obtained by an agent is an arbitrary linear function of the contexts' images in the related reproducing kernel Hilbert space (RKHS).
We propose \textsc{Coop-KernelUCB}, an algorithm that provides near-optimal bounds on the per-agent regret.
- Score: 15.609414012418043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cooperative multi-agent decision making involves a group of agents
cooperatively solving learning problems while communicating over a network with
delays. In this paper, we consider the kernelised contextual bandit problem,
where the reward obtained by an agent is an arbitrary linear function of the
contexts' images in the related reproducing kernel Hilbert space (RKHS), and a
group of agents must cooperate to collectively solve their unique decision
problems. For this problem, we propose \textsc{Coop-KernelUCB}, an algorithm
that provides near-optimal bounds on the per-agent regret, and is both
computationally and communicatively efficient. For special cases of the
cooperative problem, we also provide variants of \textsc{Coop-KernelUCB} that
provide optimal per-agent regret. In addition, our algorithm generalizes
several existing results in the multi-agent bandit setting. Finally, on a
series of both synthetic and real-world multi-agent network benchmarks, we
demonstrate that our algorithm significantly outperforms existing benchmarks.
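The abstract does not spell out the algorithm, so below is only a rough single-agent sketch of the KernelUCB-style scoring that \textsc{Coop-KernelUCB} builds on, written in Python with NumPy. The RBF kernel choice and the names rbf, kernel_ucb_scores, lam, beta, and gamma are illustrative assumptions, and the cooperative part (sharing delayed observations over the network) is omitted.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """RBF kernel matrix between the rows of a (n x d) and b (m x d)."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_ucb_scores(X_hist, y_hist, X_cand, lam=1.0, beta=1.0, kern=rbf):
    """Upper-confidence scores for each candidate context at the current round.

    X_hist: (n, d) contexts played so far; y_hist: (n,) observed rewards;
    X_cand: (m, d) one candidate context per arm (all NumPy arrays).
    Returns an (m,) array of scores; the agent plays the arm with the largest score.
    """
    if len(X_hist) == 0:
        return np.full(len(X_cand), np.inf)       # force initial exploration
    K_inv = np.linalg.inv(kern(X_hist, X_hist) + lam * np.eye(len(X_hist)))
    k_x = kern(X_hist, X_cand)                    # (n, m) cross-kernel
    mean = k_x.T @ K_inv @ np.asarray(y_hist)     # kernel ridge estimate of reward
    var = np.diag(kern(X_cand, X_cand)) - np.einsum('nm,nk,km->m', k_x, K_inv, k_x)
    return mean + beta * np.sqrt(np.maximum(var, 0.0))
```

In a cooperative variant, each agent would additionally fold the (possibly delayed) contexts and rewards received from its neighbours into X_hist and y_hist before scoring its arms.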
Related papers
- Distributed Optimization via Kernelized Multi-armed Bandits [6.04275169308491]
We model a distributed optimization problem as a multi-agent kernelized multi-armed bandit problem with a heterogeneous reward setting.
We present a fully decentralized algorithm, Multi-agent IGP-UCB (MA-IGP-UCB), which achieves a sub-linear regret bound for popular classes of kernels.
We also propose an extension, Multi-agent Delayed IGP-UCB (MAD-IGP-UCB) algorithm, which reduces the dependence of the regret bound on the number of agents in the network.
arXiv Detail & Related papers (2023-12-07T21:57:48Z)
- Clustered Multi-Agent Linear Bandits [5.893124686141782]
We address a particular instance of the multi-agent linear bandit problem, called clustered multi-agent linear bandits.
We propose a novel algorithm leveraging an efficient collaboration between the agents in order to accelerate the overall optimization problem.
arXiv Detail & Related papers (2023-09-15T19:01:42Z)
- Distributed Consensus Algorithm for Decision-Making in Multi-agent Multi-armed Bandit [7.708904950194129]
We study a structured multi-agent multi-armed bandit (MAMAB) problem in a dynamic environment.
A graph reflects the information-sharing structure among agents, and the arms' reward distributions are piecewise-stationary with several unknown change points.
The goal is to develop a decision-making policy for the agents that minimizes the regret, which is the expected total loss of not playing the optimal arm at each time step.
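In standard notation (assumed here, not taken from the paper), with a_t the arm played at round t and \mu_a(t) the mean reward of arm a at that round, this regret is

R_T \;=\; \mathbb{E}\!\left[\sum_{t=1}^{T}\bigl(\mu_{a_t^{*}}(t) - \mu_{a_t}(t)\bigr)\right],

where a_t^{*} is the arm with the highest mean at round t; because the reward distributions are piecewise-stationary, both \mu_a(t) and the optimal arm may change at the unknown change points.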
arXiv Detail & Related papers (2023-06-09T16:10:26Z)
- On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring [105.13668993076801]
A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees.
We study this question in a general framework for interactive decision making with multiple agents.
We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making.
arXiv Detail & Related papers (2023-05-01T06:46:22Z)
- Factorization of Multi-Agent Sampling-Based Motion Planning [72.42734061131569]
Modern robotics often involves multiple embodied agents operating within a shared environment.
Standard sampling-based algorithms can be used to search for solutions in the robots' joint space.
We integrate the concept of factorization into sampling-based algorithms, which requires only minimal modifications to existing methods.
We present a general implementation of a factorized SBA, derive an analytical gain in terms of sample complexity for PRM*, and showcase empirical results for RRG.
arXiv Detail & Related papers (2023-04-01T15:50:18Z)
- Multi-agent Deep Covering Skill Discovery [50.812414209206054]
We propose Multi-agent Deep Covering Option Discovery, which constructs the multi-agent options through minimizing the expected cover time of the multiple agents' joint state space.
Also, we propose a novel framework to adopt the multi-agent options in the MARL process.
We show that the proposed algorithm can effectively capture the agent interactions with the attention mechanism, successfully identify multi-agent options, and significantly outperforms prior works using single-agent options or no options.
arXiv Detail & Related papers (2022-10-07T00:40:59Z)
- Private and Byzantine-Proof Cooperative Decision-Making [15.609414012418043]
The cooperative bandit problem is a multi-agent decision problem involving a group of agents that interact simultaneously with a multi-armed bandit.
In this paper, we investigate the bandit problem under two settings - (a) when the agents wish to make their communication private with respect to the action sequence, and (b) when the agents can be byzantine.
We provide upper-confidence bound algorithms that obtain optimal regret while being (a) differentially-private and (b) Byzantine-proof.
Our decentralized algorithms require no information about the network of connectivity between agents, making them scalable to large dynamic systems.
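As a rough illustration only (generic building blocks, not the paper's specific algorithms), privacy and Byzantine-tolerance of this kind are often obtained by noising the statistics an agent broadcasts and by aggregating neighbours' reports robustly; the function names and parameters below are assumptions.

```python
import numpy as np

def privatize(value, sensitivity, epsilon, rng=None):
    """Laplace mechanism: add noise calibrated to sensitivity/epsilon before an
    agent broadcasts a scalar statistic (e.g. a running reward sum), making the
    message epsilon-differentially private with respect to the action sequence."""
    rng = np.random.default_rng(rng)
    return value + rng.laplace(scale=sensitivity / epsilon)

def robust_aggregate(reported_values):
    """Median of the values reported by neighbours: a simple aggregation rule
    that tolerates a minority of Byzantine (arbitrarily corrupted) reports,
    unlike a plain average."""
    return float(np.median(np.asarray(reported_values, dtype=float)))
```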
arXiv Detail & Related papers (2022-05-27T18:03:54Z)
- Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
- Multi-agent Policy Optimization with Approximatively Synchronous Advantage Estimation [55.96893934962757]
In a multi-agent system, the policies of different agents need to be evaluated jointly.
In current methods, value functions or advantage functions use counter-factual joint actions which are evaluated asynchronously.
In this work, we propose the approximatively synchronous advantage estimation.
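For reference, the single-agent advantage these methods extend is, in standard notation not taken from the paper,

A^{\pi}(s,a) \;=\; Q^{\pi}(s,a) - V^{\pi}(s),

i.e. how much better action a is than the policy's average behaviour at state s; the multi-agent difficulty is that Q depends on the joint action of all agents, which is why counter-factual joint actions enter the estimate.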
arXiv Detail & Related papers (2020-12-07T07:29:19Z)
- Cooperative Multi-Agent Bandits with Heavy Tails [15.609414012418043]
We study the heavy-tailed bandit problem in the cooperative multi-agent setting, where a group of agents interact with a common bandit problem.
Existing algorithms for the bandit in this setting utilize confidence intervals arising from an averaging-based communication protocol.
We propose \textsc{MP-UCB}, a decentralized multi-agent algorithm for the cooperative bandit that incorporates robust estimation with a message-passing protocol.
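As a hedged illustration of what "robust estimation" can mean for heavy-tailed rewards, below is a median-of-means estimator in Python; this is one standard robust mean estimator, not necessarily the specific estimator inside \textsc{MP-UCB}, and the names and block count are assumptions.

```python
import numpy as np

def median_of_means(samples, n_blocks=5, rng=None):
    """Median-of-means mean estimate for heavy-tailed rewards: shuffle the
    samples, split them into blocks, average each block, and take the median
    of the block means. Unlike the empirical mean, this estimate concentrates
    even when only low-order moments of the reward distribution exist."""
    rng = np.random.default_rng(rng)
    samples = rng.permutation(np.asarray(samples, dtype=float))
    n_blocks = max(1, min(n_blocks, len(samples)))
    blocks = np.array_split(samples, n_blocks)
    return float(np.median([block.mean() for block in blocks]))
```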
arXiv Detail & Related papers (2020-08-14T08:34:32Z)
- A Multi-Agent Primal-Dual Strategy for Composite Optimization over Distributed Features [52.856801164425086]
We study multi-agent sharing optimization problems with the objective function being the sum of smooth local functions plus a convex (possibly non-smooth) coupling function.
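One common way to write such a sharing objective, with symbols assumed here for illustration (K agents, local feature blocks w_k, known local mixing matrices B_k, smooth local costs f_k, and a convex, possibly non-smooth coupling g), is

\min_{w_1,\dots,w_K}\;\sum_{k=1}^{K} f_k(w_k)\;+\;g\!\Bigl(\sum_{k=1}^{K} B_k w_k\Bigr).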
arXiv Detail & Related papers (2020-06-15T19:40:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.