Distributed Optimization via Kernelized Multi-armed Bandits
- URL: http://arxiv.org/abs/2312.04719v1
- Date: Thu, 7 Dec 2023 21:57:48 GMT
- Title: Distributed Optimization via Kernelized Multi-armed Bandits
- Authors: Ayush Rai and Shaoshuai Mou
- Abstract summary: We model a distributed optimization problem as a multi-agent kernelized multi-armed bandit problem with a heterogeneous reward setting.
We present a fully decentralized algorithm, Multi-agent IGP-UCB (MA-IGP-UCB), which achieves a sub-linear regret bound for popular classes of kernels.
We also propose an extension, Multi-agent Delayed IGP-UCB (MAD-IGP-UCB) algorithm, which reduces the dependence of the regret bound on the number of agents in the network.
- Score: 6.04275169308491
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multi-armed bandit algorithms provide solutions for sequential
decision-making where learning takes place by interacting with the environment.
In this work, we model a distributed optimization problem as a multi-agent
kernelized multi-armed bandit problem with a heterogeneous reward setting. In
this setup, the agents collaboratively aim to maximize a global objective
function which is an average of local objective functions. The agents can
access only bandit feedback (noisy reward) obtained from the associated unknown
local function with a small norm in reproducing kernel Hilbert space (RKHS). We
present a fully decentralized algorithm, Multi-agent IGP-UCB (MA-IGP-UCB),
which achieves a sub-linear regret bound for popular classes of kernels while
preserving privacy. It does not require the agents to share their actions,
rewards, or estimates of their local function. In the proposed approach, the
agents sample their individual local functions in a way that benefits the whole
network by utilizing a running consensus to estimate the upper confidence bound
on the global function. Furthermore, we propose an extension, Multi-agent
Delayed IGP-UCB (MAD-IGP-UCB) algorithm, which reduces the dependence of the
regret bound on the number of agents in the network. It provides improved
performance by utilizing a delay in the estimation update step at the cost of
more communication.
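To make the mechanism concrete, below is a minimal, self-contained Python sketch of the running-consensus UCB idea described in the abstract. Everything here is an illustrative assumption rather than the authors' exact MA-IGP-UCB specification: the RBF kernel, the fixed confidence parameter beta, and the names LocalGPAgent, consensus_round, and the gossip weight matrix W are all hypothetical.

```python
# Hypothetical sketch of a running-consensus upper-confidence-bound scheme
# (illustrative only; not the authors' exact MA-IGP-UCB algorithm).
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel between two sets of points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

class LocalGPAgent:
    """One agent: GP regression on its own noisy local rewards only."""
    def __init__(self, noise_var=0.01, beta=2.0):
        self.X, self.y = [], []
        self.noise_var, self.beta = noise_var, beta

    def update(self, x, reward):
        self.X.append(x)
        self.y.append(reward)

    def ucb(self, candidates):
        """Local upper confidence bound mu + beta * sigma on candidate arms."""
        if not self.X:
            return np.full(len(candidates), np.inf)
        X, y = np.array(self.X), np.array(self.y)
        K = rbf_kernel(X, X) + self.noise_var * np.eye(len(X))
        Ks = rbf_kernel(candidates, X)
        mu = Ks @ np.linalg.solve(K, y)
        var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
        return mu + self.beta * np.sqrt(np.maximum(var, 0.0))

def consensus_round(local_ucbs, W):
    """One running-consensus (gossip) step: each agent averages its estimate
    with its neighbours' using a doubly stochastic weight matrix W."""
    return W @ local_ucbs  # rows: agents, columns: candidate arms

if __name__ == "__main__":
    # Illustrative run: 3 agents on a fully mixing network, 1-D arm grid,
    # heterogeneous local rewards whose average is the global objective.
    rng = np.random.default_rng(0)
    arms = np.linspace(0.0, 1.0, 50)[:, None]
    W = np.full((3, 3), 0.25) + 0.25 * np.eye(3)   # doubly stochastic weights
    agents = [LocalGPAgent() for _ in range(3)]
    for t in range(30):
        local = np.vstack([a.ucb(arms) for a in agents])
        local = np.where(np.isfinite(local), local, 1.0)  # untrained agents
        mixed = consensus_round(local, W)
        for i, agent in enumerate(agents):
            x = arms[int(np.argmax(mixed[i]))]
            reward = np.sin(3.0 * x[0] + i) + 0.1 * rng.standard_normal()
            agent.update(x, reward)
```

In this sketch each agent only communicates its mixed UCB estimate over the candidate arms, never its raw actions, rewards, or local posterior, which mirrors the privacy property claimed above. The delayed variant (MAD-IGP-UCB) could be approximated by letting the consensus estimate lag the local GP updates by a few rounds, at the cost of extra communication.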
Related papers
- Scalable spectral representations for network multiagent control [53.631272539560435]
A popular model for multi-agent control, Network Markov Decision Processes (MDPs) pose a significant challenge to efficient learning.
We first derive scalable spectral local representations for network MDPs, which induce a network linear subspace for the local $Q$-function of each agent.
We design a scalable algorithmic framework for continuous state-action network MDPs, and provide end-to-end guarantees for the convergence of our algorithm.
arXiv Detail & Related papers (2024-10-22T17:45:45Z) - Order-Optimal Regret in Distributed Kernel Bandits using Uniform
Sampling with Shared Randomness [9.731329071569018]
We consider distributed kernel bandits where $N$ agents aim to collaboratively maximize an unknown reward function.
We develop the first algorithm that achieves the optimal regret order with a communication cost that is sublinear in both $N$ and $T$.
arXiv Detail & Related papers (2024-02-20T17:49:10Z) - Local Optimization Achieves Global Optimality in Multi-Agent
Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that with standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2023-05-08T16:20:03Z) - Convergence Rates of Average-Reward Multi-agent Reinforcement Learning
via Randomized Linear Programming [41.30044824711509]
We focus on the case where the global reward is a sum of local rewards, the joint policy factorizes into agents' marginals, and the state is fully observable.
We develop multi-agent extensions, whereby agents solve their local saddle point problems and then perform local weighted averaging.
We establish that the sample complexity required to obtain near-globally optimal solutions has tight dependence on the cardinality of the state and action spaces.
arXiv Detail & Related papers (2021-10-22T03:48:41Z) - Dimension-Free Rates for Natural Policy Gradient in Multi-Agent
Reinforcement Learning [22.310861786709538]
We propose a scalable algorithm for cooperative multi-agent reinforcement learning.
We show that our algorithm converges to the globally optimal policy with a dimension-free statistical and computational complexity.
arXiv Detail & Related papers (2021-09-23T23:38:15Z) - Locality Matters: A Scalable Value Decomposition Approach for
Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution (CTDE) paradigm.
arXiv Detail & Related papers (2021-09-22T10:08:15Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - Multi-agent Policy Optimization with Approximatively Synchronous
Advantage Estimation [55.96893934962757]
In multi-agent systems, policies of different agents need to be evaluated jointly.
In current methods, value functions or advantage functions use counterfactual joint actions, which are evaluated asynchronously.
In this work, we propose the approximatively synchronous advantage estimation.
arXiv Detail & Related papers (2020-12-07T07:29:19Z) - Kernel Methods for Cooperative Multi-Agent Contextual Bandits [15.609414012418043]
Cooperative multi-agent decision making involves a group of agents cooperatively solving learning problems while communicating over a network with delays.
We consider the kernelised contextual bandit problem, where the reward obtained by an agent is an arbitrary linear function of the contexts' images in the related reproducing kernel Hilbert space (RKHS).
We propose Coop-KernelUCB, an algorithm that provides near-optimal bounds on the per-agent regret.
arXiv Detail & Related papers (2020-08-14T07:37:44Z) - Multi-Agent Reinforcement Learning in Stochastic Networked Systems [30.78949372661673]
We study multi-agent reinforcement learning (MARL) in a network of agents.
The objective is to find localized policies that maximize the (discounted) global reward.
arXiv Detail & Related papers (2020-06-11T16:08:16Z) - FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC).
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)