Order-Optimal Regret in Distributed Kernel Bandits using Uniform
Sampling with Shared Randomness
- URL: http://arxiv.org/abs/2402.13182v1
- Date: Tue, 20 Feb 2024 17:49:10 GMT
- Title: Order-Optimal Regret in Distributed Kernel Bandits using Uniform
Sampling with Shared Randomness
- Authors: Nikola Pavlovic, Sudeep Salgia, Qing Zhao
- Abstract summary: We consider distributed kernel bandits where $N$ agents aim to collaboratively maximize an unknown reward function.
We develop the first algorithm that achieves the optimal regret order with a communication cost that is sublinear in both $N$ and $T$.
- Score: 9.731329071569018
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider distributed kernel bandits where $N$ agents aim to
collaboratively maximize an unknown reward function that lies in a reproducing
kernel Hilbert space. Each agent sequentially queries the function to obtain
noisy observations at the query points. Agents can share information through a
central server, with the objective of minimizing regret that is accumulating
over time $T$ and aggregating over agents. We develop the first algorithm that
achieves the optimal regret order (as defined by centralized learning) with a
communication cost that is sublinear in both $N$ and $T$. The key features of
the proposed algorithm are the uniform exploration at the local agents and
shared randomness with the central server. Working together with the sparse
approximation of the GP model, these two key components make it possible to
preserve the learning rate of the centralized setting at a diminishing rate of
communication.
Related papers
- Cooperative Multi-Agent Constrained Stochastic Linear Bandits [2.099922236065961]
A network of $N$ agents communicate locally to minimize their collective regret while keeping their expected cost under a specified threshold $tau$.
We propose a safe distributed upper confidence bound algorithm, so called textitMA-OPLB, and establish a high probability bound on its $T$-round regret.
We show that our regret bound is of order $ mathcalOleft(fracdtau-c_0fraclog(NT)2sqrtNsqrtTlog (1/|lambda|)
arXiv Detail & Related papers (2024-10-22T19:34:53Z) - Decentralized Multi-Task Online Convex Optimization Under Random Link
Failures [5.513958040574729]
We develop a robust decentralized saddle-point algorithm against random link failures with heterogeneous probabilities.
We extend our algorithm and analysis to the two-point bandit feedback scenario.
arXiv Detail & Related papers (2024-01-04T00:57:33Z) - Distributed Optimization via Kernelized Multi-armed Bandits [6.04275169308491]
We model a distributed optimization problem as a multi-agent kernelized multi-armed bandit problem with a heterogeneous reward setting.
We present a fully decentralized algorithm, Multi-agent IGP-UCB (MA-IGP-UCB), which achieves a sub-linear regret bound for popular classes for kernels.
We also propose an extension, Multi-agent Delayed IGP-UCB (MAD-IGP-UCB) algorithm, which reduces the dependence of the regret bound on the number of agents in the network.
arXiv Detail & Related papers (2023-12-07T21:57:48Z) - Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with
Heterogeneous Rewards [14.822625665220068]
We study a decentralized multi-agent multi-armed bandit problem in which multiple clients are connected by time dependent random graphs provided by an environment.
The goal is to minimize the overall regret of the entire system through collaborations.
arXiv Detail & Related papers (2023-06-08T22:37:24Z) - Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with
General Utilities [12.104551746465932]
We investigate safe multi-agent reinforcement learning, where agents seek to collectively maximize an aggregate sum of local objectives while satisfying their own safety constraints.
Our algorithm converges to a first-order stationary point (FOSP) at the rate of $mathcalOleft(T-2/3right)$.
In the sample-based setting, we demonstrate that, with high probability, our algorithm requires $widetildemathcalOleft(epsilon-3.5right)$ samples to achieve an $epsilon$-FOSP.
arXiv Detail & Related papers (2023-05-27T20:08:35Z) - Gradient Based Clustering [72.15857783681658]
We propose a general approach for distance based clustering, using the gradient of the cost function that measures clustering quality.
The approach is an iterative two step procedure (alternating between cluster assignment and cluster center updates) and is applicable to a wide range of functions.
arXiv Detail & Related papers (2022-02-01T19:31:15Z) - Acceleration in Distributed Optimization Under Similarity [72.54787082152278]
We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes.
An $varepsilon$-solution is achieved in $tildemathcalrhoObig(sqrtfracbeta/mu (1-)log1/varepsilonbig)$ number of communications steps.
This rate matches (up to poly-log factors) for the first time lower complexity communication bounds of distributed gossip-algorithms applied to the class of problems of interest.
arXiv Detail & Related papers (2021-10-24T04:03:00Z) - Anti-Concentrated Confidence Bonuses for Scalable Exploration [57.91943847134011]
Intrinsic rewards play a central role in handling the exploration-exploitation trade-off.
We introduce emphanti-concentrated confidence bounds for efficiently approximating the elliptical bonus.
We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic rewards on Atari benchmarks.
arXiv Detail & Related papers (2021-10-21T15:25:15Z) - On Reward-Free RL with Kernel and Neural Function Approximations:
Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem under the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z) - Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
adversary-resilient distributed optimization, in which.
machines can independently compute gradients, and cooperate.
Our algorithm is based on a new concentration technique, and its sample complexity.
It is very practical: it improves upon the performance of all prior methods when no.
setting machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z) - Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge
Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.