Pure Exploration in Asynchronous Federated Bandits
- URL: http://arxiv.org/abs/2310.11015v1
- Date: Tue, 17 Oct 2023 06:04:00 GMT
- Title: Pure Exploration in Asynchronous Federated Bandits
- Authors: Zichen Wang, Chuanhao Li, Chenyu Song, Lianghui Wang, Quanquan Gu,
Huazheng Wang
- Abstract summary: We study the federated pure exploration problem of multi-armed bandits and linear bandits, where $M$ agents cooperatively identify the best arm by communicating with the central server.
We propose the first asynchronous multi-armed bandit and linear bandit algorithms for pure exploration with fixed confidence.
- Score: 60.420423973886834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the federated pure exploration problem of multi-armed bandits and
linear bandits, where $M$ agents cooperatively identify the best arm by
communicating with the central server. To enhance the robustness against
latency and unavailability of agents that are common in practice, we propose
the first federated asynchronous multi-armed bandit and linear bandit
algorithms for pure exploration with fixed confidence. Our theoretical analysis
shows the proposed algorithms achieve near-optimal sample complexities and
efficient communication costs in a fully asynchronous environment. Moreover,
experimental results based on synthetic and real-world data empirically
elucidate the effectiveness and communication cost-efficiency of the proposed
algorithms.
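As a concrete illustration of the setting, the sketch below shows one way an asynchronous elimination loop for fixed-confidence best-arm identification could be organized, with each agent uploading sufficient statistics to the server on its own schedule. This is a hedged sketch, not the paper's algorithm: the function names, the one-agent-per-event availability model, and the confidence radius are all assumptions.

```python
import math
import random

def async_federated_bai(num_agents, arm_means, delta=0.05, max_events=10_000):
    """Illustrative asynchronous federated best-arm identification.

    At each communication event a single (randomly available) agent pulls
    the surviving arms locally and uploads sufficient statistics; the
    server aggregates, then eliminates arms whose upper confidence bound
    falls below the best arm's lower confidence bound. This is a sketch,
    not the algorithm proposed in the paper.
    """
    K = len(arm_means)
    counts = [0] * K    # server-side aggregated pull counts
    sums = [0.0] * K    # server-side aggregated reward sums
    active = set(range(K))

    for t in range(1, max_events + 1):
        _agent = random.randrange(num_agents)  # only this agent is online now
        for a in active:                       # local pulls by the online agent
            counts[a] += 1
            sums[a] += random.gauss(arm_means[a], 1.0)

        # Server-side elimination with an (assumed) anytime confidence radius.
        mean = {a: sums[a] / counts[a] for a in active}
        rad = {a: math.sqrt(2 * math.log(4 * K * t * t / delta) / counts[a])
               for a in active}
        best = max(active, key=mean.get)
        active = {a for a in active if mean[a] + rad[a] >= mean[best] - rad[best]}
        if len(active) == 1:
            return active.pop()   # identified with confidence roughly 1 - delta
    return max(range(K), key=lambda a: sums[a] / max(counts[a], 1))

print(async_federated_bai(num_agents=5, arm_means=[0.2, 0.5, 0.9, 0.4]))
```

The asynchrony shows up in the fact that each elimination step uses whatever statistics have been reported so far, rather than waiting for all $M$ agents to synchronize.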
Related papers
- Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques [65.55451717632317]
We study Multi-Agent Reinforcement Learning from Human Feedback (MARLHF), exploring both theoretical foundations and empirical validations.
We define the task as identifying Nash equilibrium from a preference-only offline dataset in general-sum games.
Our findings underscore the multifaceted approach required for MARLHF, paving the way for effective preference-based multi-agent systems.
arXiv Detail & Related papers (2024-09-01T13:14:41Z) - A Federated Online Restless Bandit Framework for Cooperative Resource Allocation [23.698976872351576]
We study the cooperative resource allocation problem with unknown system dynamics of MRPs.
We put forth a Federated Thompson-enabled Whittle Index (FedTSWI) algorithm to solve this multi-agent online RMAB problem.
Numerical results show that the proposed algorithm achieves a fast convergence rate of $\mathcal{O}(\sqrt{T\log(T)})$ and better performance compared with baselines.
arXiv Detail & Related papers (2024-06-12T08:34:53Z) - Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with the highest mean reward with a specified accuracy as fast as possible.
We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost.
Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
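To make the fidelity-cost tradeoff concrete, here is a toy two-fidelity screening sketch. The fidelity costs, bias bounds, and two-phase rule are illustrative assumptions, not the methods studied in the paper: cheap biased pulls prune clearly bad arms, and expensive unbiased pulls resolve the survivors.

```python
import random

# Hypothetical two-fidelity model: fidelity 0 is cheap but biased by at
# most BIAS[0]; fidelity 1 is expensive but unbiased.
COST = {0: 1.0, 1: 10.0}
BIAS = {0: 0.2, 1: 0.0}

def pull(true_mean, fidelity):
    """Noisy observation whose systematic bias is bounded by BIAS[fidelity]."""
    bias = BIAS[fidelity] * random.uniform(-1.0, 1.0)
    return random.gauss(true_mean + bias, 1.0)

def two_phase_screen(arm_means, low_pulls=100, high_pulls=50, keep=2):
    """Screen at low fidelity, then confirm survivors at high fidelity."""
    cost = 0.0
    # Phase 1: cheap, biased estimates for every arm.
    low_est = []
    for mu in arm_means:
        low_est.append(sum(pull(mu, 0) for _ in range(low_pulls)) / low_pulls)
        cost += low_pulls * COST[0]
    # Keep the `keep` best-looking arms; gaps smaller than the bias bound
    # cannot be resolved at this fidelity anyway.
    survivors = sorted(range(len(arm_means)), key=lambda a: -low_est[a])[:keep]
    # Phase 2: expensive, unbiased estimates for the survivors only.
    high_est = {a: sum(pull(arm_means[a], 1) for _ in range(high_pulls)) / high_pulls
                for a in survivors}
    cost += len(survivors) * high_pulls * COST[1]
    return max(high_est, key=high_est.get), cost

print(two_phase_screen([0.1, 0.4, 0.45, 0.9]))
```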
arXiv Detail & Related papers (2024-06-05T08:02:40Z) - Federated Contextual Cascading Bandits with Asynchronous Communication
and Heterogeneous Users [95.77678166036561]
We propose a UCB-type algorithm with delicate communication protocols.
We give sub-linear regret bounds on par with those achieved in the synchronous framework.
Empirical evaluation on synthetic and real-world datasets validates our algorithm's superior performance in terms of regrets and communication costs.
arXiv Detail & Related papers (2024-02-26T05:31:14Z) - Incentivized Communication for Federated Bandits [67.4682056391551]
We introduce an incentivized communication problem for federated bandits, where the server shall motivate clients to share data by providing incentives.
We propose the first incentivized communication protocol, namely, Inc-FedUCB, that achieves near-optimal regret with provable communication and incentive cost guarantees.
arXiv Detail & Related papers (2023-09-21T00:59:20Z) - Communication-Efficient Collaborative Best Arm Identification [6.861971769602314]
We investigate top-$m$ arm identification, a basic problem in bandit theory, in a multi-agent learning model in which agents collaborate to learn an objective function.
We are interested in designing collaborative learning algorithms that achieve maximum speedup.
arXiv Detail & Related papers (2022-08-18T19:02:29Z) - Finite-Time Consensus Learning for Decentralized Optimization with
Nonlinear Gossiping [77.53019031244908]
We present a novel decentralized learning framework based on nonlinear gossiping (NGO), which enjoys an appealing finite-time consensus property that enables better synchronization.
Our analysis on how communication delay and randomized chats affect learning further enables the derivation of practical variants.
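For intuition about the gossiping ingredient, the sketch below runs plain gossip averaging on a small graph, with an optional tanh nonlinearity standing in for a nonlinear update. The specific nonlinearity, step size, and graph are illustrative assumptions, not the NGO dynamics from the paper.

```python
import math

def gossip_round(values, neighbors, step=0.5, nonlinear=False):
    """One synchronous gossip round over an undirected graph.

    values:    current scalar estimate held at each node
    neighbors: adjacency list; neighbors[i] lists the nodes linked to i
    With nonlinear=True, each pairwise disagreement passes through tanh,
    an illustrative stand-in for a nonlinear gossip update.
    """
    new = []
    for i, x in enumerate(values):
        if nonlinear:
            drift = sum(math.tanh(values[j] - x) for j in neighbors[i])
        else:
            drift = sum(values[j] - x for j in neighbors[i])
        new.append(x + step * drift / max(len(neighbors[i]), 1))
    return new

# Ring of four nodes: repeated rounds drive all values toward consensus.
vals = [1.0, 2.0, 3.0, 4.0]
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
for _ in range(60):
    vals = gossip_round(vals, ring, nonlinear=True)
print([round(v, 3) for v in vals])
```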
arXiv Detail & Related papers (2021-11-04T15:36:25Z) - Cooperative Multi-Agent Bandits with Heavy Tails [15.609414012418043]
We study the heavy-tailed bandit problem in the cooperative multi-agent setting, where a group of agents interact with a common bandit problem.
Existing algorithms for this setting utilize confidence intervals arising from an averaging-based communication protocol.
We propose MP-UCB, a decentralized multi-agent algorithm for cooperative bandits that incorporates robust estimation with a message-passing protocol.
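As a self-contained illustration of the robust-estimation ingredient alone (not the MP-UCB algorithm or its message-passing protocol), the median-of-means estimator below concentrates well even when rewards are heavy-tailed and can replace the empirical mean inside a UCB-style index. The block count and bonus constant are standard textbook choices, assumed here rather than taken from the paper.

```python
import math
import random
import statistics

def median_of_means(samples, num_blocks):
    """Robust mean estimate: split into blocks, average each, take the median."""
    k = max(1, min(num_blocks, len(samples)))
    size = len(samples) // k
    block_means = [sum(samples[i * size:(i + 1) * size]) / size for i in range(k)]
    return statistics.median(block_means)

def robust_index(samples, delta=0.05, variance_bound=1.0):
    """UCB-style index built on median-of-means (illustrative constants).

    Roughly 8*log(1/delta) blocks is the usual choice that yields
    exponential concentration when only the reward variance is bounded.
    """
    k = math.ceil(8 * math.log(1 / delta))
    point = median_of_means(samples, k)
    bonus = math.sqrt(12 * variance_bound * k / len(samples))
    return point + bonus

# Heavy-tailed (Pareto) rewards: the empirical mean is fragile to outliers,
# while the median-of-means index stays stable.
rewards = [random.paretovariate(2.5) for _ in range(1000)]
print(robust_index(rewards))
```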
arXiv Detail & Related papers (2020-08-14T08:34:32Z) - Task-Based Information Compression for Multi-Agent Communication
Problems with Channel Rate Constraints [28.727611928919725]
We introduce the state-aggregation for information compression algorithm (SAIC) to solve the formulated TBIC problem.
It is shown that SAIC is able to achieve near-optimal performance in terms of the achieved sum of discounted rewards.
arXiv Detail & Related papers (2020-05-28T18:29:21Z)