Solving Multi-Arm Bandit Using a Few Bits of Communication
- URL: http://arxiv.org/abs/2111.06067v1
- Date: Thu, 11 Nov 2021 06:23:16 GMT
- Title: Solving Multi-Arm Bandit Using a Few Bits of Communication
- Authors: Osama A. Hanna, Lin F. Yang, Christina Fragouli
- Abstract summary: The multi-armed bandit (MAB) problem is an active learning framework that aims to select the best among a set of actions by sequentially observing rewards.
We address the communication problem by optimizing the communication of rewards collected by distributed agents.
We establish a generic reward quantization algorithm, QuBan, that can be applied on top of any (no-regret) MAB algorithm to form a new communication-efficient counterpart.
- Score: 42.13277217013971
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The multi-armed bandit (MAB) problem is an active learning framework that
aims to select the best among a set of actions by sequentially observing
rewards. Recently, it has become popular for a number of applications over
wireless networks, where communication constraints can form a bottleneck.
Existing works usually fail to address this issue and can become infeasible in
certain applications. In this paper we address the communication problem by
optimizing the communication of rewards collected by distributed agents. By
providing nearly matching upper and lower bounds, we tightly characterize the
number of bits needed per reward for the learner to accurately learn without
suffering additional regret. In particular, we establish a generic reward
quantization algorithm, QuBan, that can be applied on top of any (no-regret)
MAB algorithm to form a new communication-efficient counterpart that requires
only a few (as few as 3) bits to be sent per iteration while preserving the
same regret bound. Our lower bound is established via constructing hard
instances from a subgaussian distribution. Our theory is further corroborated
by numerical experiments.
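The abstract describes QuBan only at a high level. As a rough illustration (not the paper's algorithm), the sketch below shows how a distributed agent could report a reward with only a handful of bits by stochastically rounding it onto a small grid centered at a reference point the learner also tracks; the class name, the 3-bit default, the clipping radius, and the reference-update rule are all assumptions made for this example.

```python
import math
import random

class FewBitRewardChannel:
    """Illustrative sketch only: NOT the paper's QuBan algorithm.

    An agent reports each observed reward with a handful of bits by
    stochastically rounding it onto a small grid centered at a reference
    point; the learner decodes the index back to a real value. Stochastic
    rounding keeps the decoded value an unbiased estimate of the (clipped)
    reward, so a downstream bandit algorithm can use it in place of the
    true reward.
    """

    def __init__(self, bits=3, radius=1.0):
        self.levels = 2 ** bits      # grid size representable with `bits` bits
        self.radius = radius         # assumed bound on |reward - reference|
        self.reference = 0.0         # reference point tracked by both sides

    def encode(self, reward):
        """Agent side: return an integer index in [0, levels - 1]."""
        lo = self.reference - self.radius
        step = 2.0 * self.radius / (self.levels - 1)
        x = min(max(reward, lo), self.reference + self.radius)  # clip to the grid
        frac = (x - lo) / step
        idx = int(math.floor(frac))
        # round up with probability equal to the fractional part (unbiased)
        if idx < self.levels - 1 and random.random() < frac - idx:
            idx += 1
        return idx                   # only `bits` bits cross the channel

    def decode(self, idx):
        """Learner side: map the received index back to a reward estimate."""
        lo = self.reference - self.radius
        step = 2.0 * self.radius / (self.levels - 1)
        r_hat = lo + idx * step
        # drift the reference toward recent rewards so the grid stays centered
        # (both sides can apply the same update, since both know `idx`)
        self.reference = 0.9 * self.reference + 0.1 * r_hat
        return r_hat
```

Any standard bandit algorithm (e.g., UCB) could then run on the decoded rewards; the point of the stochastic rounding in this toy is that the decoded value is an unbiased estimate of the clipped reward, which is the kind of property a quantizer needs in order not to bias the learner's mean estimates.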
Related papers
- Quantile Multi-Armed Bandits with 1-bit Feedback [40.22079522118723]
We study a variant of best-arm identification involving elements of risk sensitivity and communication constraints.
We propose an algorithm that utilizes noisy binary search as a subroutine, allowing the learner to estimate quantile rewards through 1-bit feedback (an illustrative sketch of this idea appears after this list).
arXiv Detail & Related papers (2025-02-10T17:03:33Z) - Rising Rested Bandits: Lower Bounds and Efficient Algorithms [15.390680055166769]
This paper is in the field of sequential Multi-Armed Bandits (MABs).
We study a particular case of the rested bandits in which the arms' expected reward is monotonically non-decreasing and concave.
We empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset.
arXiv Detail & Related papers (2024-11-06T22:00:46Z) - Forced Exploration in Bandit Problems [12.13966146283641]
The multi-armed bandit (MAB) is a classical sequential decision problem.
This paper aims to design a multi-armed bandit algorithm that can be implemented without using information about the reward distribution.
arXiv Detail & Related papers (2023-12-12T14:00:29Z) - Communication-Efficient Decentralized Federated Learning via One-Bit Compressive Sensing [52.402550431781805]
Decentralized federated learning (DFL) has gained popularity due to its practicality across various applications.
Compared to the centralized version, training a shared model among a large number of nodes in DFL is more challenging.
We develop a novel algorithm based on the framework of the inexact alternating direction method (iADM).
arXiv Detail & Related papers (2023-08-31T12:22:40Z) - ICQ: A Quantization Scheme for Best-Arm Identification Over Bit-Constrained Channels [9.173160301214805]
We study the problem of best-arm identification in a distributed variant of the multi-armed bandit setting.
We propose a novel quantization scheme called Inflating Confidence for Quantization (ICQ) that can be applied to existing confidence-bound based learning algorithms.
arXiv Detail & Related papers (2023-04-30T17:00:03Z) - Energy Regularized RNNs for Solving Non-Stationary Bandit Problems [97.72614340294547]
We present an energy term that prevents the neural network from becoming too confident in support of a certain action.
We demonstrate that our method is at least as effective as methods suggested to solve the sub-problem of Rotting Bandits.
arXiv Detail & Related papers (2023-03-12T03:32:43Z) - New Tight Relaxations of Rank Minimization for Multi-Task Learning [161.23314844751556]
We propose two novel multi-task learning formulations based on two regularization terms.
We show that our methods can correctly recover the low-rank structure shared across tasks, and outperform related multi-task learning methods.
arXiv Detail & Related papers (2021-12-09T07:29:57Z) - Batched Neural Bandits [107.5072688105936]
BatchNeuralUCB combines neural networks with optimism to address the exploration-exploitation tradeoff.
We prove that BatchNeuralUCB achieves the same regret as the fully sequential version while reducing the number of policy updates considerably.
arXiv Detail & Related papers (2021-02-25T17:36:44Z) - Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism [107.48538091418412]
We study exploration in multi-armed bandits when we have access to a divisible resource that can be allocated in varying amounts to arm pulls.
We focus in particular on the allocation of distributed computing resources, where we may obtain results faster by allocating more resources per pull.
arXiv Detail & Related papers (2020-10-31T18:19:29Z)
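As flagged in the Quantile Multi-Armed Bandits entry above, the 1-bit-feedback quantile estimation idea can be illustrated with a generic noisy binary search. The sketch below is a toy under stated assumptions, not the cited paper's algorithm; the function name, search interval, and round/pull budgets are invented for the example.

```python
import random

def estimate_quantile_1bit(pull_arm, tau, lo, hi, rounds=20, pulls_per_round=200):
    """Toy sketch: estimate the tau-quantile of an arm's reward distribution
    when each pull reveals only one bit, namely whether the realized reward
    exceeded the queried threshold (not the cited paper's algorithm)."""
    for _ in range(rounds):
        mid = (lo + hi) / 2.0
        # the only feedback per pull is the comparison bit: reward > mid
        exceed = sum(1 for _ in range(pulls_per_round) if pull_arm() > mid)
        if exceed / pulls_per_round > 1.0 - tau:
            lo = mid   # too many rewards exceed mid, so the quantile is higher
        else:
            hi = mid   # too few rewards exceed mid, so the quantile is lower
    return (lo + hi) / 2.0

# toy usage: the median of a Gaussian arm with mean 1.0 should come out near 1.0
if __name__ == "__main__":
    arm = lambda: random.gauss(1.0, 1.0)
    print(estimate_quantile_1bit(arm, tau=0.5, lo=-5.0, hi=5.0))
```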
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.