Solving Multi-Arm Bandit Using a Few Bits of Communication
- URL: http://arxiv.org/abs/2111.06067v1
- Date: Thu, 11 Nov 2021 06:23:16 GMT
- Title: Solving Multi-Arm Bandit Using a Few Bits of Communication
- Authors: Osama A. Hanna, Lin F. Yang, Christina Fragouli
- Abstract summary: The multi-armed bandit (MAB) problem is an active learning framework that aims to select the best among a set of actions by sequentially observing rewards.
We address the communication problem by optimizing the communication of rewards collected by distributed agents.
We establish a generic reward quantization algorithm, QuBan, that can be applied on top of any (no-regret) MAB algorithm to form a new communication-efficient counterpart.
- Score: 42.13277217013971
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The multi-armed bandit (MAB) problem is an active learning framework that
aims to select the best among a set of actions by sequentially observing
rewards. Recently, it has become popular for a number of applications over
wireless networks, where communication constraints can form a bottleneck.
Existing works usually fail to address this issue and can become infeasible in
certain applications. In this paper we address the communication problem by
optimizing the communication of rewards collected by distributed agents. By
providing nearly matching upper and lower bounds, we tightly characterize the
number of bits needed per reward for the learner to accurately learn without
suffering additional regret. In particular, we establish a generic reward
quantization algorithm, QuBan, that can be applied on top of any (no-regret)
MAB algorithm to form a new communication-efficient counterpart that requires
only a few (as low as 3) bits to be sent per iteration while preserving the
same regret bound. Our lower bound is established by constructing hard
instances from a subgaussian distribution. Our theory is further corroborated
by numerical experiments.
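To make the quantization idea concrete, here is a minimal sketch (not the paper's exact QuBan construction): each agent stochastically rounds its reward onto a small grid centered at a running reference value, sends only the few-bit index, and the learner runs an off-the-shelf UCB on the dequantized values. The grid step, the 3-bit width, and the helper names are illustrative assumptions.

```python
import numpy as np

def quantize_reward(r, center, step=1.0, bits=3):
    """Stochastically round reward r onto a few-bit grid around `center`.
    Stochastic rounding keeps the dequantized value unbiased, so the
    learner's mean estimates are unaffected on average.
    (Illustrative sketch, not the paper's exact QuBan scheme.)"""
    levels = 2 ** bits                           # e.g. 8 levels for 3 bits
    lo = center - step * levels / 2              # left edge of the grid
    x = float(np.clip((r - lo) / step, 0, levels - 1))
    base = int(x)
    idx = base + (np.random.rand() < x - base)   # round up w.p. frac(x)
    return min(idx, levels - 1)

def dequantize(idx, center, step=1.0, bits=3):
    return center - step * 2 ** bits / 2 + idx * step

# Learner side: plain UCB fed with dequantized rewards.
K, T = 5, 2000
means = np.random.rand(K)
counts, sums = np.zeros(K), np.zeros(K)
for t in range(T):
    if t < K:
        a = t                                    # pull each arm once
    else:
        mu = sums / counts
        a = int(np.argmax(mu + np.sqrt(2 * np.log(t) / counts)))
    r = means[a] + 0.1 * np.random.randn()       # agent observes reward
    center = sums[a] / max(counts[a], 1)         # reference known to both sides
    idx = quantize_reward(r, center)             # only `bits` bits are sent
    counts[a] += 1
    sums[a] += dequantize(idx, center)
```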
Related papers
- Forced Exploration in Bandit Problems [12.13966146283641]
The multi-armed bandit (MAB) is a classical sequential decision problem.
This paper aims to design a multi-armed bandit algorithm that can be implemented without using information about the reward distribution.
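A minimal sketch of the forced-exploration idea (the paper's exact schedule may differ): play greedily, but force a pull of each arm on a sparse deterministic schedule, so exploration requires no knowledge of the reward distribution.

```python
import numpy as np

def forced_exploration(pull, K, T):
    """Greedy play interrupted by deterministic forced pulls
    (illustrative schedule: explore whenever t is a perfect square)."""
    counts, sums = np.zeros(K), np.zeros(K)
    for t in range(T):
        root = int(np.sqrt(t))
        if root * root == t:                     # forced exploration round
            a = root % K                         # cycle through the arms
        else:                                    # otherwise exploit
            a = int(np.argmax(sums / np.maximum(counts, 1)))
        counts[a] += 1
        sums[a] += pull(a)
    return sums / np.maximum(counts, 1)
```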
arXiv Detail & Related papers (2023-12-12T14:00:29Z)
- Communication-Efficient Decentralized Federated Learning via One-Bit Compressive Sensing [52.402550431781805]
Decentralized federated learning (DFL) has gained popularity due to its practicality across various applications.
Compared to the centralized version, training a shared model among a large number of nodes in DFL is more challenging.
We develop a novel algorithm based on the framework of the inexact alternating direction method (iADM).
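The iADM machinery is beyond a short sketch; the fragment below only illustrates the one-bit communication ingredient: each node exchanges sign vectors plus a single scale with its neighbors. The mixing rule and all names are assumptions, not the paper's algorithm.

```python
import numpy as np

def one_bit_compress(v):
    """Transmit only signs plus one scalar scale per message."""
    return np.sign(v).astype(np.int8), float(np.mean(np.abs(v)))

def gossip_step(models, neighbors, grads, lr=0.1, mix=0.5):
    """One decentralized round: local SGD step, then coarse averaging
    from one-bit compressed neighbor differences (illustrative only;
    the paper's iADM-based update is more involved)."""
    out = []
    for i, w in enumerate(models):
        w = w - lr * grads[i]                    # local step
        for j in neighbors[i]:
            signs, scale = one_bit_compress(models[j] - w)
            w = w + (mix / len(neighbors[i])) * scale * signs
        out.append(w)
    return out
```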
arXiv Detail & Related papers (2023-08-31T12:22:40Z)
- ICQ: A Quantization Scheme for Best-Arm Identification Over Bit-Constrained Channels [9.173160301214805]
We study the problem of best-arm identification in a distributed variant of the multi-armed bandit setting.
We propose a novel quantization scheme called Inflating Confidence for Quantization (ICQ) that can be applied to existing confidence-bound based learning algorithms.
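A sketch of the "inflate the confidence width to absorb quantization error" idea behind ICQ (the paper's exact scheme differs): rewards are rounded to a few-bit grid, and the elimination rule widens its Hoeffding radius by half the grid step.

```python
import numpy as np

def quantized_elimination(pull, K, bits=2, r_max=1.0, delta=0.05, max_rounds=5000):
    """Successive elimination on few-bit rewards with an inflated
    confidence radius (illustrative; not the exact ICQ algorithm)."""
    step = r_max / (2 ** bits - 1)               # quantizer resolution
    active = list(range(K))
    counts, sums = np.zeros(K), np.zeros(K)
    for t in range(1, max_rounds + 1):
        for a in active:                         # pull every surviving arm
            q = np.round(np.clip(pull(a), 0, r_max) / step) * step
            counts[a] += 1
            sums[a] += q
        mu = sums / np.maximum(counts, 1)
        # Hoeffding radius, inflated by step/2 for the quantization error
        rad = np.sqrt(np.log(4 * K * t * t / delta) / (2 * t)) + step / 2
        best = max(active, key=lambda a: mu[a])
        active = [a for a in active if mu[a] + 2 * rad >= mu[best]]
        if len(active) == 1:
            break
    return max(active, key=lambda a: mu[a])
```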
arXiv Detail & Related papers (2023-04-30T17:00:03Z)
- Energy Regularized RNNs for Solving Non-Stationary Bandit Problems [97.72614340294547]
We present an energy term that prevents the neural network from becoming too confident in support of a certain action.
We demonstrate that our method is at least as effective as methods proposed for the Rotting Bandits sub-problem.
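The confidence-limiting effect can be illustrated with a simple penalty on the policy's negative entropy added to the loss; the paper's actual energy term and RNN architecture are not reproduced here.

```python
import numpy as np

def penalized_loss(logits, action, reward, beta=0.1):
    """Policy loss plus an energy-style confidence penalty
    (illustrative stand-in for the paper's energy term)."""
    z = logits - logits.max()                  # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()
    pg = -reward * np.log(p[action] + 1e-12)   # policy-gradient style term
    energy = np.sum(p * np.log(p + 1e-12))     # negative entropy: large when
    return pg + beta * energy                  # the policy is overconfident
```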
arXiv Detail & Related papers (2023-03-12T03:32:43Z)
- New Tight Relaxations of Rank Minimization for Multi-Task Learning [161.23314844751556]
We propose two novel multi-task learning formulations based on two regularization terms.
We show that our methods can correctly recover the low-rank structure shared across tasks, and outperform related multi-task learning methods.
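For orientation, the standard convex relaxation of rank minimization is nuclear-norm regularization, solved by proximal gradient with singular-value soft-thresholding; the paper proposes tighter relaxations, which this baseline sketch does not implement.

```python
import numpy as np

def nuclear_prox(W, tau):
    """Soft-threshold the singular values of W (prox of tau * nuclear norm)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def multitask_prox_step(W, Xs, ys, lr=0.01, lam=0.1):
    """One proximal-gradient step on least-squares tasks that share a
    low-rank weight matrix W of shape (features, tasks)."""
    G = np.column_stack([X.T @ (X @ W[:, t] - y)
                         for t, (X, y) in enumerate(zip(Xs, ys))])
    return nuclear_prox(W - lr * G, lr * lam)
```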
arXiv Detail & Related papers (2021-12-09T07:29:57Z)
- Batched Neural Bandits [107.5072688105936]
BatchNeuralUCB combines neural networks with optimism to address the exploration-exploitation tradeoff.
We prove that BatchNeuralUCB achieves the same regret as the fully sequential version while reducing the number of policy updates considerably.
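The batching idea can be shown with a linear stand-in for the neural model (an assumption to keep the sketch short): confidence parameters are refreshed only once every B rounds instead of after every pull.

```python
import numpy as np

def batched_linucb(contexts, pull, T, d, B=50, alpha=1.0, lam=1.0):
    """LinUCB whose policy parameters update only every B rounds,
    mirroring the batched-update idea (linear features stand in for
    the neural network; not the exact BatchNeuralUCB algorithm)."""
    A, b = lam * np.eye(d), np.zeros(d)
    A_inv, theta = np.linalg.inv(A), np.zeros(d)
    for t in range(T):
        X = contexts(t)                          # (K, d) arm feature matrix
        width = np.sqrt(np.einsum('kd,de,ke->k', X, A_inv, X))
        a = int(np.argmax(X @ theta + alpha * width))
        r = pull(t, a)
        A += np.outer(X[a], X[a])                # statistics accumulate...
        b += r * X[a]
        if (t + 1) % B == 0:                     # ...but the policy only
            A_inv = np.linalg.inv(A)             # refreshes once per batch
            theta = A_inv @ b
    return theta
```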
arXiv Detail & Related papers (2021-02-25T17:36:44Z)
- Multi-Armed Bandits with Censored Consumption of Resources [9.803834317538747]
We consider a resource-aware variant of the classical multi-armed bandit problem.
In each round, the learner selects an arm and determines a resource limit.
It then observes a corresponding (random) reward, provided the (random) amount of consumed resources remains below the limit.
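A one-round sketch of this interaction model (the distributions are chosen for illustration, not taken from the paper):

```python
import numpy as np

def censored_round(reward_mean, cons_mean, limit, rng=None):
    """Learner picks an arm (reward_mean, cons_mean) and a resource limit;
    the reward is revealed only if consumption stays below the limit."""
    if rng is None:
        rng = np.random.default_rng()
    consumption = rng.exponential(cons_mean)     # random resource use
    if consumption <= limit:
        return rng.normal(reward_mean, 0.1), consumption
    return None, limit                           # censored: reward withheld
```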
arXiv Detail & Related papers (2020-11-02T08:27:38Z)
- Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism [107.48538091418412]
We study exploration in multi-armed bandits when we have access to a divisible resource that can be allocated in varying amounts to arm pulls.
We focus in particular on the allocation of distributed computing resources, where we may obtain results faster by allocating more resources per pull.
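A toy model of the trade-off (the sublinear 1/sqrt speed-up assumption and the elimination rule are illustrative, not the paper's algorithm): a fixed divisible budget is re-split among surviving arms, so pulls finish faster as arms are eliminated.

```python
import numpy as np

def elimination_with_reallocation(pull, K, total_units=64, rounds=10):
    """Successive elimination that re-splits a divisible compute budget
    among surviving arms each round (illustrative model only)."""
    active = list(range(K))
    counts, sums, elapsed = np.zeros(K), np.zeros(K), 0.0
    for _ in range(rounds):
        units = max(total_units // len(active), 1)  # adaptive re-allocation
        elapsed += 1.0 / np.sqrt(units)             # assumed sublinear speed-up
        for a in active:
            counts[a] += 1
            sums[a] += pull(a)
        mu = sums / np.maximum(counts, 1)
        rad = np.sqrt(np.log(10 * K) / (2 * counts[active[0]]))
        best = max(active, key=lambda a: mu[a])
        active = [a for a in active if mu[a] + 2 * rad >= mu[best]]
    return max(active, key=lambda a: mu[a]), elapsed
```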
arXiv Detail & Related papers (2020-10-31T18:19:29Z)