Solving Multi-Arm Bandit Using a Few Bits of Communication
- URL: http://arxiv.org/abs/2111.06067v1
- Date: Thu, 11 Nov 2021 06:23:16 GMT
- Title: Solving Multi-Arm Bandit Using a Few Bits of Communication
- Authors: Osama A. Hanna, Lin F. Yang, Christina Fragouli
- Abstract summary: The multi-armed bandit (MAB) problem is an active learning framework that aims to select the best among a set of actions by sequentially observing rewards.
We address the communication problem by optimizing the communication of rewards collected by distributed agents.
We establish a generic reward quantization algorithm, QuBan, that can be applied on top of any (no-regret) MAB algorithm to form a new communication-efficient counterpart.
- Score: 42.13277217013971
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The multi-armed bandit (MAB) problem is an active learning framework that
aims to select the best among a set of actions by sequentially observing
rewards. Recently, it has become popular for a number of applications over
wireless networks, where communication constraints can form a bottleneck.
Existing works usually fail to address this issue and can become infeasible in
certain applications. In this paper we address the communication problem by
optimizing the communication of rewards collected by distributed agents. By
providing nearly matching upper and lower bounds, we tightly characterize the
number of bits needed per reward for the learner to accurately learn without
suffering additional regret. In particular, we establish a generic reward
quantization algorithm, QuBan, that can be applied on top of any (no-regret)
MAB algorithm to form a new communication-efficient counterpart that requires
only a few (as low as 3) bits to be sent per iteration while preserving the
same regret bound. Our lower bound is established by constructing hard
instances from a subgaussian distribution. Our theory is further corroborated
by numerical experiments.
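To make the quantization idea concrete, here is a minimal sketch (not the paper's exact QuBan construction): each agent stochastically rounds its reward onto a small grid centered at a running reference value, sends only the few-bit index, and the learner runs an off-the-shelf UCB on the dequantized values. The grid step, the 3-bit width, and the helper names are illustrative assumptions.

```python
import numpy as np

def quantize_reward(r, center, step=1.0, bits=3):
    """Stochastically round reward r onto a few-bit grid around `center`.
    Stochastic rounding keeps the dequantized value unbiased, so the
    learner's mean estimates are unaffected on average.
    (Illustrative sketch, not the paper's exact QuBan scheme.)"""
    levels = 2 ** bits                           # e.g. 8 levels for 3 bits
    lo = center - step * levels / 2              # left edge of the grid
    x = float(np.clip((r - lo) / step, 0, levels - 1))
    base = int(x)
    idx = base + (np.random.rand() < x - base)   # round up w.p. frac(x)
    return min(idx, levels - 1)

def dequantize(idx, center, step=1.0, bits=3):
    return center - step * 2 ** bits / 2 + idx * step

# Learner side: plain UCB fed with dequantized rewards.
K, T = 5, 2000
means = np.random.rand(K)
counts, sums = np.zeros(K), np.zeros(K)
for t in range(T):
    if t < K:
        a = t                                    # pull each arm once
    else:
        mu = sums / counts
        a = int(np.argmax(mu + np.sqrt(2 * np.log(t) / counts)))
    r = means[a] + 0.1 * np.random.randn()       # agent observes reward
    center = sums[a] / max(counts[a], 1)         # reference known to both sides
    idx = quantize_reward(r, center)             # only `bits` bits are sent
    counts[a] += 1
    sums[a] += dequantize(idx, center)
```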
Related papers
- Forced Exploration in Bandit Problems [12.13966146283641]
The multi-armed bandit (MAB) is a classical sequential decision problem.
This paper aims to design a multi-armed bandit algorithm that can be implemented without using information about the reward distribution.
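A minimal sketch of the forced-exploration idea (the paper's exact schedule may differ): play greedily, but force a pull of each arm on a sparse deterministic schedule, so exploration requires no knowledge of the reward distribution.

```python
import numpy as np

def forced_exploration(pull, K, T):
    """Greedy play interrupted by deterministic forced pulls
    (illustrative schedule: explore whenever t is a perfect square)."""
    counts, sums = np.zeros(K), np.zeros(K)
    for t in range(T):
        root = int(np.sqrt(t))
        if root * root == t:                     # forced exploration round
            a = root % K                         # cycle through the arms
        else:                                    # otherwise exploit
            a = int(np.argmax(sums / np.maximum(counts, 1)))
        counts[a] += 1
        sums[a] += pull(a)
    return sums / np.maximum(counts, 1)
```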
arXiv Detail & Related papers (2023-12-12T14:00:29Z)
- Communication-Efficient Decentralized Federated Learning via One-Bit Compressive Sensing [52.402550431781805]
Decentralized federated learning (DFL) has gained popularity due to its practicality across various applications.
Compared to the centralized version, training a shared model among a large number of nodes in DFL is more challenging.
We develop a novel algorithm based on the framework of the inexact alternating direction method (iADM).
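The iADM machinery is beyond a short sketch; the fragment below only illustrates the one-bit communication ingredient: each node exchanges sign vectors plus a single scale with its neighbors. The mixing rule and all names are assumptions, not the paper's algorithm.

```python
import numpy as np

def one_bit_compress(v):
    """Transmit only signs plus one scalar scale per message."""
    return np.sign(v).astype(np.int8), float(np.mean(np.abs(v)))

def gossip_step(models, neighbors, grads, lr=0.1, mix=0.5):
    """One decentralized round: local SGD step, then coarse averaging
    from one-bit compressed neighbor differences (illustrative only;
    the paper's iADM-based update is more involved)."""
    out = []
    for i, w in enumerate(models):
        w = w - lr * grads[i]                    # local step
        for j in neighbors[i]:
            signs, scale = one_bit_compress(models[j] - w)
            w = w + (mix / len(neighbors[i])) * scale * signs
        out.append(w)
    return out
```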
arXiv Detail & Related papers (2023-08-31T12:22:40Z)
- ICQ: A Quantization Scheme for Best-Arm Identification Over Bit-Constrained Channels [9.173160301214805]
We study the problem of best-arm identification in a distributed variant of the multi-armed bandit setting.
We propose a novel quantization scheme called Inflating Confidence for Quantization (ICQ) that can be applied to existing confidence-bound based learning algorithms.
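A sketch of the "inflate the confidence width to absorb quantization error" idea behind ICQ (the paper's exact scheme differs): rewards are rounded to a few-bit grid, and the elimination rule widens its Hoeffding radius by half the grid step.

```python
import numpy as np

def quantized_elimination(pull, K, bits=2, r_max=1.0, delta=0.05, max_rounds=5000):
    """Successive elimination on few-bit rewards with an inflated
    confidence radius (illustrative; not the exact ICQ algorithm)."""
    step = r_max / (2 ** bits - 1)               # quantizer resolution
    active = list(range(K))
    counts, sums = np.zeros(K), np.zeros(K)
    for t in range(1, max_rounds + 1):
        for a in active:                         # pull every surviving arm
            q = np.round(np.clip(pull(a), 0, r_max) / step) * step
            counts[a] += 1
            sums[a] += q
        mu = sums / np.maximum(counts, 1)
        # Hoeffding radius, inflated by step/2 for the quantization error
        rad = np.sqrt(np.log(4 * K * t * t / delta) / (2 * t)) + step / 2
        best = max(active, key=lambda a: mu[a])
        active = [a for a in active if mu[a] + 2 * rad >= mu[best]]
        if len(active) == 1:
            break
    return max(active, key=lambda a: mu[a])
```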
arXiv Detail & Related papers (2023-04-30T17:00:03Z)
- Energy Regularized RNNs for Solving Non-Stationary Bandit Problems [97.72614340294547]
We present an energy term that prevents the neural network from becoming too confident in support of a certain action.
We demonstrate that our method is at least as effective as methods proposed for the Rotting Bandits sub-problem.
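The confidence-limiting effect can be illustrated with a simple penalty on the policy's negative entropy added to the loss; the paper's actual energy term and RNN architecture are not reproduced here.

```python
import numpy as np

def penalized_loss(logits, action, reward, beta=0.1):
    """Policy loss plus an energy-style confidence penalty
    (illustrative stand-in for the paper's energy term)."""
    z = logits - logits.max()                  # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()
    pg = -reward * np.log(p[action] + 1e-12)   # policy-gradient style term
    energy = np.sum(p * np.log(p + 1e-12))     # negative entropy: large when
    return pg + beta * energy                  # the policy is overconfident
```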
arXiv Detail & Related papers (2023-03-12T03:32:43Z)
- New Tight Relaxations of Rank Minimization for Multi-Task Learning [161.23314844751556]
We propose two novel multi-task learning formulations based on two regularization terms.
We show that our methods can correctly recover the low-rank structure shared across tasks, and outperform related multi-task learning methods.
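For orientation, the standard convex relaxation of rank minimization is nuclear-norm regularization, solved by proximal gradient with singular-value soft-thresholding; the paper proposes tighter relaxations, which this baseline sketch does not implement.

```python
import numpy as np

def nuclear_prox(W, tau):
    """Soft-threshold the singular values of W (prox of tau * nuclear norm)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def multitask_prox_step(W, Xs, ys, lr=0.01, lam=0.1):
    """One proximal-gradient step on least-squares tasks that share a
    low-rank weight matrix W of shape (features, tasks)."""
    G = np.column_stack([X.T @ (X @ W[:, t] - y)
                         for t, (X, y) in enumerate(zip(Xs, ys))])
    return nuclear_prox(W - lr * G, lr * lam)
```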
arXiv Detail & Related papers (2021-12-09T07:29:57Z)
- Batched Neural Bandits [107.5072688105936]
BatchNeuralUCB combines neural networks with optimism to address the exploration-exploitation tradeoff.
We prove that BatchNeuralUCB achieves the same regret as the fully sequential version while reducing the number of policy updates considerably.
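The batching idea can be shown with a linear stand-in for the neural model (an assumption to keep the sketch short): confidence parameters are refreshed only once every B rounds instead of after every pull.

```python
import numpy as np

def batched_linucb(contexts, pull, T, d, B=50, alpha=1.0, lam=1.0):
    """LinUCB whose policy parameters update only every B rounds,
    mirroring the batched-update idea (linear features stand in for
    the neural network; not the exact BatchNeuralUCB algorithm)."""
    A, b = lam * np.eye(d), np.zeros(d)
    A_inv, theta = np.linalg.inv(A), np.zeros(d)
    for t in range(T):
        X = contexts(t)                          # (K, d) arm feature matrix
        width = np.sqrt(np.einsum('kd,de,ke->k', X, A_inv, X))
        a = int(np.argmax(X @ theta + alpha * width))
        r = pull(t, a)
        A += np.outer(X[a], X[a])                # statistics accumulate...
        b += r * X[a]
        if (t + 1) % B == 0:                     # ...but the policy only
            A_inv = np.linalg.inv(A)             # refreshes once per batch
            theta = A_inv @ b
    return theta
```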
arXiv Detail & Related papers (2021-02-25T17:36:44Z)
- Multi-Armed Bandits with Censored Consumption of Resources [9.803834317538747]
We consider a resource-aware variant of the classical multi-armed bandit problem.
In each round, the learner selects an arm and determines a resource limit.
It then observes a corresponding (random) reward, provided the (random) amount of consumed resources remains below the limit.
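A one-round sketch of this interaction model (the distributions are chosen for illustration, not taken from the paper):

```python
import numpy as np

def censored_round(reward_mean, cons_mean, limit, rng=None):
    """Learner picks an arm (reward_mean, cons_mean) and a resource limit;
    the reward is revealed only if consumption stays below the limit."""
    if rng is None:
        rng = np.random.default_rng()
    consumption = rng.exponential(cons_mean)     # random resource use
    if consumption <= limit:
        return rng.normal(reward_mean, 0.1), consumption
    return None, limit                           # censored: reward withheld
```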
arXiv Detail & Related papers (2020-11-02T08:27:38Z)
- Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism [107.48538091418412]
We study exploration in multi-armed bandits when we have access to a divisible resource that can be allocated in varying amounts to arm pulls.
We focus in particular on the allocation of distributed computing resources, where we may obtain results faster by allocating more resources per pull.
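A toy model of the trade-off (the sublinear 1/sqrt speed-up assumption and the elimination rule are illustrative, not the paper's algorithm): a fixed divisible budget is re-split among surviving arms, so pulls finish faster as arms are eliminated.

```python
import numpy as np

def elimination_with_reallocation(pull, K, total_units=64, rounds=10):
    """Successive elimination that re-splits a divisible compute budget
    among surviving arms each round (illustrative model only)."""
    active = list(range(K))
    counts, sums, elapsed = np.zeros(K), np.zeros(K), 0.0
    for _ in range(rounds):
        units = max(total_units // len(active), 1)  # adaptive re-allocation
        elapsed += 1.0 / np.sqrt(units)             # assumed sublinear speed-up
        for a in active:
            counts[a] += 1
            sums[a] += pull(a)
        mu = sums / np.maximum(counts, 1)
        rad = np.sqrt(np.log(10 * K) / (2 * counts[active[0]]))
        best = max(active, key=lambda a: mu[a])
        active = [a for a in active if mu[a] + 2 * rad >= mu[best]]
    return max(active, key=lambda a: mu[a]), elapsed
```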
arXiv Detail & Related papers (2020-10-31T18:19:29Z)