Rate-Constrained Remote Contextual Bandits
- URL: http://arxiv.org/abs/2204.12620v1
- Date: Tue, 26 Apr 2022 22:34:54 GMT
- Title: Rate-Constrained Remote Contextual Bandits
- Authors: Francesco Pase, Deniz G\"und\"uz, Michele Zorzi
- Abstract summary: We consider a rate-constrained contextual multi-armed bandit (RC-CMAB) problem, in which a group of agents are solving the same contextual multi-armed bandit (CMAB) problem.
We study the fundamental information theoretic limits of this problem by letting the number of agents go to infinity.
We then analyze the optimal compression scheme achievable in the limit with infinite agents, when using the forward and reverse KL divergence as distortion metric.
- Score: 18.40166098572039
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a rate-constrained contextual multi-armed bandit (RC-CMAB)
problem, in which a group of agents are solving the same contextual multi-armed
bandit (CMAB) problem. However, the contexts are observed by a remotely
connected entity, i.e., the decision-maker, that updates the policy to maximize
the returned rewards, and communicates the arms to be sampled by the agents to
a controller over a rate-limited communications channel. This framework can be
applied to personalized ad placement, whenever the content owner observes the
website visitors, and hence has the context, but needs to transmit the ads to
be shown to a controller that is in charge of placing the marketing content.
Consequently, the rate-constrained CMAB (RC-CMAB) problem requires the study of
lossy compression schemes for the policy to be employed whenever the constraint
on the channel rate does not allow the uncompressed transmission of the
decision-maker's intentions. We characterize the fundamental information
theoretic limits of this problem by letting the number of agents go to
infinity, and study the regret that can be achieved, identifying the two
distinct rate regions leading to linear and sub-linear regrets respectively. We
then analyze the optimal compression scheme achievable in the limit with
infinite agents, when using the forward and reverse KL divergence as distortion
metric. Based on this, we also propose a practical coding scheme, and provide
numerical results.
Related papers
- Information Capacity Regret Bounds for Bandits with Mediator Feedback [55.269551124587224]
We introduce the policy set capacity as an information-theoretic measure for the complexity of the policy set.
Adopting the classical EXP4 algorithm, we provide new regret bounds depending on the policy set capacity.
For a selection of policy set families, we prove nearly-matching lower bounds, scaling similarly with the capacity.
arXiv Detail & Related papers (2024-02-15T19:18:47Z) - Distributed Multi-Task Learning for Stochastic Bandits with Context Distribution and Stage-wise Constraints [0.0]
We propose a distributed upper confidence bound (UCB) algorithm, related-UCB.
Our algorithm constructs a pruned action set during each round to ensure the constraints are met.
We empirically validated the performance of our algorithm on synthetic data and real-world Movielens-100K data.
arXiv Detail & Related papers (2024-01-21T18:43:55Z) - Towards Model-Free LQR Control over Rate-Limited Channels [2.908482270923597]
We study a setting where a worker agent transmits quantized policy gradients (of the LQR cost) to a server over a noiseless channel with a finite bit-rate.
We propose a new algorithm titled Adaptively Quantized Gradient Descent (textttAQGD), and prove that above a certain finite threshold bit-rate, textttAQGD guarantees exponentially fast convergence to the globally optimal policy.
arXiv Detail & Related papers (2024-01-02T15:59:00Z) - Collaborative Optimization of the Age of Information under Partial
Observability [34.43476648472727]
Age of Information (AoI) is the freshness of sensor and control data at the receiver side.
We devise a decentralized AoI-minimizing transmission policy for a number of sensor agents sharing capacity-limited, non-FIFO duplex channels.
We also leverage mean-field control approximations and reinforcement learning to derive scalable and optimal solutions.
arXiv Detail & Related papers (2023-12-20T12:34:54Z) - Compressed Regression over Adaptive Networks [58.79251288443156]
We derive the performance achievable by a network of distributed agents that solve, adaptively and in the presence of communication constraints, a regression problem.
We devise an optimized allocation strategy where the parameters necessary for the optimization can be learned online by the agents.
arXiv Detail & Related papers (2023-04-07T13:41:08Z) - Learning Resilient Radio Resource Management Policies with Graph Neural
Networks [124.89036526192268]
We formulate a resilient radio resource management problem with per-user minimum-capacity constraints.
We show that we can parameterize the user selection and power control policies using a finite set of parameters.
Thanks to such adaptation, our proposed method achieves a superior tradeoff between the average rate and the 5th percentile rate.
arXiv Detail & Related papers (2022-03-07T19:40:39Z) - Linear Stochastic Bandits over a Bit-Constrained Channel [37.01818450308119]
We introduce a new linear bandit formulation over a bit-constrained channel.
The goal of the server is to take actions based on estimates of an unknown model parameter to minimize cumulative regret.
We prove that when the unknown model is $d$-dimensional, a channel capacity of $O(d)$ bits suffices to achieve order-optimal regret.
arXiv Detail & Related papers (2022-03-02T15:54:03Z) - Remote Contextual Bandits [18.40166098572039]
We consider a remote contextual multi-armed bandit (CMAB) problem.
The decision-maker observes the context and the reward, but must communicate the actions to be taken by the agents over a rate-limited communication channel.
We study the fundamental information theoretic limits of this problem by letting the number of agents go to infinity, and study the regret achieved when Thompson sampling strategy is adopted.
arXiv Detail & Related papers (2022-02-10T17:31:20Z) - Distributed Adaptive Learning Under Communication Constraints [54.22472738551687]
This work examines adaptive distributed learning strategies designed to operate under communication constraints.
We consider a network of agents that must solve an online optimization problem from continual observation of streaming data.
arXiv Detail & Related papers (2021-12-03T19:23:48Z) - Communication-Efficient Split Learning Based on Analog Communication and
Over the Air Aggregation [48.150466900765316]
Split-learning (SL) has recently gained popularity due to its inherent privacy-preserving capabilities and ability to enable collaborative inference for devices with limited computational power.
Standard SL algorithms assume an ideal underlying digital communication system and ignore the problem of scarce communication bandwidth.
We propose a novel SL framework to solve the remote inference problem that introduces an additional layer at the agent side and constrains the choices of the weights and the biases to ensure over the air aggregation.
arXiv Detail & Related papers (2021-06-02T07:49:41Z) - Batched Neural Bandits [107.5072688105936]
BatchNeuralUCB combines neural networks with optimism to address the exploration-exploitation tradeoff.
We prove that BatchNeuralUCB achieves the same regret as the fully sequential version while reducing the number of policy updates considerably.
arXiv Detail & Related papers (2021-02-25T17:36:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.