Linear Stochastic Bandits over a Bit-Constrained Channel
- URL: http://arxiv.org/abs/2203.01198v1
- Date: Wed, 2 Mar 2022 15:54:03 GMT
- Title: Linear Stochastic Bandits over a Bit-Constrained Channel
- Authors: Aritra Mitra, Hamed Hassani and George J. Pappas
- Abstract summary: We introduce a new linear bandit formulation over a bit-constrained channel.
The goal of the server is to take actions based on estimates of an unknown model parameter to minimize cumulative regret.
We prove that when the unknown model is $d$-dimensional, a channel capacity of $O(d)$ bits suffices to achieve order-optimal regret.
- Score: 37.01818450308119
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the primary challenges in large-scale distributed learning stems from
stringent communication constraints. While several recent works address this
challenge for static optimization problems, sequential decision-making under
uncertainty has remained much less explored in this regard. Motivated by this
gap, we introduce a new linear stochastic bandit formulation over a
bit-constrained channel. Specifically, in our setup, an agent interacting with
an environment transmits encoded estimates of an unknown model parameter to a
server over a communication channel of finite capacity. The goal of the server
is to take actions based on these estimates to minimize cumulative regret. To
this end, we develop a novel and general algorithmic framework that hinges on
two main components: (i) an adaptive encoding mechanism that exploits
statistical concentration bounds, and (ii) a decision-making principle based on
confidence sets that account for encoding errors. As our main result, we prove
that when the unknown model is $d$-dimensional, a channel capacity of $O(d)$
bits suffices to achieve order-optimal regret. To demonstrate the generality of
our approach, we then show that the same result continues to hold for
non-linear observation models satisfying standard regularity conditions.
Finally, we establish that for the simpler unstructured multi-armed bandit
problem, $1$ bit channel-capacity is sufficient for achieving optimal regret
bounds. Overall, our work takes a significant first step towards paving the way
for statistical decision-making over finite-capacity channels.
Related papers
- Neighbor-Aware Calibration of Segmentation Networks with Penalty-Based
Constraints [19.897181782914437]
We propose a principled and simple solution based on equality constraints on the logit values, which enables to control explicitly both the enforced constraint and the weight of the penalty.
Our approach can be used to train a wide span of deep segmentation networks.
arXiv Detail & Related papers (2024-01-25T19:46:57Z) - Towards Model-Free LQR Control over Rate-Limited Channels [2.908482270923597]
We study a setting where a worker agent transmits quantized policy gradients (of the LQR cost) to a server over a noiseless channel with a finite bit-rate.
We propose a new algorithm titled Adaptively Quantized Gradient Descent (textttAQGD), and prove that above a certain finite threshold bit-rate, textttAQGD guarantees exponentially fast convergence to the globally optimal policy.
arXiv Detail & Related papers (2024-01-02T15:59:00Z) - A Pseudo-Semantic Loss for Autoregressive Models with Logical
Constraints [87.08677547257733]
Neuro-symbolic AI bridges the gap between purely symbolic and neural approaches to learning.
We show how to maximize the likelihood of a symbolic constraint w.r.t the neural network's output distribution.
We also evaluate our approach on Sudoku and shortest-path prediction cast as autoregressive generation.
arXiv Detail & Related papers (2023-12-06T20:58:07Z) - Sub-linear Regret in Adaptive Model Predictive Control [56.705978425244496]
We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online oracle that combines the certainty-equivalence principle and polytopic tubes.
We analyze the regret of the algorithm, when compared to an algorithm initially aware of the system dynamics.
arXiv Detail & Related papers (2023-10-07T15:07:10Z) - Trust your neighbours: Penalty-based constraints for model calibration [19.437451462590108]
We present a constrained optimization perspective of SVLS and demonstrate that it enforces an implicit constraint on soft class proportions of surrounding pixels.
We propose a principled and simple solution based on equality constraints on the logit values, which enables to control explicitly both the enforced constraint and the weight of the penalty.
arXiv Detail & Related papers (2023-03-11T01:10:26Z) - Toward Certified Robustness Against Real-World Distribution Shifts [65.66374339500025]
We train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model.
A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations.
We propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement.
arXiv Detail & Related papers (2022-06-08T04:09:13Z) - On Lower Bounds for Standard and Robust Gaussian Process Bandit
Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm.
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z) - Breaking the Communication-Privacy-Accuracy Trilemma [19.399122892615573]
Two major challenges in distributed learning are preserving the privacy of local samples and communicating them efficiently to a central server.
We develop novel encoding and decoding mechanisms that simultaneously achieve optimal privacy and communication efficiency.
arXiv Detail & Related papers (2020-07-22T22:43:01Z) - Upper Confidence Primal-Dual Reinforcement Learning for CMDP with
Adversarial Loss [145.54544979467872]
We consider online learning for episodically constrained Markov decision processes (CMDPs)
We propose a new emphupper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z) - Robust-Adaptive Control of Linear Systems: beyond Quadratic Costs [14.309243378538012]
We consider the problem of robust and adaptive model predictive control (MPC) of a linear system.
We provide the first end-to-end suboptimal tractity analysis for this setting.
arXiv Detail & Related papers (2020-02-25T12:24:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.