Related papers: Linear Stochastic Bandits over a Bit-Constrained Channel

Linear Stochastic Bandits over a Bit-Constrained Channel

URL: http://arxiv.org/abs/2203.01198v1
Date: Wed, 2 Mar 2022 15:54:03 GMT
Title: Linear Stochastic Bandits over a Bit-Constrained Channel
Authors: Aritra Mitra, Hamed Hassani and George J. Pappas
Abstract summary: We introduce a new linear bandit formulation over a bit-constrained channel. The goal of the server is to take actions based on estimates of an unknown model parameter to minimize cumulative regret. We prove that when the unknown model is $d$-dimensional, a channel capacity of $O(d)$ bits suffices to achieve order-optimal regret.
Score: 37.01818450308119
License: http://creativecommons.org/licenses/by/4.0/
Abstract: One of the primary challenges in large-scale distributed learning stems from stringent communication constraints. While several recent works address this challenge for static optimization problems, sequential decision-making under uncertainty has remained much less explored in this regard. Motivated by this gap, we introduce a new linear stochastic bandit formulation over a bit-constrained channel. Specifically, in our setup, an agent interacting with an environment transmits encoded estimates of an unknown model parameter to a server over a communication channel of finite capacity. The goal of the server is to take actions based on these estimates to minimize cumulative regret. To this end, we develop a novel and general algorithmic framework that hinges on two main components: (i) an adaptive encoding mechanism that exploits statistical concentration bounds, and (ii) a decision-making principle based on confidence sets that account for encoding errors. As our main result, we prove that when the unknown model is $d$-dimensional, a channel capacity of $O(d)$ bits suffices to achieve order-optimal regret. To demonstrate the generality of our approach, we then show that the same result continues to hold for non-linear observation models satisfying standard regularity conditions. Finally, we establish that for the simpler unstructured multi-armed bandit problem, $1$ bit channel-capacity is sufficient for achieving optimal regret bounds. Overall, our work takes a significant first step towards paving the way for statistical decision-making over finite-capacity channels.

Related papers

Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update [60.414548453838506]
We study the generalized linear bandit (GLB) problem, a contextual multi-armed bandit framework that extends the classical linear model by incorporating a non-linear link function.<n>GLBs are widely applicable to real-world scenarios, but their non-linear nature introduces significant challenges in achieving both computational and statistical efficiency.<n>We propose a jointly efficient algorithm that attains a nearly optimal regret bound with $mathcalO(1)$ time and space complexities per round.
arXiv Detail & Related papers (2025-07-16T02:24:21Z)
Single-loop Algorithms for Stochastic Non-convex Optimization with Weakly-Convex Constraints [49.76332265680669]
This paper examines a crucial subset of problems where both the objective and constraint functions are weakly convex. Existing methods often face limitations, including slow convergence rates or reliance on double-loop designs. We introduce a novel single-loop penalty-based algorithm to overcome these challenges.
arXiv Detail & Related papers (2025-04-21T17:15:48Z)
Beyond Non-Degeneracy: Revisiting Certainty Equivalent Heuristic for Online Linear Programming [18.371947752008744]
We show that Certainty Equivalent achieves uniformly near-optimal regret under mild assumptions on the underlying distribution. Our result implies that, contrary to prior belief, CE effectively beats the curse of degeneracy for a wide range of problem instances. These techniques may find potential applications in broader online decision-making contexts.
arXiv Detail & Related papers (2025-01-03T09:21:27Z)
Neighbor-Aware Calibration of Segmentation Networks with Penalty-Based Constraints [19.897181782914437]
We propose a principled and simple solution based on equality constraints on the logit values, which enables to control explicitly both the enforced constraint and the weight of the penalty. Our approach can be used to train a wide span of deep segmentation networks.
arXiv Detail & Related papers (2024-01-25T19:46:57Z)
Towards Model-Free LQR Control over Rate-Limited Channels [2.908482270923597]
We study a setting where a worker agent transmits quantized policy gradients (of the LQR cost) to a server over a noiseless channel with a finite bit-rate. We propose a new algorithm titled Adaptively Quantized Gradient Descent (textttAQGD), and prove that above a certain finite threshold bit-rate, textttAQGD guarantees exponentially fast convergence to the globally optimal policy.
arXiv Detail & Related papers (2024-01-02T15:59:00Z)
A Pseudo-Semantic Loss for Autoregressive Models with Logical Constraints [87.08677547257733]
Neuro-symbolic AI bridges the gap between purely symbolic and neural approaches to learning. We show how to maximize the likelihood of a symbolic constraint w.r.t the neural network's output distribution. We also evaluate our approach on Sudoku and shortest-path prediction cast as autoregressive generation.
arXiv Detail & Related papers (2023-12-06T20:58:07Z)
Sub-linear Regret in Adaptive Model Predictive Control [56.705978425244496]
We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online oracle that combines the certainty-equivalence principle and polytopic tubes. We analyze the regret of the algorithm, when compared to an algorithm initially aware of the system dynamics.
arXiv Detail & Related papers (2023-10-07T15:07:10Z)
Outlier-robust neural network training: variation regularization meets trimmed loss to prevent functional breakdown [2.5628953713168685]
We tackle the challenge of outlier-robust predictive modeling using highly expressive neural networks.<n>Our approach integrates two key components: (1) a transformed trimmed loss (TTL), and (2) higher-order variation regularization (HOVR), which imposes smoothness constraints on the prediction function.
arXiv Detail & Related papers (2023-08-04T12:57:13Z)
Trust your neighbours: Penalty-based constraints for model calibration [19.437451462590108]
We present a constrained optimization perspective of SVLS and demonstrate that it enforces an implicit constraint on soft class proportions of surrounding pixels. We propose a principled and simple solution based on equality constraints on the logit values, which enables to control explicitly both the enforced constraint and the weight of the penalty.
arXiv Detail & Related papers (2023-03-11T01:10:26Z)
Toward Certified Robustness Against Real-World Distribution Shifts [65.66374339500025]
We train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model. A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations. We propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement.
arXiv Detail & Related papers (2022-06-08T04:09:13Z)
On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm. We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z)
Breaking the Communication-Privacy-Accuracy Trilemma [19.399122892615573]
Two major challenges in distributed learning are preserving the privacy of local samples and communicating them efficiently to a central server. We develop novel encoding and decoding mechanisms that simultaneously achieve optimal privacy and communication efficiency.
arXiv Detail & Related papers (2020-07-22T22:43:01Z)
Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss [145.54544979467872]
We consider online learning for episodically constrained Markov decision processes (CMDPs) We propose a new emphupper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model. Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z)
Robust-Adaptive Control of Linear Systems: beyond Quadratic Costs [14.309243378538012]
We consider the problem of robust and adaptive model predictive control (MPC) of a linear system. We provide the first end-to-end suboptimal tractity analysis for this setting.
arXiv Detail & Related papers (2020-02-25T12:24:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.