Minimizing Queue Length Regret for Arbitrarily Varying Channels
- URL: http://arxiv.org/abs/2501.13551v1
- Date: Thu, 23 Jan 2025 10:54:22 GMT
- Title: Minimizing Queue Length Regret for Arbitrarily Varying Channels
- Authors: G Krishnakumar, Abhishek Sinha,
- Abstract summary: We consider an online channel scheduling problem for a single transmitter-receiver pair equipped with $N$ arbitrarily varying wireless channels.
The transmission rates of the channels might be non-stationary and could be controlled by an oblivious adversary.
We propose a weakly adaptive Multi-Armed Bandit (MAB) algorithm for minimizing the queue length regret in this setup.
- Score: 9.091372562683311
- License:
- Abstract: We consider an online channel scheduling problem for a single transmitter-receiver pair equipped with $N$ arbitrarily varying wireless channels. The transmission rates of the channels might be non-stationary and could be controlled by an oblivious adversary. At every slot, incoming data arrives at an infinite-capacity data queue located at the transmitter. A scheduler, which is oblivious to the current channel rates, selects one of the $N$ channels for transmission. At the end of the slot, the scheduler only gets to know the transmission rate of the selected channel. The objective is to minimize the queue length regret, defined as the difference between the queue length at some time $T$ achieved by an online policy and the queue length obtained by always transmitting over the single best channel in hindsight. We propose a weakly adaptive Multi-Armed Bandit (MAB) algorithm for minimizing the queue length regret in this setup. Unlike previous works, we do not make any stability assumptions about the queue or the arrival process. Hence, our result holds even when the queueing process is unstable. Our main observation is that the queue length regret can be upper bounded by the regret of a MAB policy that competes against the best channel in hindsight uniformly over all sub-intervals of $[T]$. As a technical contribution of independent interest, we then propose a weakly adaptive adversarial MAB policy which achieves $\tilde{O}(\sqrt{N}T^{\frac{3}{4}})$ regret with high probability, implying the same bound for queue length regret.
Related papers
- Age Optimal Sampling for Unreliable Channels under Unknown Channel Statistics [25.04993246830622]
We study a system in which a sensor forwards status updates to a receiver through an errorprone channel, while the receiver sends the transmission results back to the sensor via a reliable channel.
To evaluate the timeliness of the status information at the receiver, we use the Age of Information metric.
We propose a Robbins-Monro algorithm to solve this problem and demonstrate that the optimal threshold can be approximated almost surely.
arXiv Detail & Related papers (2024-12-24T03:06:22Z) - Queueing Matching Bandits with Preference Feedback [10.988222071035198]
We consider multi-class asymmetric queueing systems consisting of $N$ queues on one side and $K$ servers on the other side.
The service rate of each job-server assignment is unknown and modeled by a feature-based Multi-nomial Logit (MNL) function.
We propose algorithms based on UCB and Thompson Sampling, which achieve system stability with an average queue length bound to $O(minN,K/epsilon)$ for a large time horizon.
arXiv Detail & Related papers (2024-10-14T02:29:06Z) - Achieving Constant Regret in Linear Markov Decision Processes [57.34287648914407]
We introduce an algorithm, Cert-LSVI-UCB, for misspecified linear Markov decision processes (MDPs)
We show that Cert-LSVI-UCB has a cumulative regret of $tildemathcalO(d3H5/Delta)$ with high probability, provided that the misspecification level $zeta$ is below $tildemathcalO(Delta / (sqrtdH2))$.
arXiv Detail & Related papers (2024-04-16T17:23:19Z) - Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling [73.5602474095954]
We study the non-asymptotic performance of approximation schemes with delayed updates under Markovian sampling.
Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms.
arXiv Detail & Related papers (2024-02-19T03:08:02Z) - Queue Scheduling with Adversarial Bandit Learning [20.606599586119835]
We consider a one-hop single-server queueing system consisting of $K$ queues, each with time-varying and non-stationary arrival and service rates.
Our scheduling approach builds on an innovative combination of adversarial bandit learning and Lyapunov drift minimization.
We present two novel algorithms capable of stabilizing systems that can be stablized by some (possibly unknown) sequence of randomized policies.
arXiv Detail & Related papers (2023-03-03T07:17:09Z) - Multi-Receiver Online Bayesian Persuasion [51.94795123103707]
We study an online learning framework in which the sender repeatedly faces a receiver of an unknown, adversarially selected type.
We focus on the case with no externalities and binary actions, as customary in offline models.
We introduce a general online descent scheme to handle online learning problems with a finite number of possible loss functions.
arXiv Detail & Related papers (2021-06-11T16:05:31Z) - Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions [54.25616645675032]
We study the Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm.
We consider two settings: the reward-dependent delay setting, where realized delays may depend on the rewards, and the reward-independent delay setting.
Our main contribution is algorithms that achieve near-optimal regret in each of the settings.
arXiv Detail & Related papers (2021-06-04T12:26:06Z) - Fast Rates for the Regret of Offline Reinforcement Learning [69.23654172273085]
We study the regret of reinforcement learning from offline data generated by a fixed behavior policy in an infinitehorizon discounted decision process (MDP)
We show that given any estimate for the optimal quality function $Q*$, the regret of the policy it defines converges at a rate given by the exponentiation of the $Q*$-estimate's pointwise convergence rate.
arXiv Detail & Related papers (2021-01-31T16:17:56Z) - Adapting to Delays and Data in Adversarial Multi-Armed Bandits [7.310043452300736]
We analyze variants of the Exp3 algorithm that tune their step-size using only information available at the time of the decisions.
We obtain regret guarantees that adapt to the observed (rather than the worst-case) sequences of delays and/or losses.
arXiv Detail & Related papers (2020-10-12T20:53:52Z) - Harnessing Wireless Channels for Scalable and Privacy-Preserving
Federated Learning [56.94644428312295]
Wireless connectivity is instrumental in enabling federated learning (FL)
Channel randomnessperturbs each worker inversions model update while multiple workers updates incur significant interference on bandwidth.
In A-FADMM, all workers upload their model updates to the parameter server using a single channel via analog transmissions.
This not only saves communication bandwidth, but also hides each worker's exact model update trajectory from any eavesdropper.
arXiv Detail & Related papers (2020-07-03T16:31:15Z) - Learning Algorithms for Minimizing Queue Length Regret [5.8010446129208155]
Packets randomly arrive to a transmitter's queue and wait to be successfully sent to the receiver.
The transmitter's objective is to quickly identify the best channel to minimize the number of packets in the queue over $T$ time slots.
We show that there exists a set of queue-length based policies that can obtain order optimal $O(1)$ queue length regret.
arXiv Detail & Related papers (2020-05-11T15:50:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.