Related papers: Minimizing Queue Length Regret for Arbitrarily Varying Channels

URL: http://arxiv.org/abs/2501.13551v1
Date: Thu, 23 Jan 2025 10:54:22 GMT
Title: Minimizing Queue Length Regret for Arbitrarily Varying Channels
Authors: G Krishnakumar, Abhishek Sinha
Abstract summary: We consider an online channel scheduling problem for a single transmitter-receiver pair equipped with $N$ arbitrarily varying wireless channels. The transmission rates of the channels might be non-stationary and could be controlled by an oblivious adversary. We propose a weakly adaptive Multi-Armed Bandit (MAB) algorithm for minimizing the queue length regret in this setup.
Score: 9.091372562683311
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: We consider an online channel scheduling problem for a single transmitter-receiver pair equipped with $N$ arbitrarily varying wireless channels. The transmission rates of the channels might be non-stationary and could be controlled by an oblivious adversary. At every slot, incoming data arrives at an infinite-capacity data queue located at the transmitter. A scheduler, which is oblivious to the current channel rates, selects one of the $N$ channels for transmission. At the end of the slot, the scheduler only gets to know the transmission rate of the selected channel. The objective is to minimize the queue length regret, defined as the difference between the queue length at some time $T$ achieved by an online policy and the queue length obtained by always transmitting over the single best channel in hindsight. We propose a weakly adaptive Multi-Armed Bandit (MAB) algorithm for minimizing the queue length regret in this setup. Unlike previous works, we do not make any stability assumptions about the queue or the arrival process. Hence, our result holds even when the queueing process is unstable. Our main observation is that the queue length regret can be upper bounded by the regret of a MAB policy that competes against the best channel in hindsight uniformly over all sub-intervals of $[T]$. As a technical contribution of independent interest, we then propose a weakly adaptive adversarial MAB policy which achieves $\tilde{O}(\sqrt{N}T^{\frac{3}{4}})$ regret with high probability, implying the same bound for queue length regret.
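
A minimal sketch of the queue length regret definition, assuming Lindley-style dynamics $Q(t) = \max(Q(t-1) + A(t) - R(t), 0)$ with $Q(0) = 0$ (the abstract does not state the exact recursion). Unrolling it gives $Q(T) = \max_{0 \le s \le T} \sum_{t=s+1}^{T} (A(t) - R(t))$. Taking the $s$ that attains the maximum for the policy $\pi$, and noting that $Q_i(T) \ge \sum_{t=s+1}^{T} (A(t) - R_i(t))$ for every $s$, yields $Q_\pi(T) - Q_i(T) \le \max_{s} \sum_{t=s+1}^{T} (R_i(t) - R_\pi(t))$ for any fixed channel $i$. In words, the queue length regret is at most the policy's rate regret on some suffix interval, a special case of the sub-interval regret mentioned in the abstract. The helper names, the synthetic rates, and the uniformly random placeholder scheduler are illustrative assumptions; the authors' weakly adaptive MAB policy is not reproduced here.

```python
import random

def queue_length(arrivals, rates):
    """Final queue length under the assumed recursion Q(t) = max(Q(t-1) + A(t) - R(t), 0)."""
    q = 0.0
    for a, r in zip(arrivals, rates):
        q = max(q + a - r, 0.0)
    return q

def queue_length_regret(arrivals, rate_table, choices):
    """Q_pi(T) minus the final queue length of the best fixed channel in hindsight.
    rate_table[t][i] is the rate of channel i in slot t; choices[t] is the policy's channel."""
    served = [rate_table[t][c] for t, c in enumerate(choices)]
    n_channels = len(rate_table[0])
    q_best = min(
        queue_length(arrivals, [row[i] for row in rate_table]) for i in range(n_channels)
    )
    return queue_length(arrivals, served) - q_best

def suffix_rate_regret(rate_table, choices):
    """max over channels i and start slots s of sum_{t >= s} (R_i(t) - R_pi(t)); 0 for the empty suffix."""
    best = 0.0
    for i in range(len(rate_table[0])):
        running = 0.0
        # Scan backwards so `running` always holds the sum over the suffix starting at t.
        for t in range(len(rate_table) - 1, -1, -1):
            running += rate_table[t][i] - rate_table[t][choices[t]]
            best = max(best, running)
    return best

# Synthetic example data (an assumption for illustration only, not an adversarial sequence).
rng = random.Random(0)
T, N = 200, 3
arrivals = [rng.uniform(0.0, 1.0) for _ in range(T)]
rate_table = [[rng.uniform(0.0, 1.0) for _ in range(N)] for _ in range(T)]
choices = [rng.randrange(N) for _ in range(T)]  # placeholder scheduler, not the paper's policy

regret = queue_length_regret(arrivals, rate_table, choices)
bound = suffix_rate_regret(rate_table, choices)
assert regret <= bound + 1e-9  # the unrolled-recursion inequality from the lead-in
```

Because the inequality holds for every fixed channel $i$, any policy with small regret uniformly over suffix intervals inherits a small queue length regret, even when the queue itself is unstable.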

Related papers

Finite-Time Information-Theoretic Bounds in Queueing Control [54.11376591632282]
We establish finite-time information-theoretic lower bounds on the total queue length in scheduling problems over stochastic processing networks with both adversarial and stochastic arrivals, and derive new policies that achieve them. These findings reveal a fundamental limitation of "drift-only" methods and point the way toward principled, non-asymptotic optimality in queueing control.
arXiv (2025-06-23T04:14:40Z)
Capacity-Constrained Online Learning with Delays: Scheduling Frameworks and Regret Trade-offs [60.7808741738461]
We study online learning with oblivious losses and delays under a novel capacity constraint that limits how many past rounds can be tracked simultaneously for delayed feedback. Our algorithms achieve minimax-optimal regret across all capacity levels, with performance degrading gracefully under suboptimal capacity.
arXiv (2025-03-25T17:20:39Z)
Age Optimal Sampling for Unreliable Channels under Unknown Channel Statistics [25.04993246830622]
We study a system in which a sensor forwards status updates to a receiver through an error-prone channel, while the receiver sends the transmission results back to the sensor via a reliable channel. To evaluate the timeliness of the status information at the receiver, we use the Age of Information metric. We propose a Robbins-Monro algorithm to solve this problem and demonstrate that the optimal threshold can be approximated almost surely.
arXiv (2024-12-24T03:06:22Z)
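
As a generic illustration of the Robbins-Monro idea named above (not the paper's Age of Information threshold update, whose objective is not given here), the sketch below finds the root of $g(\theta) = \alpha - P(X \le \theta)$, i.e. the $\alpha$-quantile of a random variable, using only noisy indicator observations and step sizes $a_k = 1/k$. The function name and the uniform test distribution are assumptions.

```python
import random

def robbins_monro_quantile(sample, alpha, n_steps, theta0=0.0):
    """Robbins-Monro iteration for the root of g(theta) = alpha - P(X <= theta).
    Each step sees one draw x = sample() and uses the noisy estimate alpha - 1{x <= theta}."""
    theta = theta0
    for k in range(1, n_steps + 1):
        x = sample()
        step = 1.0 / k  # sum of steps diverges, sum of squared steps converges
        theta += step * (alpha - (1.0 if x <= theta else 0.0))
    return theta

rng = random.Random(1)
# Median of Uniform(0, 1); the iterate should settle near 0.5.
estimate = robbins_monro_quantile(rng.random, alpha=0.5, n_steps=20000)
```

The update never evaluates $P(X \le \theta)$ directly; it only needs one noisy observation per step, which is what makes the method usable when the underlying statistics are unknown.
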
Queueing Matching Bandits with Preference Feedback [10.988222071035198]
We consider multi-class asymmetric queueing systems consisting of $N$ queues on one side and $K$ servers on the other side. The service rate of each job-server assignment is unknown and modeled by a feature-based multinomial logit (MNL) function. We propose algorithms based on UCB and Thompson Sampling that achieve system stability with an average queue length bound of $O(\min\{N,K\}/\epsilon)$ for a large time horizon.
arXiv (2024-10-14T02:29:06Z)
Achieving Constant Regret in Linear Markov Decision Processes [57.34287648914407]
We introduce an algorithm, Cert-LSVI-UCB, for misspecified linear Markov decision processes (MDPs). We show that Cert-LSVI-UCB has a cumulative regret of $\tilde{\mathcal{O}}(d^3H^5/\Delta)$ with high probability, provided that the misspecification level $\zeta$ is below $\tilde{\mathcal{O}}(\Delta/(\sqrt{d}H^2))$.
arXiv (2024-04-16T17:23:19Z)
Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling [73.5602474095954]
We study the non-asymptotic performance of stochastic approximation schemes with delayed updates under Markovian sampling. Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms.
arXiv (2024-02-19T03:08:02Z)
Queue Scheduling with Adversarial Bandit Learning [20.606599586119835]
We consider a one-hop single-server queueing system consisting of $K$ queues, each with time-varying and non-stationary arrival and service rates. Our scheduling approach builds on an innovative combination of adversarial bandit learning and Lyapunov drift minimization. We present two novel algorithms capable of stabilizing systems that can be stabilized by some (possibly unknown) sequence of randomized policies.
arXiv (2023-03-03T07:17:09Z)
Learning While Scheduling in Multi-Server Systems with Unknown Statistics: MaxWeight with Discounted UCB [18.898514227870926]
This paper considers a system with multiple servers and multiple job types, where different job types require different amounts of processing time at different servers. The goal is to schedule jobs on servers without knowing the statistics of the processing times. We propose a new algorithm that combines the MaxWeight scheduling policy with discounted upper confidence bound (UCB) estimates to simultaneously learn statistics and schedule jobs to servers.
arXiv (2022-09-02T15:37:02Z)
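
A minimal sketch of the two ingredients named above, under assumptions: a single server, observed service rates in [0, 1], the standard discounted-UCB index (discount factor `gamma` applied at each update, exploration constant `c`), and MaxWeight choosing the job type with the largest product of queue length and optimistic rate estimate. The class and function names are hypothetical, and the paper's exact estimator and multi-server coupling may differ.

```python
import math

class DiscountedUCB:
    """Discounted-UCB estimates of per-job-type service rates for one server.
    Each update multiplies all past statistics by gamma, so old samples fade
    and the estimates can track non-stationary rates."""

    def __init__(self, n_types, gamma=0.99, c=1.0):
        self.gamma = gamma
        self.c = c
        self.counts = [0.0] * n_types  # discounted number of observations per type
        self.sums = [0.0] * n_types    # discounted sum of observed service rates

    def update(self, job_type, observed_rate):
        self.counts = [self.gamma * n for n in self.counts]
        self.sums = [self.gamma * s for s in self.sums]
        self.counts[job_type] += 1.0
        self.sums[job_type] += observed_rate

    def indices(self):
        total = sum(self.counts)
        out = []
        for n, s in zip(self.counts, self.sums):
            if n == 0.0:
                out.append(math.inf)  # never observed: optimistic, forces exploration
            else:
                bonus = self.c * math.sqrt(max(math.log(total), 0.0) / n)
                out.append(s / n + bonus)
        return out

def maxweight_choice(queue_lengths, rate_estimates):
    """MaxWeight: serve the job type maximizing queue length x (optimistic) rate.
    Empty queues get weight 0, which also avoids 0 * inf."""
    weights = [q * r if q > 0 else 0.0 for q, r in zip(queue_lengths, rate_estimates)]
    return max(range(len(weights)), key=weights.__getitem__)
```

In each slot one would call `maxweight_choice(queues, ucb.indices())`, serve the chosen type, and feed the observed rate back through `ucb.update`; the optimism in the index drives exploration while the queue weights drive stability.
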
Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions [54.25616645675032]
We study the Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm. We consider two settings: the reward-dependent delay setting, where realized delays may depend on the rewards, and the reward-independent delay setting. Our main contribution is algorithms that achieve near-optimal regret in each of the settings.
arXiv (2021-06-04T12:26:06Z)
Fast Rates for the Regret of Offline Reinforcement Learning [69.23654172273085]
We study the regret of reinforcement learning from offline data generated by a fixed behavior policy in an infinite-horizon discounted Markov decision process (MDP). We show that, given any estimate of the optimal quality function $Q^*$, the regret of the policy it defines converges at a rate given by the exponentiation of the $Q^*$-estimate's pointwise convergence rate.
arXiv (2021-01-31T16:17:56Z)
Adapting to Delays and Data in Adversarial Multi-Armed Bandits [7.310043452300736]
We analyze variants of the Exp3 algorithm that tune their step-size using only information available at the time of the decisions. We obtain regret guarantees that adapt to the observed (rather than the worst-case) sequences of delays and/or losses.
arXiv (2020-10-12T20:53:52Z)
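
For reference, a minimal sketch of the textbook Exp3 update that such variants build on. It assumes losses in [0, 1], no feedback delays, and a fixed learning rate $\eta$ rather than the adaptive step sizes studied in the paper; the function name and the toy loss sequence are hypothetical.

```python
import math
import random

def exp3(loss_fn, n_arms, horizon, eta, rng):
    """Exp3 with a fixed learning rate eta on losses in [0, 1].
    loss_fn(t, arm) reveals only the loss of the pulled arm (bandit feedback).
    Returns the total incurred loss and the sampling distribution of the last round."""
    cum_est = [0.0] * n_arms  # importance-weighted cumulative loss estimates
    total_loss = 0.0
    p = [1.0 / n_arms] * n_arms
    for t in range(horizon):
        m = min(cum_est)  # shift exponents so the largest weight is exactly 1
        w = [math.exp(-eta * (L - m)) for L in cum_est]
        z = sum(w)
        p = [x / z for x in w]
        arm = rng.choices(range(n_arms), weights=p)[0]
        loss = loss_fn(t, arm)
        total_loss += loss
        cum_est[arm] += loss / p[arm]  # unbiased estimate of the full loss vector
    return total_loss, p

# Toy sequence: arm 0 is always best.
T, K = 2000, 3
eta = math.sqrt(2 * math.log(K) / (K * T))
total, p_last = exp3(lambda t, a: 0.1 if a == 0 else 0.9, K, T, eta, random.Random(0))
```

Replacing the fixed `eta` with a step size computed from quantities observed so far (for example, the accumulated importance-weighted losses and delays) is the kind of modification the abstract describes.
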
Harnessing Wireless Channels for Scalable and Privacy-Preserving Federated Learning [56.94644428312295]
Wireless connectivity is instrumental in enabling federated learning (FL). Channel randomness perturbs each worker's model update, while multiple workers' updates incur significant interference under limited bandwidth. In A-FADMM, all workers upload their model updates to the parameter server over a single channel via analog transmissions. This not only saves communication bandwidth but also hides each worker's exact model update trajectory from any eavesdropper.
arXiv (2020-07-03T16:31:15Z)
Learning Algorithms for Minimizing Queue Length Regret [5.8010446129208155]
Packets randomly arrive at a transmitter's queue and wait to be successfully sent to the receiver. The transmitter's objective is to quickly identify the best channel to minimize the number of packets in the queue over $T$ time slots. We show that there exists a set of queue-length-based policies that can obtain order-optimal $O(1)$ queue length regret.
arXiv (2020-05-11T15:50:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.