Related papers: Learning Algorithms for Minimizing Queue Length Regret

Learning Algorithms for Minimizing Queue Length Regret

URL: http://arxiv.org/abs/2005.05206v2
Date: Thu, 14 May 2020 13:26:02 GMT
Title: Learning Algorithms for Minimizing Queue Length Regret
Authors: Thomas Stahlbuhk, Brooke Shrader and Eytan Modiano
Abstract summary: Packets randomly arrive to a transmitter's queue and wait to be successfully sent to the receiver. The transmitter's objective is to quickly identify the best channel to minimize the number of packets in the queue over $T$ time slots. We show that there exists a set of queue-length based policies that can obtain order optimal $O(1)$ queue length regret.
Score: 5.8010446129208155
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider a system consisting of a single transmitter/receiver pair and $N$ channels over which they may communicate. Packets randomly arrive to the transmitter's queue and wait to be successfully sent to the receiver. The transmitter may attempt a frame transmission on one channel at a time, where each frame includes a packet if one is in the queue. For each channel, an attempted transmission is successful with an unknown probability. The transmitter's objective is to quickly identify the best channel to minimize the number of packets in the queue over $T$ time slots. To analyze system performance, we introduce queue length regret, which is the expected difference between the total queue length of a learning policy and a controller that knows the rates, a priori. One approach to designing a transmission policy would be to apply algorithms from the literature that solve the closely-related stochastic multi-armed bandit problem. These policies would focus on maximizing the number of successful frame transmissions over time. However, we show that these methods have $\Omega(\log{T})$ queue length regret. On the other hand, we show that there exists a set of queue-length based policies that can obtain order optimal $O(1)$ queue length regret. We use our theoretical analysis to devise heuristic methods that are shown to perform well in simulation.

Related papers

Queue Length Regret Bounds for Contextual Queueing Bandits [0.8984888893275712]
We introduce contextual queueing bandits, a new context-aware framework for scheduling while simultaneously learning unknown service rates.<n>We show that our algorithm, CQB-$varepsilon$, achieves a regret upper bound of $widetildemathcalO(T-1/4)$.<n>We also consider the setting of adversarially chosen contexts, for which our second algorithm, CQB-Opt, achieves a regret upper bound of $mathcalO(log2 T)$.
arXiv Detail & Related papers (2026-01-27T07:40:23Z)
Minimizing Queue Length Regret for Arbitrarily Varying Channels [9.091372562683311]
We consider an online channel scheduling problem for a single transmitter-receiver pair equipped with $N$ arbitrarily varying wireless channels. The transmission rates of the channels might be non-stationary and could be controlled by an oblivious adversary. We propose a weakly adaptive Multi-Armed Bandit (MAB) algorithm for minimizing the queue length regret in this setup.
arXiv Detail & Related papers (2025-01-23T10:54:22Z)
Age Optimal Sampling for Unreliable Channels under Unknown Channel Statistics [25.04993246830622]
We study a system in which a sensor forwards status updates to a receiver through an errorprone channel, while the receiver sends the transmission results back to the sensor via a reliable channel. To evaluate the timeliness of the status information at the receiver, we use the Age of Information metric. We propose a Robbins-Monro algorithm to solve this problem and demonstrate that the optimal threshold can be approximated almost surely.
arXiv Detail & Related papers (2024-12-24T03:06:22Z)
Queueing Matching Bandits with Preference Feedback [10.988222071035198]
We consider multi-class asymmetric queueing systems consisting of $N$ queues on one side and $K$ servers on the other side. The service rate of each job-server assignment is unknown and modeled by a feature-based Multi-nomial Logit (MNL) function. We propose algorithms based on UCB and Thompson Sampling, which achieve system stability with an average queue length bound to $O(minN,K/epsilon)$ for a large time horizon.
arXiv Detail & Related papers (2024-10-14T02:29:06Z)
Federated Combinatorial Multi-Agent Multi-Armed Bandits [79.1700188160944]
This paper introduces a federated learning framework tailored for online optimization with bandit. In this setting, agents subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals.
arXiv Detail & Related papers (2024-05-09T17:40:09Z)
DASA: Delay-Adaptive Multi-Agent Stochastic Approximation [64.32538247395627]
We consider a setting in which $N$ agents aim to speedup a common Approximation problem by acting in parallel and communicating with a central server. To mitigate the effect of delays and stragglers, we propose textttDASA, a Delay-Adaptive algorithm for multi-agent Approximation.
arXiv Detail & Related papers (2024-03-25T22:49:56Z)
Learning-based Scheduling for Information Accuracy and Freshness in Wireless Networks [0.0]
We consider a system of multiple sources, a single communication channel, and a single monitoring station. The probability of correct measurement and the probability of successful transmission of all the sources are unknown to the scheduler.
arXiv Detail & Related papers (2023-10-24T10:31:34Z)
Revisiting Decentralized ProxSkip: Achieving Linear Speedup [50.28561300894826]
Existing analyses of ProxSkipilon are limited convex setting and do not achieve the strongly linear speed of ProxSkipilon. This paper shows that ProxSkipilon can achieve linear speed up and can reduce communication overhead proportional to the probability of communication.
arXiv Detail & Related papers (2023-10-12T02:13:48Z)
Queue Scheduling with Adversarial Bandit Learning [20.606599586119835]
We consider a one-hop single-server queueing system consisting of $K$ queues, each with time-varying and non-stationary arrival and service rates. Our scheduling approach builds on an innovative combination of adversarial bandit learning and Lyapunov drift minimization. We present two novel algorithms capable of stabilizing systems that can be stablized by some (possibly unknown) sequence of randomized policies.
arXiv Detail & Related papers (2023-03-03T07:17:09Z)
Sequential Information Design: Learning to Persuade in the Dark [49.437419242582884]
We study a repeated information design problem faced by an informed sender who tries to influence the behavior of a self-interested receiver. At each round, the sender observes the realizations of random events in the sequential decision making (SDM) problem. This begets the challenge of how to incrementally disclose such information to the receiver to persuade them to follow (desirable) action recommendations.
arXiv Detail & Related papers (2022-09-08T17:08:12Z)
Learning While Scheduling in Multi-Server Systems with Unknown Statistics: MaxWeight with Discounted UCB [18.898514227870926]
This paper considers a multi-server system with multiple servers and multiple types of jobs, where different job types require different amounts of processing time at different servers. The goal is to schedule jobs on servers without knowing the statistics of the processing times. We propose a new algorithm, which combines the MaxWeight scheduling policy with discounted upper confidence bound (UCB) to simultaneously learn statistics and schedule jobs to servers.
arXiv Detail & Related papers (2022-09-02T15:37:02Z)
A Simple and Provably Efficient Algorithm for Asynchronous Federated Contextual Linear Bandits [77.09836892653176]
We study federated contextual linear bandits, where $M$ agents cooperate with each other to solve a global contextual linear bandit problem with the help of a central server. We consider the asynchronous setting, where all agents work independently and the communication between one agent and the server will not trigger other agents' communication. We prove that the regret of textttFedLinUCB is bounded by $tildeO(dsqrtsum_m=1M T_m)$ and the communication complexity is $tildeO(dM
arXiv Detail & Related papers (2022-07-07T06:16:19Z)
Multi-Receiver Online Bayesian Persuasion [51.94795123103707]
We study an online learning framework in which the sender repeatedly faces a receiver of an unknown, adversarially selected type. We focus on the case with no externalities and binary actions, as customary in offline models. We introduce a general online descent scheme to handle online learning problems with a finite number of possible loss functions.
arXiv Detail & Related papers (2021-06-11T16:05:31Z)
Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay. We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z)
Aging Bandits: Regret Analysis and Order-Optimal Learning Algorithm for Wireless Networks with Stochastic Arrivals [24.897224064639406]
We consider a single-hop wireless network with sources transmitting time-sensitive information over unreliable channels. At every time slot, the learning algorithm selects a single pair (source, channel) and the selected source attempts to transmit its packet via the selected channel. The goal of the learning algorithm is to minimize the Age-of-Information (AoI) in the network over $T$ time slots.
arXiv Detail & Related papers (2020-12-16T00:58:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.