Learning Algorithms for Minimizing Queue Length Regret
- URL: http://arxiv.org/abs/2005.05206v2
- Date: Thu, 14 May 2020 13:26:02 GMT
- Title: Learning Algorithms for Minimizing Queue Length Regret
- Authors: Thomas Stahlbuhk, Brooke Shrader and Eytan Modiano
- Abstract summary: Packets randomly arrive to a transmitter's queue and wait to be successfully sent to the receiver.
The transmitter's objective is to quickly identify the best channel to minimize the number of packets in the queue over $T$ time slots.
We show that there exists a set of queue-length based policies that can obtain order optimal $O(1)$ queue length regret.
- Score: 5.8010446129208155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a system consisting of a single transmitter/receiver pair and $N$
channels over which they may communicate. Packets randomly arrive to the
transmitter's queue and wait to be successfully sent to the receiver. The
transmitter may attempt a frame transmission on one channel at a time, where
each frame includes a packet if one is in the queue. For each channel, an
attempted transmission is successful with an unknown probability. The
transmitter's objective is to quickly identify the best channel to minimize the
number of packets in the queue over $T$ time slots. To analyze system
performance, we introduce queue length regret, which is the expected difference
between the total queue length of a learning policy and a controller that knows
the rates, a priori. One approach to designing a transmission policy would be
to apply algorithms from the literature that solve the closely-related
stochastic multi-armed bandit problem. These policies would focus on maximizing
the number of successful frame transmissions over time. However, we show that
these methods have $\Omega(\log{T})$ queue length regret. On the other hand, we
show that there exists a set of queue-length based policies that can obtain
order optimal $O(1)$ queue length regret. We use our theoretical analysis to
devise heuristic methods that are shown to perform well in simulation.
Related papers
- Queueing Matching Bandits with Preference Feedback [10.988222071035198]
We consider multi-class asymmetric queueing systems consisting of $N$ queues on one side and $K$ servers on the other side.
The service rate of each job-server assignment is unknown and modeled by a feature-based Multi-nomial Logit (MNL) function.
We propose algorithms based on UCB and Thompson Sampling, which achieve system stability with an average queue length bound to $O(minN,K/epsilon)$ for a large time horizon.
arXiv Detail & Related papers (2024-10-14T02:29:06Z) - Federated Combinatorial Multi-Agent Multi-Armed Bandits [79.1700188160944]
This paper introduces a federated learning framework tailored for online optimization with bandit.
In this setting, agents subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals.
arXiv Detail & Related papers (2024-05-09T17:40:09Z) - DASA: Delay-Adaptive Multi-Agent Stochastic Approximation [64.32538247395627]
We consider a setting in which $N$ agents aim to speedup a common Approximation problem by acting in parallel and communicating with a central server.
To mitigate the effect of delays and stragglers, we propose textttDASA, a Delay-Adaptive algorithm for multi-agent Approximation.
arXiv Detail & Related papers (2024-03-25T22:49:56Z) - Learning-based Scheduling for Information Accuracy and Freshness in
Wireless Networks [0.0]
We consider a system of multiple sources, a single communication channel, and a single monitoring station.
The probability of correct measurement and the probability of successful transmission of all the sources are unknown to the scheduler.
arXiv Detail & Related papers (2023-10-24T10:31:34Z) - Revisiting Decentralized ProxSkip: Achieving Linear Speedup [50.28561300894826]
Existing analyses of ProxSkipilon are limited convex setting and do not achieve the strongly linear speed of ProxSkipilon.
This paper shows that ProxSkipilon can achieve linear speed up and can reduce communication overhead proportional to the probability of communication.
arXiv Detail & Related papers (2023-10-12T02:13:48Z) - Queue Scheduling with Adversarial Bandit Learning [20.606599586119835]
We consider a one-hop single-server queueing system consisting of $K$ queues, each with time-varying and non-stationary arrival and service rates.
Our scheduling approach builds on an innovative combination of adversarial bandit learning and Lyapunov drift minimization.
We present two novel algorithms capable of stabilizing systems that can be stablized by some (possibly unknown) sequence of randomized policies.
arXiv Detail & Related papers (2023-03-03T07:17:09Z) - Sequential Information Design: Learning to Persuade in the Dark [49.437419242582884]
We study a repeated information design problem faced by an informed sender who tries to influence the behavior of a self-interested receiver.
At each round, the sender observes the realizations of random events in the sequential decision making (SDM) problem.
This begets the challenge of how to incrementally disclose such information to the receiver to persuade them to follow (desirable) action recommendations.
arXiv Detail & Related papers (2022-09-08T17:08:12Z) - Learning While Scheduling in Multi-Server Systems with Unknown
Statistics: MaxWeight with Discounted UCB [18.898514227870926]
This paper considers a multi-server system with multiple servers and multiple types of jobs, where different job types require different amounts of processing time at different servers.
The goal is to schedule jobs on servers without knowing the statistics of the processing times.
We propose a new algorithm, which combines the MaxWeight scheduling policy with discounted upper confidence bound (UCB) to simultaneously learn statistics and schedule jobs to servers.
arXiv Detail & Related papers (2022-09-02T15:37:02Z) - Multi-Receiver Online Bayesian Persuasion [51.94795123103707]
We study an online learning framework in which the sender repeatedly faces a receiver of an unknown, adversarially selected type.
We focus on the case with no externalities and binary actions, as customary in offline models.
We introduce a general online descent scheme to handle online learning problems with a finite number of possible loss functions.
arXiv Detail & Related papers (2021-06-11T16:05:31Z) - Better than the Best: Gradient-based Improper Reinforcement Learning for
Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z) - Aging Bandits: Regret Analysis and Order-Optimal Learning Algorithm for
Wireless Networks with Stochastic Arrivals [24.897224064639406]
We consider a single-hop wireless network with sources transmitting time-sensitive information over unreliable channels.
At every time slot, the learning algorithm selects a single pair (source, channel) and the selected source attempts to transmit its packet via the selected channel.
The goal of the learning algorithm is to minimize the Age-of-Information (AoI) in the network over $T$ time slots.
arXiv Detail & Related papers (2020-12-16T00:58:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.