Quantifying the Cost of Learning in Queueing Systems
- URL: http://arxiv.org/abs/2308.07817v2
- Date: Fri, 27 Oct 2023 15:18:28 GMT
- Title: Quantifying the Cost of Learning in Queueing Systems
- Authors: Daniel Freund, Thodoris Lykouris, Wentao Weng
- Abstract summary: Cost of Learning in Queueing (CLQ) is a new metric that quantifies the maximum increase in time-averaged queue length caused by parameter uncertainty.
We propose a unified analysis framework for CLQ that bridges Lyapunov and bandit analysis, provides guarantees for a wide range of algorithms, and could be of independent interest.
- Score: 4.784875233446591
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Queueing systems are widely applicable stochastic models with use cases in
communication networks, healthcare, service systems, etc. Although their
optimal control has been extensively studied, most existing approaches assume
perfect knowledge of the system parameters. Of course, this assumption rarely
holds in practice where there is parameter uncertainty, thus motivating a
recent line of work on bandit learning for queueing systems. This nascent
stream of research focuses on the asymptotic performance of the proposed
algorithms.
In this paper, we argue that an asymptotic metric, which focuses on
late-stage performance, is insufficient to capture the intrinsic statistical
complexity of learning in queueing systems which typically occurs in the early
stage. Instead, we propose the Cost of Learning in Queueing (CLQ), a new metric
that quantifies the maximum increase in time-averaged queue length caused by
parameter uncertainty. We characterize the CLQ of a single queue multi-server
system, and then extend these results to multi-queue multi-server systems and
networks of queues. In establishing our results, we propose a unified analysis
framework for CLQ that bridges Lyapunov and bandit analysis, provides
guarantees for a wide range of algorithms, and could be of independent
interest.
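To make the CLQ notion concrete, the following is a minimal sketch (not the paper's algorithm) of the quantity it measures: the excess time-averaged queue length a learning policy incurs relative to an oracle that knows the service rates. The simulation, the epsilon-greedy learner, and all numbers (arrival probability, service-success probabilities, horizon) are illustrative assumptions.

```python
import random

def simulate(T, arrival_p, service_p, policy, seed=0):
    """Discrete-time single queue; each step one server is chosen to
    attempt service. Returns the time-averaged queue length."""
    rng = random.Random(seed)
    q = 0          # current queue length
    total = 0      # running sum of queue lengths
    succ = [0] * len(service_p)   # empirical successes per server
    tries = [0] * len(service_p)  # attempts per server
    for _ in range(T):
        if rng.random() < arrival_p:
            q += 1
        if q > 0:
            if policy == "oracle":
                # full knowledge: always pick the fastest server
                k = max(range(len(service_p)), key=lambda i: service_p[i])
            else:
                # epsilon-greedy learner, a stand-in for a bandit algorithm
                if min(tries) == 0 or rng.random() < 0.05:
                    k = rng.randrange(len(service_p))
                else:
                    k = max(range(len(service_p)),
                            key=lambda i: succ[i] / tries[i])
            tries[k] += 1
            if rng.random() < service_p[k]:
                succ[k] += 1
                q -= 1
        total += q
    return total / T

rates = [0.5, 0.2]  # hypothetical service-success probabilities
oracle = simulate(50_000, 0.4, rates, "oracle")
learner = simulate(50_000, 0.4, rates, "learner")
gap = learner - oracle  # CLQ-style excess queue length from learning
print(f"oracle avg queue:  {oracle:.2f}")
print(f"learner avg queue: {learner:.2f}")
print(f"excess from learning: {gap:.2f}")
```

The CLQ of the paper is a worst-case version of this gap over problem instances; the sketch only illustrates the measured quantity for one instance.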
Related papers
- Adversarial Network Optimization under Bandit Feedback: Maximizing Utility in Non-Stationary Multi-Hop Networks [35.78834550608041]
Classical stochastic network optimization (SNO) algorithms require network conditions to be stationary over time.
Motivated by these issues, we consider Adversarial Network Optimization (ANO) under bandit feedback.
Our proposed UMO2 algorithm ensures network stability and also matches the utility performance of any "mildly varying" reference policy.
arXiv Detail & Related papers (2024-08-29T02:18:28Z) - SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training [68.7896349660824]
We present an in-depth analysis of the progressive overfitting problem from the lens of Seq FT.
Considering that overly fast representation learning and a biased classification layer jointly constitute this problem, we introduce the advanced Slow Learner with Classifier Alignment (SLCA++) framework.
Our approach uses a Slow Learner to selectively reduce the learning rate of backbone parameters, and Classifier Alignment to align the disjoint classification layers in a post-hoc fashion.
arXiv Detail & Related papers (2024-08-15T17:50:07Z) - Unifying (Quantum) Statistical and Parametrized (Quantum) Algorithms [65.268245109828]
We take inspiration from Kearns' SQ oracle and Valiant's weak evaluation oracle.
We introduce an extensive yet intuitive framework that yields unconditional lower bounds for learning from evaluation queries.
arXiv Detail & Related papers (2023-10-26T18:23:21Z) - Learning Optimal Admission Control in Partially Observable Queueing
Networks [4.254099382808599]
We present an efficient reinforcement learning algorithm that learns the optimal admission control policy in a partially observable queueing network.
We show that our algorithm has a regret that only depends sub-linearly on the maximal number of jobs in the network, $S$.
arXiv Detail & Related papers (2023-08-04T15:40:23Z) - Large-Scale Sequential Learning for Recommender and Engineering Systems [91.3755431537592]
In this thesis, we focus on the design of automatic algorithms that provide personalized ranking by adapting to the current conditions.
For the former, we propose a novel algorithm called SAROS that takes both kinds of feedback into account when learning over the sequence of interactions.
The proposed idea of taking neighbouring lines into account yields statistically significant improvements over the initial approach for fault detection in power grids.
arXiv Detail & Related papers (2022-05-13T21:09:41Z) - Sequential Search with Off-Policy Reinforcement Learning [48.88165680363482]
We propose a highly scalable hybrid learning model that consists of an RNN learning framework and an attention model.
As a novel optimization step, we fit multiple short user sequences in a single RNN pass within a training batch, by solving a greedy knapsack problem on the fly.
We also explore the use of off-policy reinforcement learning in multi-session personalized search ranking.
arXiv Detail & Related papers (2022-02-01T06:52:40Z) - Scheduling Servers with Stochastic Bilinear Rewards [7.519872646378837]
We study a system optimization problem that arises in scheduling for multi-class, multi-server queueing systems.
We propose a scheduling algorithm based on weighted proportional fair allocation criteria augmented with marginal costs for reward.
Our algorithm achieves sub-linear regret and sub-linear mean holding cost (and queue-length bound) with respect to the time horizon, thus guaranteeing queueing-system stability.
arXiv Detail & Related papers (2021-12-13T00:37:20Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - Finite-Time Analysis of Asynchronous Q-Learning with Discrete-Time
Switching System Models [6.85316573653194]
We prove that Q-learning with a constant step-size can be naturally formulated as discrete-time switched linear systems.
It offers novel and intuitive insights on Q-learning based mainly on control-theoretic frameworks.
arXiv Detail & Related papers (2021-02-17T05:32:07Z) - Queue-Learning: A Reinforcement Learning Approach for Providing Quality
of Service [1.8477401359673706]
Service-rate control is a common mechanism for providing quality-of-service guarantees in service systems.
In this paper, we introduce a reinforcement learning-based (RL-based) service-rate controller.
Our controller provides explicit probabilistic guarantees on the end-to-end delay of the system.
arXiv Detail & Related papers (2021-01-12T17:28:57Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization for large-scale problems where the model is a deep neural network.
Our method requires far fewer communication rounds in theory.
Our experiments on several datasets demonstrate the effectiveness of our method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.