Learning-NUM: Network Utility Maximization with Unknown Utility
Functions and Queueing Delay
- URL: http://arxiv.org/abs/2012.09222v1
- Date: Wed, 16 Dec 2020 19:36:25 GMT
- Title: Learning-NUM: Network Utility Maximization with Unknown Utility
Functions and Queueing Delay
- Authors: Xinzhe Fu, Eytan Modiano
- Abstract summary: We propose a new NUM framework, Learning-NUM, where the users' utility functions are unknown a priori.
We show that the expected total utility obtained by the best dynamic policy is upper bounded by the solution to a static optimization problem.
To handle the feedback delay, we embed the algorithm in a parallel-instance paradigm to form a policy that achieves $\tilde{O}(T^{3/4})$-regret, i.e., the difference between the expected utility obtained by the best dynamic policy and our policy is in $\tilde{O}(T^{3/4})$.
- Score: 29.648462942501084
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Network Utility Maximization (NUM) studies the problems of allocating traffic
rates to network users in order to maximize the users' total utility subject to
network resource constraints. In this paper, we propose a new NUM framework,
Learning-NUM, where the users' utility functions are unknown a priori and the
utility function values of the traffic rates can be observed only after the
corresponding traffic is delivered to the destination, which means that the
utility feedback experiences queueing delay.
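For reference, the classical static NUM problem that Learning-NUM generalizes can be written, in standard notation (not necessarily the paper's), as
$$ \max_{x \geq 0} \; \sum_{i} U_i(x_i) \quad \text{subject to} \quad x \in \Lambda, $$
where $x_i$ is the traffic rate allocated to user $i$, $U_i$ is user $i$'s utility function, and $\Lambda$ is the capacity region induced by the network resource constraints; in Learning-NUM the $U_i$ are unknown and observed only through delayed feedback.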
The goal is to design a policy that gradually learns the utility functions
and makes rate allocation and network scheduling/routing decisions so as to
maximize the total utility obtained over a finite time horizon $T$. In addition
to unknown utility functions and stochastic constraints, a central challenge of
our problem lies in the queueing delay of the observations, which may be
unbounded and depends on the decisions of the policy.
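Performance is measured against the best dynamic policy; in our own notation (consistent with the regret bound stated below), a policy $\pi$ is evaluated through
$$ R(T) \;=\; \mathbb{E}\big[U^{*}(T)\big] - \mathbb{E}\big[U^{\pi}(T)\big], $$
where $U^{*}(T)$ and $U^{\pi}(T)$ denote the total utility collected over the horizon $T$ by the best dynamic policy and by $\pi$, respectively.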
We first show that the expected total utility obtained by the best dynamic
policy is upper bounded by the solution to a static optimization problem.
Without the presence of feedback delay, we design an algorithm based on the
ideas of gradient estimation and Max-Weight scheduling. To handle the feedback
delay, we embed the algorithm in a parallel-instance paradigm to form a policy
that achieves $\tilde{O}(T^{3/4})$-regret, i.e., the difference between the
expected utility obtained by the best dynamic policy and our policy is in
$\tilde{O}(T^{3/4})$. Finally, to demonstrate the practical applicability of
the Learning-NUM framework, we apply it to three application scenarios:
database query, job scheduling, and video streaming. We further
conduct simulations on the job scheduling application to evaluate the empirical
performance of our policy.
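To make the delay-free building block concrete (zeroth-order gradient estimation of the unknown utilities combined with Max-Weight scheduling), here is a minimal toy sketch in Python. It is our own simplification, not the paper's Learning-NUM algorithm: it assumes a single server, immediate (undelayed) utility feedback, hypothetical hidden utilities and parameters, and it omits the parallel-instance paradigm.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 3                    # number of users / flows, each with its own queue
T = 2000                 # time horizon (slots)
eta, delta = 0.05, 0.1   # gradient step size and exploration perturbation

# Hidden concave utilities U_i(x) = w_i * log(1 + x); the policy never sees w_true.
w_true = np.array([1.0, 2.0, 3.0])

def utility(i, x):
    return w_true[i] * np.log1p(x)

x = np.full(N, 0.5)                         # current rate allocation, kept in [0, 1]
queues = np.zeros(N)                        # per-flow queue backlogs
service_rates = np.array([1.0, 0.8, 0.6])   # service rate of each flow if scheduled
total_utility = 0.0

for t in range(T):
    # Rate allocation: two-point zeroth-order (finite-difference) estimate of each
    # unknown utility's gradient, then a projected gradient step penalized by backlog.
    u = rng.choice([-1.0, 1.0], size=N)     # random perturbation directions
    x_plus = np.clip(x + delta * u, 0.0, 1.0)
    x_minus = np.clip(x - delta * u, 0.0, 1.0)
    f_plus = np.array([utility(i, x_plus[i]) for i in range(N)])
    f_minus = np.array([utility(i, x_minus[i]) for i in range(N)])
    grad_est = (f_plus - f_minus) / (2 * delta) * u
    x = np.clip(x + eta * (grad_est - 0.01 * queues), 0.0, 1.0)

    # Admit traffic at the chosen rates; collect utility feedback
    # (immediate here, delayed by queueing in the paper).
    queues += x
    total_utility += sum(utility(i, x[i]) for i in range(N))

    # Max-Weight scheduling: one server serves the flow with the largest
    # backlog * service-rate product in this slot.
    k = int(np.argmax(queues * service_rates))
    queues[k] = max(0.0, queues[k] - service_rates[k])

print(f"average utility per slot: {total_utility / T:.3f}")
print("final backlogs:", np.round(queues, 2))
```

Penalizing the rate update by the queue backlog is one common (drift-plus-penalty style) way to couple admission control to congestion; the coupling used in the paper may differ.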
Related papers
- Adversarial Network Optimization under Bandit Feedback: Maximizing Utility in Non-Stationary Multi-Hop Networks [35.78834550608041]
Classical stochastic network optimization (SNO) algorithms require network conditions to be stationary over time.
Motivated by these issues, we consider Adversarial Network Optimization (ANO) under bandit feedback.
Our proposed UMO2 algorithm ensures network stability and also matches the utility performance of any "mildly varying" reference policy.
arXiv Detail & Related papers (2024-08-29T02:18:28Z)
- Neural Quantile Optimization for Edge-Cloud Networking [13.509945075582447]
We seek the best traffic allocation scheme for the edge-cloud computing network that satisfies constraints and minimizes the cost based on burstable billing.
We introduce the Gumbel-softmax sampling network to solve the optimization problems via unsupervised learning.
The trained network works as an efficient traffic allocation scheme sampler, remarkably outperforming the random strategy in feasibility and cost function value.
arXiv Detail & Related papers (2023-07-11T11:05:10Z)
- Scheduling Inference Workloads on Distributed Edge Clusters with Reinforcement Learning [11.007816552466952]
This paper focuses on the problem of scheduling inference queries on Deep Neural Networks in edge networks at short timescales.
By means of simulations, we analyze several policies in the realistic network settings and workloads of a large ISP.
We design ASET, a Reinforcement Learning based scheduling algorithm able to adapt its decisions according to the system conditions.
arXiv Detail & Related papers (2023-01-31T13:23:34Z)
- Off-Policy Evaluation with Policy-Dependent Optimization Response [90.28758112893054]
We develop a new framework for off-policy evaluation with a policy-dependent linear optimization response.
We construct unbiased estimators for the policy-dependent estimand by a perturbation method.
We provide a general algorithm for optimizing causal interventions.
arXiv Detail & Related papers (2022-02-25T20:25:37Z)
- Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely Markov persuasion processes (MPPs).
Planning in MPPs faces the unique challenge in finding a signaling policy that is simultaneously persuasive to the myopic receivers and inducing the optimal long-term cumulative utilities of the sender.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z)
- Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z)
- Learning Augmented Index Policy for Optimal Service Placement at the Network Edge [8.136957953239254]
We consider the problem of service placement at the network edge, in which a decision maker has to choose between $N$ services to host at the edge.
Our goal is to design adaptive algorithms to minimize the average service delivery latency for customers.
arXiv Detail & Related papers (2021-01-10T23:54:59Z)
- Policy Gradient for Continuing Tasks in Non-stationary Markov Decision Processes [112.38662246621969]
Reinforcement learning considers the problem of finding policies that maximize an expected cumulative reward in a Markov decision process with unknown transition probabilities.
We compute unbiased navigation gradients of the value function which we use as ascent directions to update the policy.
A major drawback of policy gradient-type algorithms is that they are limited to episodic tasks unless stationarity assumptions are imposed.
arXiv Detail & Related papers (2020-10-16T15:15:42Z)
- Optimizing for the Future in Non-Stationary MDPs [52.373873622008944]
We present a policy gradient algorithm that maximizes a forecast of future performance.
We show that our algorithm, called Prognosticator, is more robust to non-stationarity than two online adaptation techniques.
arXiv Detail & Related papers (2020-05-17T03:41:19Z)
- Optimizing Wireless Systems Using Unsupervised and Reinforced-Unsupervised Deep Learning [96.01176486957226]
Resource allocation and transceivers in wireless networks are usually designed by solving optimization problems.
In this article, we introduce unsupervised and reinforced-unsupervised learning frameworks for solving both variable and functional optimization problems.
arXiv Detail & Related papers (2020-01-03T11:01:52Z)