Learning-NUM: Network Utility Maximization with Unknown Utility
Functions and Queueing Delay
- URL: http://arxiv.org/abs/2012.09222v1
- Date: Wed, 16 Dec 2020 19:36:25 GMT
- Title: Learning-NUM: Network Utility Maximization with Unknown Utility
Functions and Queueing Delay
- Authors: Xinzhe Fu, Eytan Modiano
- Abstract summary: We propose a new NUM framework, Learning-NUM, where the users' utility functions are unknown a priori.
We show that the expected total utility obtained by the best dynamic policy is upper bounded by the solution to a static optimization problem.
To handle the feedback delay, we embed the algorithm in a parallel-instance paradigm to form a policy that achieves $\tilde{O}(T^{3/4})$-regret, i.e., the difference between the expected utility obtained by the best dynamic policy and our policy is in $\tilde{O}(T^{3/4})$.
- Score: 29.648462942501084
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Network Utility Maximization (NUM) studies the problems of allocating traffic
rates to network users in order to maximize the users' total utility subject to
network resource constraints. In this paper, we propose a new NUM framework,
Learning-NUM, where the users' utility functions are unknown a priori and the
utility function values of the traffic rates can be observed only after the
corresponding traffic is delivered to the destination, which means that the
utility feedback experiences queueing delay.
The goal is to design a policy that gradually learns the utility functions
and makes rate allocation and network scheduling/routing decisions so as to
maximize the total utility obtained over a finite time horizon $T$. In addition
to unknown utility functions and stochastic constraints, a central challenge of
our problem lies in the queueing delay of the observations, which may be
unbounded and depends on the decisions of the policy.
We first show that the expected total utility obtained by the best dynamic
policy is upper bounded by the solution to a static optimization problem.
Without the presence of feedback delay, we design an algorithm based on the
ideas of gradient estimation and Max-Weight scheduling. To handle the feedback
delay, we embed the algorithm in a parallel-instance paradigm to form a policy
that achieves $\tilde{O}(T^{3/4})$-regret, i.e., the difference between the
expected utility obtained by the best dynamic policy and our policy is in
$\tilde{O}(T^{3/4})$. Finally, to demonstrate the practical applicability of
the Learning-NUM framework, we apply it to three application scenarios
including database query, job scheduling and video streaming. We further
conduct simulations on the job scheduling application to evaluate the empirical
performance of our policy.
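The two algorithmic ingredients named in the abstract, gradient estimation for the unknown utilities and Max-Weight scheduling on queue backlogs, can be illustrated with a minimal sketch. The utility function, rates, and service schedules below are hypothetical stand-ins, not the paper's actual model; the gradient estimator is a standard two-point finite-difference estimate built from bandit-style utility observations.

```python
# Hypothetical concave utility, unknown to the controller; only point
# evaluations (utility feedback) are available. Illustrative only.
def utility(rate):
    return (1 + rate) ** 0.5

def estimate_gradient(f, x, delta=1e-3):
    # Two-point finite-difference gradient estimate from two utility
    # observations, a common zeroth-order stand-in for gradient estimation.
    return (f(x + delta) - f(x - delta)) / (2 * delta)

def max_weight(queues, schedules):
    # Max-Weight scheduling: among the feasible service vectors mu, pick
    # the one maximizing the backlog-weighted service sum q . mu.
    return max(schedules, key=lambda mu: sum(q * m for q, m in zip(queues, mu)))

queues = [5.0, 2.0, 0.5]                       # current backlogs
schedules = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]  # serve one queue per slot
print(max_weight(queues, schedules))           # → (1, 0, 0): largest backlog served
print(round(estimate_gradient(utility, 1.0), 3))  # ≈ 0.354, the true slope at rate 1
```

The Max-Weight step stabilizes the queues while the gradient estimates steer the rate allocation toward higher utility; the paper's parallel-instance paradigm (not sketched here) additionally copes with the delay before utility feedback arrives.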
Related papers
- Can Graph Learning Improve Task Planning? [61.47027387839096]
Task planning is emerging as an important research topic alongside the development of large language models (LLMs).
In this paper, we explore graph learning-based methods for task planning.
Our approach complements prompt engineering and fine-tuning techniques, with performance further enhanced by improved prompts or a fine-tuned model.
arXiv Detail & Related papers (2024-05-29T14:26:24Z)
- Neural Quantile Optimization for Edge-Cloud Computing [17.57923005428252]
We seek the best traffic allocation scheme for the edge-cloud computing network that satisfies constraints and minimizes the cost based on burstable billing.
We introduce the Gumbel-softmax sampling network to solve the optimization problems via unsupervised learning.
The trained network works as an efficient traffic allocation scheme sampler, remarkably outperforming the random strategy in feasibility and cost function value.
arXiv Detail & Related papers (2023-07-11T11:05:10Z)
- Scheduling Inference Workloads on Distributed Edge Clusters with Reinforcement Learning [11.007816552466952]
This paper focuses on the problem of scheduling inference queries on Deep Neural Networks in edge networks at short timescales.
By means of simulations, we analyze several policies in the realistic network settings and workloads of a large ISP.
We design ASET, a Reinforcement Learning based scheduling algorithm able to adapt its decisions according to the system conditions.
arXiv Detail & Related papers (2023-01-31T13:23:34Z)
- Network Utility Maximization with Unknown Utility Functions: A Distributed, Data-Driven Bilevel Optimization Approach [25.47492126908931]
Existing solutions almost exclusively assume each user utility function is known and concave.
This paper seeks to answer the question: how to allocate resources when utility functions are unknown, even to the users?
We provide a new solution using a distributed and data-driven bilevel optimization approach.
arXiv Detail & Related papers (2023-01-04T19:50:39Z)
- Off-Policy Evaluation with Policy-Dependent Optimization Response [90.28758112893054]
We develop a new framework for off-policy evaluation with a policy-dependent linear optimization response.
We construct unbiased estimators for the policy-dependent estimand by a perturbation method.
We provide a general algorithm for optimizing causal interventions.
arXiv Detail & Related papers (2022-02-25T20:25:37Z)
- Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely the Markov persuasion processes (MPPs).
Planning in MPPs faces the unique challenge in finding a signaling policy that is simultaneously persuasive to the myopic receivers and inducing the optimal long-term cumulative utilities of the sender.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z)
- Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z)
- Learning Augmented Index Policy for Optimal Service Placement at the Network Edge [8.136957953239254]
We consider the problem of service placement at the network edge, in which a decision maker has to choose between $N$ services to host at the edge.
Our goal is to design adaptive algorithms to minimize the average service delivery latency for customers.
arXiv Detail & Related papers (2021-01-10T23:54:59Z)
- Optimizing for the Future in Non-Stationary MDPs [52.373873622008944]
We present a policy gradient algorithm that maximizes a forecast of future performance.
We show that our algorithm, called Prognosticator, is more robust to non-stationarity than two online adaptation techniques.
arXiv Detail & Related papers (2020-05-17T03:41:19Z)
- Optimizing Wireless Systems Using Unsupervised and Reinforced-Unsupervised Deep Learning [96.01176486957226]
Resource allocation and transceivers in wireless networks are usually designed by solving optimization problems.
In this article, we introduce unsupervised and reinforced-unsupervised learning frameworks for solving both variable and functional optimization problems.
arXiv Detail & Related papers (2020-01-03T11:01:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.