Learning-NUM: Network Utility Maximization with Unknown Utility
Functions and Queueing Delay
- URL: http://arxiv.org/abs/2012.09222v1
- Date: Wed, 16 Dec 2020 19:36:25 GMT
- Title: Learning-NUM: Network Utility Maximization with Unknown Utility
Functions and Queueing Delay
- Authors: Xinzhe Fu, Eytan Modiano
- Abstract summary: We propose a new NUM framework, Learning-NUM, where the users' utility functions are unknown a priori.
We show that the expected total utility obtained by the best dynamic policy is upper bounded by the solution to a static optimization problem.
To handle the feedback delay, we embed the algorithm in a parallel-instance paradigm to form a policy that achieves $\tilde{O}(T^{3/4})$-regret, i.e., the difference between the expected utility obtained by the best dynamic policy and our policy is in $\tilde{O}(T^{3/4})$.
- Score: 29.648462942501084
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Network Utility Maximization (NUM) studies the problems of allocating traffic
rates to network users in order to maximize the users' total utility subject to
network resource constraints. In this paper, we propose a new NUM framework,
Learning-NUM, where the users' utility functions are unknown a priori and the
utility function values of the traffic rates can be observed only after the
corresponding traffic is delivered to the destination, which means that the
utility feedback experiences queueing delay.
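For reference, the classical static NUM problem that Learning-NUM generalizes can be written, in standard notation (not necessarily the paper's), as
$$ \max_{x \geq 0} \; \sum_{i} U_i(x_i) \quad \text{subject to} \quad x \in \Lambda, $$
where $x_i$ is the traffic rate allocated to user $i$, $U_i$ is user $i$'s utility function, and $\Lambda$ is the capacity region induced by the network resource constraints; in Learning-NUM the $U_i$ are unknown and observed only through delayed feedback.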
The goal is to design a policy that gradually learns the utility functions
and makes rate allocation and network scheduling/routing decisions so as to
maximize the total utility obtained over a finite time horizon $T$. In addition
to unknown utility functions and stochastic constraints, a central challenge of
our problem lies in the queueing delay of the observations, which may be
unbounded and depends on the decisions of the policy.
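Performance is measured against the best dynamic policy; in our own notation (consistent with the regret bound stated below), a policy $\pi$ is evaluated through
$$ R(T) \;=\; \mathbb{E}\big[U^{*}(T)\big] - \mathbb{E}\big[U^{\pi}(T)\big], $$
where $U^{*}(T)$ and $U^{\pi}(T)$ denote the total utility collected over the horizon $T$ by the best dynamic policy and by $\pi$, respectively.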
We first show that the expected total utility obtained by the best dynamic
policy is upper bounded by the solution to a static optimization problem.
Without the presence of feedback delay, we design an algorithm based on the
ideas of gradient estimation and Max-Weight scheduling. To handle the feedback
delay, we embed the algorithm in a parallel-instance paradigm to form a policy
that achieves $\tilde{O}(T^{3/4})$-regret, i.e., the difference between the
expected utility obtained by the best dynamic policy and our policy is in
$\tilde{O}(T^{3/4})$. Finally, to demonstrate the practical applicability of
the Learning-NUM framework, we apply it to three application scenarios:
database query, job scheduling, and video streaming. We further
conduct simulations on the job scheduling application to evaluate the empirical
performance of our policy.
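To make the delay-free building block concrete (zeroth-order gradient estimation of the unknown utilities combined with Max-Weight scheduling), here is a minimal toy sketch in Python. It is our own simplification, not the paper's Learning-NUM algorithm: it assumes a single server, immediate (undelayed) utility feedback, hypothetical hidden utilities and parameters, and it omits the parallel-instance paradigm.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 3                    # number of users / flows, each with its own queue
T = 2000                 # time horizon (slots)
eta, delta = 0.05, 0.1   # gradient step size and exploration perturbation

# Hidden concave utilities U_i(x) = w_i * log(1 + x); the policy never sees w_true.
w_true = np.array([1.0, 2.0, 3.0])

def utility(i, x):
    return w_true[i] * np.log1p(x)

x = np.full(N, 0.5)                         # current rate allocation, kept in [0, 1]
queues = np.zeros(N)                        # per-flow queue backlogs
service_rates = np.array([1.0, 0.8, 0.6])   # service rate of each flow if scheduled
total_utility = 0.0

for t in range(T):
    # Rate allocation: two-point zeroth-order (finite-difference) estimate of each
    # unknown utility's gradient, then a projected gradient step penalized by backlog.
    u = rng.choice([-1.0, 1.0], size=N)     # random perturbation directions
    x_plus = np.clip(x + delta * u, 0.0, 1.0)
    x_minus = np.clip(x - delta * u, 0.0, 1.0)
    f_plus = np.array([utility(i, x_plus[i]) for i in range(N)])
    f_minus = np.array([utility(i, x_minus[i]) for i in range(N)])
    grad_est = (f_plus - f_minus) / (2 * delta) * u
    x = np.clip(x + eta * (grad_est - 0.01 * queues), 0.0, 1.0)

    # Admit traffic at the chosen rates; collect utility feedback
    # (immediate here, delayed by queueing in the paper).
    queues += x
    total_utility += sum(utility(i, x[i]) for i in range(N))

    # Max-Weight scheduling: one server serves the flow with the largest
    # backlog * service-rate product in this slot.
    k = int(np.argmax(queues * service_rates))
    queues[k] = max(0.0, queues[k] - service_rates[k])

print(f"average utility per slot: {total_utility / T:.3f}")
print("final backlogs:", np.round(queues, 2))
```

Penalizing the rate update by the queue backlog is one common (drift-plus-penalty style) way to couple admission control to congestion; the coupling used in the paper may differ.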
Related papers
- Adversarial Network Optimization under Bandit Feedback: Maximizing Utility in Non-Stationary Multi-Hop Networks [35.78834550608041]
Classical stochastic network optimization (SNO) algorithms require network conditions to be stationary over time.
Motivated by these issues, we consider Adversarial Network Optimization (ANO) under bandit feedback.
Our proposed UMO2 algorithm ensures network stability and also matches the utility performance of any "mildly varying" reference policy.
arXiv Detail & Related papers (2024-08-29T02:18:28Z)
- Neural Quantile Optimization for Edge-Cloud Networking [13.509945075582447]
We seek the best traffic allocation scheme for the edge-cloud computing network that satisfies constraints and minimizes the cost based on burstable billing.
We introduce the Gumbel-softmax sampling network to solve the optimization problems via unsupervised learning.
The trained network works as an efficient traffic allocation scheme sampler, remarkably outperforming the random strategy in feasibility and cost function value.
arXiv Detail & Related papers (2023-07-11T11:05:10Z)
- Scheduling Inference Workloads on Distributed Edge Clusters with Reinforcement Learning [11.007816552466952]
This paper focuses on the problem of scheduling inference queries on Deep Neural Networks in edge networks at short timescales.
By means of simulations, we analyze several policies in the realistic network settings and workloads of a large ISP.
We design ASET, a Reinforcement Learning based scheduling algorithm able to adapt its decisions according to the system conditions.
arXiv Detail & Related papers (2023-01-31T13:23:34Z)
- Off-Policy Evaluation with Policy-Dependent Optimization Response [90.28758112893054]
We develop a new framework for off-policy evaluation with a policy-dependent linear optimization response.
We construct unbiased estimators for the policy-dependent estimand by a perturbation method.
We provide a general algorithm for optimizing causal interventions.
arXiv Detail & Related papers (2022-02-25T20:25:37Z)
- Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely Markov persuasion processes (MPPs).
Planning in MPPs faces the unique challenge in finding a signaling policy that is simultaneously persuasive to the myopic receivers and inducing the optimal long-term cumulative utilities of the sender.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z)
- Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z)
- Learning Augmented Index Policy for Optimal Service Placement at the Network Edge [8.136957953239254]
We consider the problem of service placement at the network edge, in which a decision maker has to choose between $N$ services to host at the edge.
Our goal is to design adaptive algorithms to minimize the average service delivery latency for customers.
arXiv Detail & Related papers (2021-01-10T23:54:59Z)
- Policy Gradient for Continuing Tasks in Non-stationary Markov Decision Processes [112.38662246621969]
Reinforcement learning considers the problem of finding policies that maximize an expected cumulative reward in a Markov decision process with unknown transition probabilities.
We compute unbiased navigation gradients of the value function which we use as ascent directions to update the policy.
A major drawback of policy gradient-type algorithms is that they are limited to episodic tasks unless stationarity assumptions are imposed.
arXiv Detail & Related papers (2020-10-16T15:15:42Z)
- Optimizing for the Future in Non-Stationary MDPs [52.373873622008944]
We present a policy gradient algorithm that maximizes a forecast of future performance.
We show that our algorithm, called Prognosticator, is more robust to non-stationarity than two online adaptation techniques.
arXiv Detail & Related papers (2020-05-17T03:41:19Z)
- Optimizing Wireless Systems Using Unsupervised and Reinforced-Unsupervised Deep Learning [96.01176486957226]
Resource allocation and transceivers in wireless networks are usually designed by solving optimization problems.
In this article, we introduce unsupervised and reinforced-unsupervised learning frameworks for solving both variable and functional optimization problems.
arXiv Detail & Related papers (2020-01-03T11:01:52Z)