Learning Augmented Index Policy for Optimal Service Placement at the
Network Edge
- URL: http://arxiv.org/abs/2101.03641v2
- Date: Thu, 14 Jan 2021 04:01:37 GMT
- Title: Learning Augmented Index Policy for Optimal Service Placement at the
Network Edge
- Authors: Guojun Xiong, Rahul Singh, Jian Li
- Abstract summary: We consider the problem of service placement at the network edge, in which a decision maker must choose which of $N$ services to host at the edge.
Our goal is to design adaptive algorithms to minimize the average service delivery latency for customers.
- Score: 8.136957953239254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of service placement at the network edge, in
which a decision maker must choose which of $N$ services to host at the edge to
satisfy customer demand. Our goal is to design adaptive algorithms to
minimize the average service delivery latency for customers. We pose the
problem as a Markov decision process (MDP) whose system state records, for
each service, the number of customers currently waiting at the edge for that
service. However, solving this $N$-service MDP is computationally expensive
due to the curse of dimensionality. To overcome this challenge, we show that
the optimal policy for the single-service MDP has an appealing threshold
structure and, using the theory of Whittle index policies, explicitly derive
the Whittle index of each service as a function of the number of outstanding
customer requests.
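To make the index structure concrete, here is a minimal sketch of how a Whittle index policy selects which services to host, assuming hypothetical per-service index functions that are nondecreasing in the backlog; the paper's closed-form indices, which depend on arrival and service rates, are not reproduced here:

```python
import heapq

def whittle_policy(queue_lengths, index_fns, budget):
    """Host the `budget` services with the largest Whittle indices,
    each index computed from that service's current backlog.

    queue_lengths: waiting requests per service
    index_fns: index_fns[i](n) -> index of service i when n requests
               are waiting (hypothetical form, for illustration only)
    budget: number of services the edge node can host at once
    """
    indices = [fn(n) for fn, n in zip(index_fns, queue_lengths)]
    return heapq.nlargest(budget, range(len(indices)),
                          key=indices.__getitem__)

# Made-up linear indices that grow with the backlog: a larger backlog
# makes hosting more urgent, mirroring the threshold structure.
fns = [lambda n, w=w: w * n for w in (0.5, 1.0, 2.0)]
print(whittle_policy([4, 3, 1], fns, budget=1))  # -> [1]
```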
Since request arrival and service delivery rates are usually unknown and
possibly time-varying, we then develop efficient learning-augmented algorithms
that fully exploit the structure of the optimal policies while incurring low
learning regret. The first, UCB-Whittle, relies on the principle of optimism
in the face of uncertainty. The second, Q-learning-Whittle, runs Q-learning
iterations for each service using a two-timescale stochastic approximation. We
characterize the non-asymptotic performance of UCB-Whittle by analyzing its
learning regret, and we analyze the convergence properties of
Q-learning-Whittle. Simulation results show that the proposed policies yield
excellent empirical performance.
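As a rough illustration of the two-timescale idea behind Q-learning-Whittle, the sketch below couples a fast Q-value update with a slow update of the index (subsidy) estimate. It uses a discounted-reward simplification, generic polynomial step sizes, and a single service with binary actions, so it should be read as a sketch of the technique rather than the paper's exact update rule:

```python
import numpy as np

GAMMA = 0.95  # discount factor; a simplification chosen for this sketch

def q_whittle_step(Q, lam, s, a, r, s_next, t, ref_state):
    """One two-timescale step: Q-values learn on the fast timescale,
    the index estimate `lam` on the slow one.

    Q: (num_states, 2) array; action 0 = passive, 1 = active
    lam: current Whittle-index (subsidy) estimate for `ref_state`
    (s, a, r, s_next): observed transition of the single-service MDP
    """
    alpha = 1.0 / (t + 1) ** 0.6   # fast step size (Q-values)
    beta = 1.0 / (t + 1)           # slow step size (index estimate)
    reward = r if a == 1 else lam  # the passive action earns the subsidy
    target = reward + GAMMA * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    # Move lam toward the subsidy that makes both actions indifferent
    # at ref_state, which is what defines the Whittle index there.
    lam += beta * (Q[ref_state, 1] - Q[ref_state, 0])
    return Q, lam
```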
Related papers
- Learning to Cover: Online Learning and Optimization with Irreversible Decisions [50.5775508521174]
We find that regret grows sub-linearly at a rate $\Theta\left(m^{\frac{1}{2}} \cdot \frac{1}{1-2^{-T}}\right)$, thus converging exponentially fast to $\Theta(\sqrt{m})$.
These findings underscore the benefits of limited online learning and optimization, in that even a few rounds can provide significant benefits as compared to a no-learning baseline.
arXiv Detail & Related papers (2024-06-20T23:00:25Z) - FedCAda: Adaptive Client-Side Optimization for Accelerated and Stable Federated Learning [57.38427653043984]
Federated learning (FL) has emerged as a prominent approach for collaborative training of machine learning models across distributed clients.
We introduce FedCAda, an innovative federated client adaptive algorithm designed to tackle this challenge.
We demonstrate that FedCAda outperforms the state-of-the-art methods in terms of adaptability, convergence, stability, and overall performance.
arXiv Detail & Related papers (2024-05-20T06:12:33Z) - Online Learning and Optimization for Queues with Unknown Demand Curve
and Service Distribution [26.720986177499338]
We investigate an optimization problem in a queueing system where the service provider selects the optimal service fee $p$ and service capacity $\mu$.
We develop an online learning framework that automatically incorporates the parameter estimation errors in the solution prescription process.
arXiv Detail & Related papers (2023-03-06T08:47:40Z) - Differentially Private Deep Q-Learning for Pattern Privacy Preservation
in MEC Offloading [76.0572817182483]
Attackers may eavesdrop on offloading decisions to infer the edge server's (ES's) queue information and users' usage patterns.
We propose an offloading strategy that jointly minimizes the latency, the ES's energy consumption, and the task dropping rate, while preserving pattern privacy (PP).
We develop a Differential Privacy Deep Q-learning based Offloading (DP-DQO) algorithm to solve this problem while addressing the PP issue by injecting noise into the generated offloading decisions.
arXiv Detail & Related papers (2023-02-09T12:50:18Z) - Sequential Information Design: Markov Persuasion Process and Its
Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely Markov persuasion processes (MPPs).
Planning in MPPs faces the unique challenge of finding a signaling policy that is simultaneously persuasive to the myopic receivers and induces the optimal long-term cumulative utility of the sender.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z) - Online Learning for Orchestration of Inference in Multi-User
End-Edge-Cloud Networks [3.6076391721440633]
Collaborative end-edge-cloud computing for deep learning offers a range of performance and efficiency trade-offs.
We propose a reinforcement-learning-based computation offloading solution that learns an optimal offloading policy.
Our solution provides 35% speedup in the average response time compared to the state-of-the-art with less than 0.9% accuracy reduction.
arXiv Detail & Related papers (2022-02-21T21:41:29Z) - Learning MDPs from Features: Predict-Then-Optimize for Sequential
Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z) - Learning-NUM: Network Utility Maximization with Unknown Utility
Functions and Queueing Delay [29.648462942501084]
We propose a new NUM framework, Learning-NUM, where the users' utility functions are unknown a priori.
We show that the expected total utility obtained by the best dynamic policy is upper bounded by the solution to a static optimization problem.
To handle the feedback delay, we embed the algorithm in a parallel-instance paradigm to form a policy that achieves $\tilde{O}(T^{3/4})$-regret, i.e., the difference between the expected utility obtained by the best dynamic policy and our policy is in $\tilde{O}(T^{3/4})$.
arXiv Detail & Related papers (2020-12-16T19:36:25Z) - An online learning approach to dynamic pricing and capacity sizing in
service systems [26.720986177499338]
We study a dynamic pricing and capacity sizing problem in a $GI/GI/1$ queue.
Our framework is dubbed Gradient-based Online Learning in Queue (GOLiQ).
arXiv Detail & Related papers (2020-09-07T07:17:20Z) - Optimistic Exploration even with a Pessimistic Initialisation [57.41327865257504]
Optimistic initialisation is an effective strategy for efficient exploration in reinforcement learning (RL).
In particular, in scenarios with only positive rewards, Q-values are initialised at their lowest possible values.
We propose a simple count-based augmentation to pessimistically initialised Q-values that separates the source of optimism from the neural network.
arXiv Detail & Related papers (2020-02-26T17:15:53Z)
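The count-based augmentation in the last entry lends itself to a short sketch. The bonus form and the constants below are assumptions for illustration; the paper applies the augmentation to neural-network Q-values:

```python
from collections import defaultdict

C, M = 1.0, 2.0            # hypothetical bonus scale and decay exponent
counts = defaultdict(int)  # state-action visit counts N(s, a)

def optimistic_q(q_value, s, a):
    """Add a count-based bonus so rarely tried actions remain attractive
    even when the underlying Q-values are pessimistically initialised."""
    return q_value + C / (counts[(s, a)] + 1) ** M

def record_visit(s, a):
    counts[(s, a)] += 1
```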