Learning Augmented Index Policy for Optimal Service Placement at the
Network Edge
- URL: http://arxiv.org/abs/2101.03641v2
- Date: Thu, 14 Jan 2021 04:01:37 GMT
- Title: Learning Augmented Index Policy for Optimal Service Placement at the
Network Edge
- Authors: Guojun Xiong, Rahul Singh, Jian Li
- Abstract summary: We consider the problem of service placement at the network edge, in which a decision maker must choose which of $N$ services to host at the edge.
Our goal is to design adaptive algorithms to minimize the average service delivery latency for customers.
- Score: 8.136957953239254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of service placement at the network edge, in
which a decision maker must choose which of $N$ services to host at the edge to
satisfy customer demand. Our goal is to design adaptive algorithms to
minimize the average service delivery latency for customers. We pose the
problem as a Markov decision process (MDP) whose system state records, for
each service, the number of customers currently waiting at the edge for that
service. However, solving this $N$-service MDP is computationally expensive
due to the curse of dimensionality. To overcome this challenge, we show that
the optimal policy for the single-service MDP has an appealing threshold
structure and, using the theory of Whittle index policies, explicitly derive
the Whittle index of each service as a function of the number of outstanding
customer requests.
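To make the index structure concrete, here is a minimal sketch of how a Whittle index policy selects which services to host, assuming hypothetical per-service index functions that are nondecreasing in the backlog; the paper's closed-form indices, which depend on arrival and service rates, are not reproduced here:

```python
import heapq

def whittle_policy(queue_lengths, index_fns, budget):
    """Host the `budget` services with the largest Whittle indices,
    each index computed from that service's current backlog.

    queue_lengths: waiting requests per service
    index_fns: index_fns[i](n) -> index of service i when n requests
               are waiting (hypothetical form, for illustration only)
    budget: number of services the edge node can host at once
    """
    indices = [fn(n) for fn, n in zip(index_fns, queue_lengths)]
    return heapq.nlargest(budget, range(len(indices)),
                          key=indices.__getitem__)

# Made-up linear indices that grow with the backlog: a larger backlog
# makes hosting more urgent, mirroring the threshold structure.
fns = [lambda n, w=w: w * n for w in (0.5, 1.0, 2.0)]
print(whittle_policy([4, 3, 1], fns, budget=1))  # -> [1]
```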
Since request arrival and service delivery rates are usually unknown and
possibly time-varying, we then develop efficient learning-augmented algorithms
that fully exploit the structure of the optimal policies while incurring low
learning regret. The first, UCB-Whittle, relies on the principle of optimism
in the face of uncertainty. The second, Q-learning-Whittle, runs Q-learning
iterations for each service using a two-timescale stochastic approximation. We
characterize the non-asymptotic performance of UCB-Whittle by analyzing its
learning regret, and we analyze the convergence properties of
Q-learning-Whittle. Simulation results show that the proposed policies yield
excellent empirical performance.
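As a rough illustration of the two-timescale idea behind Q-learning-Whittle, the sketch below couples a fast Q-value update with a slow update of the index (subsidy) estimate. It uses a discounted-reward simplification, generic polynomial step sizes, and a single service with binary actions, so it should be read as a sketch of the technique rather than the paper's exact update rule:

```python
import numpy as np

GAMMA = 0.95  # discount factor; a simplification chosen for this sketch

def q_whittle_step(Q, lam, s, a, r, s_next, t, ref_state):
    """One two-timescale step: Q-values learn on the fast timescale,
    the index estimate `lam` on the slow one.

    Q: (num_states, 2) array; action 0 = passive, 1 = active
    lam: current Whittle-index (subsidy) estimate for `ref_state`
    (s, a, r, s_next): observed transition of the single-service MDP
    """
    alpha = 1.0 / (t + 1) ** 0.6   # fast step size (Q-values)
    beta = 1.0 / (t + 1)           # slow step size (index estimate)
    reward = r if a == 1 else lam  # the passive action earns the subsidy
    target = reward + GAMMA * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    # Move lam toward the subsidy that makes both actions indifferent
    # at ref_state, which is what defines the Whittle index there.
    lam += beta * (Q[ref_state, 1] - Q[ref_state, 0])
    return Q, lam
```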
Related papers
- Learning to Cover: Online Learning and Optimization with Irreversible Decisions [50.5775508521174]
We find that regret grows sub-linearly at a rate $\Theta\left(m^{\frac{1}{2}} \cdot \frac{1}{1-2^{-T}}\right)$, thus converging exponentially fast to $\Theta(\sqrt{m})$.
These findings underscore the benefits of limited online learning and optimization, in that even a few rounds can provide significant benefits as compared to a no-learning baseline.
arXiv Detail & Related papers (2024-06-20T23:00:25Z) - FedCAda: Adaptive Client-Side Optimization for Accelerated and Stable Federated Learning [57.38427653043984]
Federated learning (FL) has emerged as a prominent approach for collaborative training of machine learning models across distributed clients.
We introduce FedCAda, an innovative federated client adaptive algorithm designed to tackle this challenge.
We demonstrate that FedCAda outperforms the state-of-the-art methods in terms of adaptability, convergence, stability, and overall performance.
arXiv Detail & Related papers (2024-05-20T06:12:33Z) - Online Learning and Optimization for Queues with Unknown Demand Curve
and Service Distribution [26.720986177499338]
We investigate an optimization problem in a queueing system where the service provider selects the optimal service fee $p$ and service capacity $\mu$.
We develop an online learning framework that automatically incorporates the parameter estimation errors in the solution prescription process.
arXiv Detail & Related papers (2023-03-06T08:47:40Z) - Differentially Private Deep Q-Learning for Pattern Privacy Preservation
in MEC Offloading [76.0572817182483]
Attackers may eavesdrop on offloading decisions to infer the edge server's (ES's) queue information and users' usage patterns.
We propose an offloading strategy that jointly minimizes the latency, the ES's energy consumption, and the task dropping rate, while preserving pattern privacy (PP).
We develop a Differential Privacy Deep Q-learning based Offloading (DP-DQO) algorithm to solve this problem while addressing the PP issue by injecting noise into the generated offloading decisions.
arXiv Detail & Related papers (2023-02-09T12:50:18Z) - Sequential Information Design: Markov Persuasion Process and Its
Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely Markov persuasion processes (MPPs).
Planning in MPPs faces the unique challenge of finding a signaling policy that is simultaneously persuasive to the myopic receivers and induces the optimal long-term cumulative utility of the sender.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z) - Online Learning for Orchestration of Inference in Multi-User
End-Edge-Cloud Networks [3.6076391721440633]
Collaborative end-edge-cloud computing for deep learning offers a range of performance and efficiency trade-offs.
We propose a reinforcement-learning-based computation offloading solution that learns an optimal offloading policy.
Our solution provides 35% speedup in the average response time compared to the state-of-the-art with less than 0.9% accuracy reduction.
arXiv Detail & Related papers (2022-02-21T21:41:29Z) - Learning MDPs from Features: Predict-Then-Optimize for Sequential
Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z) - Learning-NUM: Network Utility Maximization with Unknown Utility
Functions and Queueing Delay [29.648462942501084]
We propose a new NUM framework, Learning-NUM, where the users' utility functions are unknown a priori.
We show that the expected total utility obtained by the best dynamic policy is upper bounded by the solution to a static optimization problem.
To handle the feedback delay, we embed the algorithm in a parallel-instance paradigm to form a policy that achieves $\tilde{O}(T^{3/4})$-regret, i.e., the difference between the expected utility obtained by the best dynamic policy and our policy is in $\tilde{O}(T^{3/4})$.
arXiv Detail & Related papers (2020-12-16T19:36:25Z) - An online learning approach to dynamic pricing and capacity sizing in
service systems [26.720986177499338]
We study a dynamic pricing and capacity sizing problem in a $GI/GI/1$ queue.
Our framework is dubbed Gradient-based Online Learning in Queue (GOLiQ).
arXiv Detail & Related papers (2020-09-07T07:17:20Z) - Optimistic Exploration even with a Pessimistic Initialisation [57.41327865257504]
Optimistic initialisation is an effective strategy for efficient exploration in reinforcement learning (RL).
In particular, in scenarios with only positive rewards, Q-values are initialised at their lowest possible values.
We propose a simple count-based augmentation to pessimistically initialised Q-values that separates the source of optimism from the neural network.
arXiv Detail & Related papers (2020-02-26T17:15:53Z)
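The count-based augmentation in the last entry lends itself to a short sketch. The bonus form and the constants below are assumptions for illustration; the paper applies the augmentation to neural-network Q-values:

```python
from collections import defaultdict

C, M = 1.0, 2.0            # hypothetical bonus scale and decay exponent
counts = defaultdict(int)  # state-action visit counts N(s, a)

def optimistic_q(q_value, s, a):
    """Add a count-based bonus so rarely tried actions remain attractive
    even when the underlying Q-values are pessimistically initialised."""
    return q_value + C / (counts[(s, a)] + 1) ** M

def record_visit(s, a):
    counts[(s, a)] += 1
```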