Related papers: Attention-Enhanced Prioritized Proximal Policy Optimization for Adaptive Edge Caching

Attention-Enhanced Prioritized Proximal Policy Optimization for Adaptive Edge Caching

URL: http://arxiv.org/abs/2402.14576v3
Date: Wed, 30 Oct 2024 16:06:21 GMT
Title: Attention-Enhanced Prioritized Proximal Policy Optimization for Adaptive Edge Caching
Authors: Farnaz Niknia, Ping Wang, Zixu Wang, Aakash Agarwal, Adib S. Rezaei,
Abstract summary: We introduce a Proximal Policy Optimization (PPO)-based caching strategy that fully considers file attributes like lifetime, size, and priority. Our method outperforms a recent Deep Reinforcement Learning-based technique.
Score: 4.2579244769567675
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: This paper tackles the growing issue of excessive data transmission in networks. With increasing traffic, backhaul links and core networks are under significant traffic, leading to the investigation of caching solutions at edge routers. Many existing studies utilize Markov Decision Processes (MDP) to tackle caching problems, often assuming decision points at fixed intervals; however, real-world environments are characterized by random request arrivals. Additionally, critical file attributes such as lifetime, size, and priority significantly impact the effectiveness of caching policies, yet existing research fails to integrate all these attributes in policy design. In this work, we model the caching problem using a Semi-Markov Decision Process (SMDP) to better capture the continuous-time nature of real-world applications, enabling caching decisions to be triggered by random file requests. We then introduce a Proximal Policy Optimization (PPO)--based caching strategy that fully considers file attributes like lifetime, size, and priority. Simulations show that our method outperforms a recent Deep Reinforcement Learning-based technique. To further advance our research, we improved the convergence rate of PPO by prioritizing transitions within the replay buffer through an attention mechanism. This mechanism evaluates the similarity between the current state and all stored transitions, assigning higher priorities to transitions that exhibit greater similarity.

Related papers

Efficient Safety Alignment of Large Language Models via Preference Re-ranking and Representation-based Reward Modeling [84.00480999255628]
Reinforcement Learning algorithms for safety alignment of Large Language Models (LLMs) encounter the challenge of distribution shift.<n>Current approaches typically address this issue through online sampling from the target policy.<n>We propose a new framework that leverages the model's intrinsic safety judgment capability to extract reward signals.
arXiv Detail & Related papers (2025-03-13T06:40:34Z)
CacheMamba: Popularity Prediction for Mobile Edge Caching Networks via Selective State Spaces [6.895209729810318]
Mobile Edge Caching (MEC) plays a pivotal role in mitigating latency in data-intensive services by dynamically caching frequently requested content on edge servers. In this paper, we explore the problem of popularity prediction in MEC by utilizing historical time-series request data of intended files. We propose CacheMamba model by employing Mamba, a state-space model (SSM)-based architecture, to identify the top-K files with the highest likelihood of being requested.
arXiv Detail & Related papers (2025-02-09T05:57:59Z)
Edge Caching Optimization with PPO and Transfer Learning for Dynamic Environments [3.720975664058743]
In dynamic environments, changes in content popularity and variations in request rates frequently occur, making previously learned policies less effective as they were optimized for earlier conditions. We develop a mechanism that detects changes in content popularity and request rates, ensuring timely adjustments to the caching strategy. We also propose a transfer learning-based PPO algorithm that accelerates convergence in new environments by leveraging prior knowledge.
arXiv Detail & Related papers (2024-11-14T21:01:29Z)
Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-ite convergence guarantees under (weak) gradient domination assumptions. We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z)
An Online Gradient-Based Caching Policy with Logarithmic Complexity and Regret Guarantees [13.844896723580858]
We introduce a new variant of the gradient-based online caching policy that achieves groundbreaking logarithmic computational complexity. This advancement allows us to test the policy on large-scale, real-world traces featuring millions of requests and items.
arXiv Detail & Related papers (2024-05-02T13:11:53Z)
A Learning-Based Caching Mechanism for Edge Content Delivery [2.412158290827225]
5G networks and the rise of the Internet of Things (IoT) are increasingly extending into the network edge. This shift introduces unique challenges, particularly due to the limited cache storage and the diverse request patterns at the edge. We introduce HR-Cache, a learning-based caching framework grounded in the principles of Hazard Rate (HR) ordering.
arXiv Detail & Related papers (2024-02-05T08:06:03Z)
SEAM: Searching Transferable Mixed-Precision Quantization Policy through Large Margin Regularization [50.04951511146338]
Mixed-precision quantization (MPQ) suffers from the time-consuming process of searching the optimal bit-width allocation for each layer. This paper proposes a novel method for efficiently searching for effective MPQ policies using a small proxy dataset.
arXiv Detail & Related papers (2023-02-14T05:47:45Z)
Optimistic No-regret Algorithms for Discrete Caching [6.182368229968862]
We take a systematic look at the problem of storing whole files in a cache with limited capacity in the context of optimistic learning. We provide a universal lower bound for prediction-assisted online caching and design a suite of policies with a range of performance-complexity trade-offs. Our results substantially improve upon all recently-proposed online caching policies, which, being unable to exploit the oracle predictions, offer only $O(sqrtT)$ regret.
arXiv Detail & Related papers (2022-08-15T09:18:41Z)
Accelerating Deep Learning Classification with Error-controlled Approximate-key Caching [72.50506500576746]
We propose a novel caching paradigm, that we named approximate-key caching. While approximate cache hits alleviate DL inference workload and increase the system throughput, they however introduce an approximation error. We analytically model our caching system performance for classic LRU and ideal caches, we perform a trace-driven evaluation of the expected performance, and we compare the benefits of our proposed approach with the state-of-the-art similarity caching.
arXiv Detail & Related papers (2021-12-13T13:49:11Z)
Distributed Reinforcement Learning for Privacy-Preserving Dynamic Edge Caching [91.50631418179331]
A privacy-preserving distributed deep policy gradient (P2D3PG) is proposed to maximize the cache hit rates of devices in the MEC networks. We convert the distributed optimizations into model-free Markov decision process problems and then introduce a privacy-preserving federated learning method for popularity prediction.
arXiv Detail & Related papers (2021-10-20T02:48:27Z)
Iterative Amortized Policy Optimization [147.63129234446197]
Policy networks are a central feature of deep reinforcement learning (RL) algorithms for continuous control. From the variational inference perspective, policy networks are a form of textitamortized optimization, optimizing network parameters rather than the policy distributions directly. We demonstrate that iterative amortized policy optimization, yields performance improvements over direct amortization on benchmark continuous control tasks.
arXiv Detail & Related papers (2020-10-20T23:25:42Z)
Reinforcement Learning for Caching with Space-Time Popularity Dynamics [61.55827760294755]
caching is envisioned to play a critical role in next-generation networks. To intelligently prefetch and store contents, a cache node should be able to learn what and when to cache. This chapter presents a versatile reinforcement learning based approach for near-optimal caching policy design.
arXiv Detail & Related papers (2020-05-19T01:23:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.