Edge Caching Optimization with PPO and Transfer Learning for Dynamic Environments
- URL: http://arxiv.org/abs/2411.09812v1
- Date: Thu, 14 Nov 2024 21:01:29 GMT
- Title: Edge Caching Optimization with PPO and Transfer Learning for Dynamic Environments
- Authors: Farnaz Niknia, Ping Wang
- Abstract summary: In dynamic environments, changes in content popularity and variations in request rates frequently occur, making previously learned policies less effective as they were optimized for earlier conditions.
We develop a mechanism that detects changes in content popularity and request rates, ensuring timely adjustments to the caching strategy.
We also propose a transfer learning-based PPO algorithm that accelerates convergence in new environments by leveraging prior knowledge.
- Score: 3.720975664058743
- Abstract: This paper addresses the challenge of edge caching in dynamic environments, where rising traffic loads strain backhaul links and core networks. We propose a Proximal Policy Optimization (PPO)-based caching strategy that fully incorporates key file attributes such as size, lifetime, importance, and popularity, while also considering random file request arrivals, reflecting more realistic edge caching scenarios. In dynamic environments, changes such as shifts in content popularity and variations in request rates frequently occur, making previously learned policies less effective as they were optimized for earlier conditions. Without adaptation, caching efficiency and response times can degrade. While learning a new policy from scratch in a new environment is an option, it is highly inefficient and computationally expensive. Thus, adapting an existing policy to these changes is critical. To address this, we develop a mechanism that detects changes in content popularity and request rates, ensuring timely adjustments to the caching strategy. We also propose a transfer learning-based PPO algorithm that accelerates convergence in new environments by leveraging prior knowledge. Simulation results demonstrate the significant effectiveness of our approach, outperforming a recent Deep Reinforcement Learning (DRL)-based method.
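Since the abstract describes the mechanisms only at a high level, below is a minimal Python sketch of one way the change-detection and transfer steps could fit together. Everything here is an illustrative assumption rather than the paper's actual design: the window size, the KL and rate thresholds, and the warm-start calls are all hypothetical.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) between two smoothed empirical popularity histograms."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

class EnvironmentChangeDetector:
    """Flags popularity and request-rate shifts across sliding windows."""

    def __init__(self, n_files, window=1000, kl_threshold=0.1, rate_threshold=0.2):
        self.window = window
        self.kl_threshold = kl_threshold
        self.rate_threshold = rate_threshold  # relative request-rate change
        self.ref_counts = None                # popularity histogram, last window
        self.ref_rate = None                  # mean request rate, last window
        self.cur_counts = np.zeros(n_files)
        self.cur_times = []

    def observe(self, file_id, interarrival_time):
        """Record one request; return True when a change is detected."""
        self.cur_counts[file_id] += 1
        self.cur_times.append(interarrival_time)
        if len(self.cur_times) < self.window:
            return False
        cur_rate = self.window / sum(self.cur_times)
        changed = False
        if self.ref_counts is not None:
            pop_shift = kl_divergence(self.cur_counts, self.ref_counts)
            rate_shift = abs(cur_rate - self.ref_rate) / self.ref_rate
            changed = (pop_shift > self.kl_threshold
                       or rate_shift > self.rate_threshold)
        # Roll the completed window into the reference and start a new one.
        self.ref_counts, self.ref_rate = self.cur_counts, cur_rate
        self.cur_counts = np.zeros_like(self.cur_counts)
        self.cur_times = []
        return changed

# On detection, the transfer-learning step would warm-start PPO in the new
# environment instead of retraining from scratch (hypothetical PyTorch-style
# calls, shown only to indicate where transfer happens):
#   new_policy.load_state_dict(old_policy.state_dict())
#   fine_tune_ppo(new_policy, env, lr=1e-4)
```

Warm-starting from the previous policy's weights is what the abstract credits for the faster convergence in new environments.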
Related papers
- An Online Gradient-Based Caching Policy with Logarithmic Complexity and Regret Guarantees [13.844896723580858]
We introduce a new variant of the gradient-based online caching policy that achieves groundbreaking logarithmic computational complexity.
This advancement allows us to test the policy on large-scale, real-world traces featuring millions of requests and items.
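For context, a minimal Python sketch of the general gradient-based online caching idea this line of work builds on, deliberately omitting the paper's logarithmic-complexity machinery: keep a fractional cache state, step toward each requested file, and project back onto the capacity constraint. The projection routine and step size below are assumptions.

```python
import numpy as np

def project_capped_simplex(y, capacity, iters=50):
    """Euclidean projection onto {x : 0 <= x <= 1, sum(x) = capacity},
    found by bisection on the dual variable."""
    lo, hi = y.min() - 1.0, y.max()
    for _ in range(iters):
        mid = (lo + hi) / 2
        if np.clip(y - mid, 0.0, 1.0).sum() > capacity:
            lo = mid
        else:
            hi = mid
    return np.clip(y - (lo + hi) / 2, 0.0, 1.0)

class OnlineGradientCache:
    def __init__(self, n_files, capacity, lr=0.1):
        self.y = np.full(n_files, capacity / n_files)  # fractional cache state
        self.capacity, self.lr = capacity, lr

    def request(self, file_id):
        hit_fraction = self.y[file_id]     # fraction of the file served locally
        grad = np.zeros_like(self.y)
        grad[file_id] = 1.0                # utility gradient: 1 for the request
        self.y = project_capped_simplex(self.y + self.lr * grad, self.capacity)
        return hit_fraction
```

Note that the naive projection above costs linear time per request; the paper's logarithmic-complexity variant is what avoids this per-request cost at scale.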
arXiv Detail & Related papers (2024-05-02T13:11:53Z)
- Attention-Enhanced Prioritized Proximal Policy Optimization for Adaptive Edge Caching [4.2579244769567675]
We introduce a Proximal Policy Optimization (PPO)-based caching strategy that fully considers file attributes like lifetime, size, and priority.
Our method outperforms a recent Deep Reinforcement Learning-based technique.
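A plausible (assumed) observation encoding for such an attribute-aware PPO agent is sketched below; the feature set and normalization are illustrative, not the paper's actual design.

```python
import numpy as np

def make_observation(files, cache_used, cache_capacity):
    """files: list of dicts with keys 'size', 'lifetime', 'priority',
    'popularity' (hypothetical field names)."""
    feats = []
    for f in files:
        feats.extend([
            f["size"] / cache_capacity,  # normalized file size
            f["lifetime"],               # remaining useful lifetime
            f["priority"],               # application-level importance
            f["popularity"],             # recent request-frequency estimate
        ])
    feats.append(cache_used / cache_capacity)    # current occupancy
    return np.asarray(feats, dtype=np.float32)   # input to policy/value nets
```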
arXiv Detail & Related papers (2024-02-08T17:17:46Z)
- Acceleration in Policy Optimization [50.323182853069184]
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates.
We define optimism as predictive modelling of the future behavior of a policy, and adaptivity as taking immediate and anticipatory corrective actions to mitigate errors from overshooting predictions or delayed responses to change.
We design an optimistic policy gradient algorithm, adaptive via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration, in an illustrative task.
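As a generic illustration of those two notions (not the paper's algorithm), an optimistic step can extrapolate consecutive gradients to forecast the next one, while an adaptive rule corrects the step size when forecasts disagree; the rule below is a crude, assumed stand-in for meta-gradient learning.

```python
import numpy as np

def optimistic_step(theta, g_t, g_prev, lr):
    """Optimistic gradient step: 2*g_t - g_prev forecasts the next gradient."""
    return theta + lr * (2.0 * g_t - g_prev)

def adapt_lr(lr, g_t, g_prev, meta_lr=0.01):
    """Grow the step size when consecutive gradients align, shrink it when
    they disagree (overshoot risk); a heuristic stand-in for meta-gradients."""
    cos = np.dot(g_t, g_prev) / (np.linalg.norm(g_t) * np.linalg.norm(g_prev) + 1e-9)
    return lr * np.exp(meta_lr * cos)
```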
arXiv Detail & Related papers (2023-06-18T15:50:57Z)
- A Parametric Class of Approximate Gradient Updates for Policy Optimization [47.69337420768319]
We develop a unified perspective that re-expresses the underlying updates in terms of a limited choice of gradient form and scaling function.
We obtain novel yet well-motivated updates that generalize existing algorithms in a way that can improve both convergence speed and final result quality.
arXiv Detail & Related papers (2022-06-17T01:28:38Z)
- Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs [71.47895794305883]
We study the problem of Safe Policy Improvement (SPI) under constraints in the offline Reinforcement Learning setting.
We present an SPI algorithm for this RL setting that takes into account the user's preferences for handling the trade-offs between different reward signals.
arXiv Detail & Related papers (2021-05-31T21:04:21Z)
- Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z)
- Near Optimal Policy Optimization via REPS [33.992374484681704]
Relative entropy policy search (REPS) has demonstrated successful policy learning on a number of simulated and real-world robotic domains.
However, there are no guarantees on REPS's performance when using gradient-based solvers.
We introduce a technique that uses generative access to the underlying decision process to compute parameter updates that maintain favorable convergence to the optimal regularized policy.
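For reference, a minimal bandit-style sketch of the classic REPS weight computation (illustrative only, and unrelated to the paper's generative-access technique): solve the KL-constrained dual for the temperature, then reweight samples for the policy-fitting step.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def reps_weights(returns, epsilon=0.1):
    """Sample weights from the REPS dual with KL budget epsilon."""
    r = np.asarray(returns, dtype=float)
    r = r - r.max()  # numerical stability; constant shifts cancel in weights

    def dual(eta):
        return eta * epsilon + eta * np.log(np.mean(np.exp(r / eta)))

    eta = minimize_scalar(dual, bounds=(1e-6, 1e6), method="bounded").x
    w = np.exp(r / eta)
    return w / w.sum()  # normalized weights for weighted policy fitting
```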
arXiv Detail & Related papers (2021-03-17T16:22:59Z)
- Cocktail Edge Caching: Ride Dynamic Trends of Content Popularity with Ensemble Learning [10.930268276150262]
Edge caching will play a critical role in facilitating the emerging content-rich applications.
It faces many new challenges, in particular, the highly dynamic content popularity and the heterogeneous caching computation.
We propose Cocktail Edge Caching, which tackles the dynamic popularity and heterogeneity through ensemble learning.
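A sketch of the general ensemble mechanism (the specifics below are assumptions, not the paper's exact method): combine several popularity predictors with multiplicative-weights updates so that whichever predictor tracks the current trend best dominates the forecast.

```python
import numpy as np

class EnsemblePopularity:
    def __init__(self, predictors, lr=0.5):
        # Each predictor maps a request history to per-file scores in [0, 1].
        self.predictors = predictors
        self.weights = np.ones(len(predictors)) / len(predictors)
        self.lr = lr
        self.last_preds = None

    def predict(self, history):
        self.last_preds = np.array([p(history) for p in self.predictors])
        return self.weights @ self.last_preds  # weighted popularity forecast

    def update(self, requested_file):
        # Penalize each predictor by how little score it gave the true request
        # (call predict() first so last_preds is populated).
        losses = 1.0 - self.last_preds[:, requested_file]
        self.weights *= np.exp(-self.lr * losses)
        self.weights /= self.weights.sum()
```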
arXiv Detail & Related papers (2021-01-14T21:59:04Z)
- Iterative Amortized Policy Optimization [147.63129234446197]
Policy networks are a central feature of deep reinforcement learning (RL) algorithms for continuous control.
From the variational inference perspective, policy networks are a form of amortized optimization, optimizing network parameters rather than the policy distributions directly.
We demonstrate that iterative amortized policy optimization yields performance improvements over direct amortization on benchmark continuous control tasks.
arXiv Detail & Related papers (2020-10-20T23:25:42Z)
- Instance Weighted Incremental Evolution Strategies for Reinforcement Learning in Dynamic Environments [11.076005074172516]
We propose a systematic incremental learning method for evolution strategies (ES) in dynamic environments.
The goal is to adjust previously learned policy to a new one incrementally whenever the environment changes.
This paper introduces a family of scalable ES algorithms for RL domains that enables rapid learning adaptation to dynamic environments.
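A minimal instance-weighted ES update, with the weighting scheme assumed for illustration: fitness is averaged over old- and new-environment instances, with stale instances down-weighted so the policy shifts toward the new dynamics without discarding prior knowledge.

```python
import numpy as np

def es_update(theta, fitness_fn, instances, instance_weights,
              pop_size=32, sigma=0.1, lr=0.05, rng=np.random.default_rng(0)):
    """One instance-weighted evolution-strategies step (illustrative)."""
    noise = rng.standard_normal((pop_size, theta.size))
    scores = np.empty(pop_size)
    for i in range(pop_size):
        candidate = theta + sigma * noise[i]
        # Weighted fitness over mixed old/new environment instances.
        scores[i] = sum(w * fitness_fn(candidate, inst)
                        for inst, w in zip(instances, instance_weights))
    scores = (scores - scores.mean()) / (scores.std() + 1e-9)
    grad = noise.T @ scores / (pop_size * sigma)  # ES gradient estimate
    return theta + lr * grad
```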
arXiv Detail & Related papers (2020-10-09T14:31:44Z)
- Reinforcement Learning for Caching with Space-Time Popularity Dynamics [61.55827760294755]
Caching is envisioned to play a critical role in next-generation networks.
To intelligently prefetch and store contents, a cache node should be able to learn what and when to cache.
This chapter presents a versatile reinforcement learning based approach for near-optimal caching policy design.
arXiv Detail & Related papers (2020-05-19T01:23:51Z)