An Online Gradient-Based Caching Policy with Logarithmic Complexity and Regret Guarantees
- URL: http://arxiv.org/abs/2405.01263v2
- Date: Mon, 17 Jun 2024 12:01:01 GMT
- Title: An Online Gradient-Based Caching Policy with Logarithmic Complexity and Regret Guarantees
- Authors: Damiano Carra, Giovanni Neglia
- Abstract summary: We introduce a new variant of the gradient-based online caching policy that achieves groundbreaking logarithmic computational complexity.
This advancement allows us to test the policy on large-scale, real-world traces featuring millions of requests and items.
- Score: 13.844896723580858
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Commonly used caching policies, such as LRU (Least Recently Used) or LFU (Least Frequently Used), exhibit optimal performance only under specific traffic patterns. Even advanced machine learning-based methods, which detect patterns in historical request data, struggle when future requests deviate from past trends. Recently, a new class of policies has emerged that are robust to varying traffic patterns. These algorithms address an online optimization problem, enabling continuous adaptation to the context. They offer theoretical guarantees on the regret metric, which measures the performance gap between the online policy and the optimal static cache allocation in hindsight. However, the high computational complexity of these solutions hinders their practical adoption. In this study, we introduce a new variant of the gradient-based online caching policy that achieves groundbreaking logarithmic computational complexity relative to catalog size, while also providing regret guarantees. This advancement allows us to test the policy on large-scale, real-world traces featuring millions of requests and items - a significant achievement, as such scales have been beyond the reach of existing policies with regret guarantees. To the best of our knowledge, our experimental results demonstrate for the first time that the regret guarantees of gradient-based caching policies offer substantial benefits in practical scenarios.
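For context, the sketch below illustrates the generic online gradient ascent caching scheme on which this family of policies is built. It is only a minimal illustration under simplifying assumptions (unit-sized items, a fractional cache state, a hit-ratio reward), not the paper's algorithm: the capped-simplex projection shown here costs O(N) per request via bisection, whereas the paper's contribution is precisely to bring the per-request cost down to logarithmic in the catalog size. All names and parameters are illustrative.
```python
import numpy as np

def project_capped_simplex(v, capacity, iters=60):
    """Euclidean projection onto {x : 0 <= x_i <= 1, sum_i x_i = capacity}.

    Bisection on the dual variable tau so that sum_i clip(v_i - tau, 0, 1)
    equals the cache capacity. This step costs O(N) per request; the paper's
    logarithmic-complexity alternative is not reproduced here.
    """
    lo, hi = v.min() - 1.0, v.max()
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(v - tau, 0.0, 1.0).sum() > capacity:
            lo = tau
        else:
            hi = tau
    return np.clip(v - tau, 0.0, 1.0)

def online_gradient_caching(requests, catalog_size, capacity, eta):
    """Online gradient ascent on a fractional cache allocation x in [0,1]^N.

    The per-request reward is x[i_t], the cached fraction of the requested
    item, so the gradient is the indicator of i_t. Regret -- the hit-count gap
    to the best static allocation chosen in hindsight -- is O(sqrt(T)) for a
    suitably chosen step size eta.
    """
    x = np.full(catalog_size, capacity / catalog_size)  # feasible uniform start
    hits = 0.0
    for i_t in requests:
        hits += x[i_t]                       # (fractional) hit for this request
        grad = np.zeros(catalog_size)
        grad[i_t] = 1.0                      # gradient of the linear reward
        x = project_capped_simplex(x + eta * grad, capacity)
    return hits, x

# Illustrative run on a synthetic Zipf-like trace.
rng = np.random.default_rng(0)
requests = rng.zipf(1.2, size=10_000) % 1_000
hits, x = online_gradient_caching(requests, catalog_size=1_000, capacity=50, eta=0.1)
print(f"fractional hit ratio: {hits / len(requests):.3f}")
```
The projection is the per-request bottleneck that makes naive implementations impractical at catalog sizes in the millions, which is the gap the paper targets.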
Related papers
- Edge Caching Optimization with PPO and Transfer Learning for Dynamic Environments [3.720975664058743]
In dynamic environments, changes in content popularity and variations in request rates frequently occur, making previously learned policies less effective as they were optimized for earlier conditions.
We develop a mechanism that detects changes in content popularity and request rates, ensuring timely adjustments to the caching strategy.
We also propose a transfer learning-based PPO algorithm that accelerates convergence in new environments by leveraging prior knowledge.
arXiv Detail & Related papers (2024-11-14T21:01:29Z) - Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z) - Iteratively Refined Behavior Regularization for Offline Reinforcement Learning [57.10922880400715]
In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration.
By iteratively refining the reference policy used for behavior regularization, the conservative policy update guarantees gradual improvement.
Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks.
arXiv Detail & Related papers (2023-06-09T07:46:24Z) - Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches.
arXiv Detail & Related papers (2023-02-06T17:30:22Z) - Optimistic No-regret Algorithms for Discrete Caching [6.182368229968862]
We take a systematic look at the problem of storing whole files in a cache with limited capacity in the context of optimistic learning.
We provide a universal lower bound for prediction-assisted online caching and design a suite of policies with a range of performance-complexity trade-offs.
Our results substantially improve upon all recently-proposed online caching policies, which, being unable to exploit the oracle predictions, offer only $O(\sqrt{T})$ regret.
arXiv Detail & Related papers (2022-08-15T09:18:41Z) - Universal Caching [8.208569626646034]
We investigate a stronger notion of regret minimization in the context of an online caching problem.
We propose an efficient online caching policy with an adaptive sub-linear regret bound.
To the best of our knowledge, this is the first data-dependent regret bound known for the universal caching problem.
arXiv Detail & Related papers (2022-05-10T13:00:10Z) - Online Caching with no Regret: Optimistic Learning via Recommendations [15.877673959068458]
We build upon the Follow-the-Regularized-Leader (FTRL) framework to include predictions for the file requests (a generic sketch of this optimistic-update idea appears after this list).
We extend the framework to learn and utilize the best request predictor in cases where many are available.
We prove that the proposed optimistic learning caching policies can achieve sub-zero performance loss (regret) for perfect predictions.
arXiv Detail & Related papers (2022-04-20T09:29:47Z) - MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning [108.79676336281211]
Continuous deployment of new policies for data collection and online learning is either cost ineffective or impractical.
We propose a new algorithmic learning framework called Model-based Uncertainty regularized and Sample Efficient Batch Optimization.
Our framework discovers novel and high quality samples for each deployment to enable efficient data collection.
arXiv Detail & Related papers (2021-02-23T01:30:55Z) - Non-stationary Online Learning with Memory and Non-stochastic Control [71.14503310914799]
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions.
In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments.
We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret in terms of time horizon, non-stationarity measure, and memory length.
arXiv Detail & Related papers (2021-02-07T09:45:15Z) - Optimizing for the Future in Non-Stationary MDPs [52.373873622008944]
We present a policy gradient algorithm that maximizes a forecast of future performance.
We show that our algorithm, called Prognosticator, is more robust to non-stationarity than two online adaptation techniques.
arXiv Detail & Related papers (2020-05-17T03:41:19Z)
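As a companion to the optimistic-caching entries above (see the forward reference in the FTRL item), the following is a hedged sketch of the generic optimistic online gradient update with a prediction hint. It illustrates the idea rather than the algorithm of any specific paper; the `predictor` argument is an assumed stand-in for an external recommender, and the projection repeats the bisection helper from the sketch under the abstract for self-containment.
```python
import numpy as np

def project_capped_simplex(v, capacity, iters=60):
    """Euclidean projection onto {x : 0 <= x_i <= 1, sum_i x_i = capacity} by bisection."""
    lo, hi = v.min() - 1.0, v.max()
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(v - tau, 0.0, 1.0).sum() > capacity:
            lo = tau
        else:
            hi = tau
    return np.clip(v - tau, 0.0, 1.0)

def optimistic_gradient_caching(requests, predictor, catalog_size, capacity, eta):
    """Optimistic online gradient ascent for fractional caching.

    Before each request, the played state is shifted toward the predicted item
    (the 'hint'); after observing the true request, the base state is updated
    with the real gradient, as in optimistic online mirror descent.
    `predictor(t)` returns a guess of the t-th requested item.
    """
    y = np.full(catalog_size, capacity / catalog_size)  # base (un-hinted) state
    hits = 0.0
    for t, i_t in enumerate(requests):
        hint = np.zeros(catalog_size)
        hint[predictor(t)] = 1.0
        x = project_capped_simplex(y + eta * hint, capacity)   # play the hinted state
        hits += x[i_t]
        grad = np.zeros(catalog_size)
        grad[i_t] = 1.0
        y = project_capped_simplex(y + eta * grad, capacity)   # correct with the true gradient
    return hits
```
With a perfect predictor (e.g. `predictor = lambda t: requests[t]`), the hint matches the true gradient at every step, so the gradient-prediction error that drives the regret bounds of optimistic methods vanishes; this is the mechanism behind the near-zero or negative regret claims quoted above.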
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.