Efficient and Optimal No-Regret Caching under Partial Observation
- URL: http://arxiv.org/abs/2503.02758v1
- Date: Tue, 04 Mar 2025 16:21:33 GMT
- Title: Efficient and Optimal No-Regret Caching under Partial Observation
- Authors: Younes Ben Mazziane, Francescomaria Faticanti, Sara Alouf, Giovanni Neglia
- Abstract summary: We study the caching problem in a more restrictive setting where only a fraction of past requests are observed. We propose a randomized caching policy with sublinear regret based on the classic online learning algorithm Follow-the-Perturbed-Leader.
- Score: 11.537072761243344
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Online learning algorithms have been successfully used to design caching policies with sublinear regret in the total number of requests, with no statistical assumption about the request sequence. Most existing algorithms involve computationally expensive operations and require knowledge of all past requests. However, this may not be feasible in practical scenarios like caching at a cellular base station. Therefore, we study the caching problem in a more restrictive setting where only a fraction of past requests are observed, and we propose a randomized caching policy with sublinear regret based on the classic online learning algorithm Follow-the-Perturbed-Leader (FPL). Our caching policy is the first to attain the asymptotically optimal regret bound while ensuring asymptotically constant amortized time complexity in the partial observability setting of requests. The experimental evaluation compares the proposed solution against classic caching policies and validates the proposed approach under synthetic and real-world request traces.
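For intuition, here is a minimal Python sketch of an FPL-style caching policy under partial observation. It is an illustrative toy, not the paper's implementation: the observation probability `obs_prob`, the Gaussian perturbation, its sqrt(T) scale, and the inverse-probability weighting of observed requests are assumptions made for this example, and the sketch does not reproduce the data structures the paper uses to achieve asymptotically constant amortized time complexity or its exact regret constants.

```python
import numpy as np


def fpl_caching_sketch(requests, num_files, cache_size, obs_prob=0.5,
                       perturbation_scale=None, seed=0):
    """Toy Follow-the-Perturbed-Leader caching policy with partially observed requests.

    Keeps inverse-probability-weighted counts of the requests it happens to
    observe, adds fresh Gaussian noise at every step, and caches the
    `cache_size` files with the largest perturbed counts. All constants are
    illustrative, not the exact choices analyzed in the paper.
    """
    rng = np.random.default_rng(seed)
    T = len(requests)
    if perturbation_scale is None:
        # A standard FPL-style choice: perturbation magnitude on the order of sqrt(T).
        perturbation_scale = np.sqrt(T)

    counts = np.zeros(num_files)  # estimated request counts (unbiased under obs_prob)
    hits = 0
    for f in requests:
        # Cache the cache_size files with the largest perturbed counts.
        noise = rng.normal(0.0, perturbation_scale, size=num_files)
        cache = np.argpartition(-(counts + noise), cache_size)[:cache_size]
        hits += int(f in cache)

        # Partial observation: each request is seen only with probability obs_prob;
        # observed requests are up-weighted by 1/obs_prob so counts stay unbiased.
        if rng.random() < obs_prob:
            counts[f] += 1.0 / obs_prob
    return hits / T


# Tiny synthetic example: Zipf-like requests over a catalog of 100 files, cache of 10.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    popularity = 1.0 / np.arange(1, 101) ** 0.8
    popularity /= popularity.sum()
    trace = rng.choice(100, size=5000, p=popularity)
    print("hit ratio:", fpl_caching_sketch(trace, num_files=100, cache_size=10))
```

On a synthetic Zipf trace such as the one above, the resulting hit ratio can be compared against classic baselines like LRU or LFU, mirroring the kind of evaluation the abstract describes.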
Related papers
- On the Regret of Coded Caching with Adversarial Requests [7.171698704686835]
We study the well-known coded caching problem in an online learning framework, wherein requests arrive sequentially.
We introduce a caching policy based on the Follow-The-Perturbed-Leader principle and show that for any time horizon T, it achieves a sub-linear regret of $\mathcal{O}(\sqrt{T})$.
arXiv Detail & Related papers (2024-09-19T01:13:03Z)
- An Online Gradient-Based Caching Policy with Logarithmic Complexity and Regret Guarantees [13.844896723580858]
We introduce a new variant of the gradient-based online caching policy that achieves groundbreaking logarithmic computational complexity.
This advancement allows us to test the policy on large-scale, real-world traces featuring millions of requests and items.
arXiv Detail & Related papers (2024-05-02T13:11:53Z)
- No-Regret Caching with Noisy Request Estimates [12.603423174002254]
We propose the Noisy-Follow-the-Perturbed-Leader (NFPL) algorithm, a variant of the classic Follow-the-Perturbed-Leader (FPL) algorithm for settings where request estimates are noisy.
We show that the proposed solution has sublinear regret under specific conditions on the request estimator.
arXiv Detail & Related papers (2023-09-05T08:57:35Z)
- Iteratively Refined Behavior Regularization for Offline Reinforcement Learning [57.10922880400715]
In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration.
By iteratively refining the reference policy used for behavior regularization, the conservative policy update guarantees gradual improvement.
Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks.
arXiv Detail & Related papers (2023-06-09T07:46:24Z)
- Optimistic No-regret Algorithms for Discrete Caching [6.182368229968862]
We take a systematic look at the problem of storing whole files in a cache with limited capacity in the context of optimistic learning.
We provide a universal lower bound for prediction-assisted online caching and design a suite of policies with a range of performance-complexity trade-offs.
Our results substantially improve upon all recently-proposed online caching policies, which, being unable to exploit the oracle predictions, offer only $O(\sqrt{T})$ regret.
arXiv Detail & Related papers (2022-08-15T09:18:41Z)
- Fast Offline Policy Optimization for Large Scale Recommendation [74.78213147859236]
We derive an approximation of policy learning algorithms that scales logarithmically with the catalogue size.
Our contribution is based upon combining three novel ideas.
Our estimator is an order of magnitude faster than naive approaches yet produces equally good policies.
arXiv Detail & Related papers (2022-08-08T11:54:11Z)
- Universal Caching [8.208569626646034]
We investigate a stronger notion of regret minimization in the context of an online caching problem.
We propose an efficient online caching policy with an adaptive sub-linear regret bound.
To the best of our knowledge, this is the first data-dependent regret bound known for the universal caching problem.
arXiv Detail & Related papers (2022-05-10T13:00:10Z)
- Accelerating Deep Learning Classification with Error-controlled Approximate-key Caching [72.50506500576746]
We propose a novel caching paradigm that we name approximate-key caching.
While approximate cache hits alleviate the DL inference workload and increase system throughput, they introduce an approximation error.
We analytically model the performance of our caching system for classic LRU and ideal caches, perform a trace-driven evaluation of the expected performance, and compare the benefits of our approach with state-of-the-art similarity caching.
arXiv Detail & Related papers (2021-12-13T13:49:11Z)
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
Off-policy algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z)
- Optimizing for the Future in Non-Stationary MDPs [52.373873622008944]
We present a policy gradient algorithm that maximizes a forecast of future performance.
We show that our algorithm, called Prognosticator, is more robust to non-stationarity than two online adaptation techniques.
arXiv Detail & Related papers (2020-05-17T03:41:19Z)