Cache Replacement as a MAB with Delayed Feedback and Decaying Costs
- URL: http://arxiv.org/abs/2009.11330v4
- Date: Wed, 19 May 2021 04:32:39 GMT
- Title: Cache Replacement as a MAB with Delayed Feedback and Decaying Costs
- Authors: Farzana Beente Yusuf, Vitalii Stebliankin, Giuseppe Vietri, Giri
Narasimhan
- Abstract summary: We propose and solve a new variant of the well-known multi-armed bandit (MAB) problem.
Each arm represents a distinct cache replacement policy, which advises on the page to evict from the cache when needed.
We introduce an adaptive reinforcement learning algorithm EXP4-DFDC that provides a solution to the problem.
- Score: 4.358626952482686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inspired by the cache replacement problem, we propose and solve a new variant
of the well-known multi-armed bandit (MAB), thus providing a solution for
improving existing state-of-the-art cache management methods. Each arm (or
expert) represents a distinct cache replacement policy, which advises on the
page to evict from the cache when needed. Feedback on the eviction comes in the
form of a "miss", but at an indeterminate time after the action is taken, and
the cost of the eviction is set to be inversely proportional to the response
time. The feedback is ignored if it comes after a threshold value for the
delay, which we set to be equal to the size of the page eviction history. Thus,
for delays beyond the threshold, the cost is assumed to be zero. Consequently,
we refer to this as the MAB problem with delayed feedback and decaying costs. We
introduce an
adaptive reinforcement learning algorithm EXP4-DFDC that provides a solution to
the problem. We derive an optimal learning rate for EXP4-DFDC that defines the
balance between exploration and exploitation, and prove theoretically that the
expected regret of our algorithm is a vanishing quantity as a function of time.
As an application, we show that LeCaR, a recent top-performing machine learning
algorithm for cache replacement, can be enhanced with adaptive learning using
our formulations. We present an improved adaptive version of LeCaR, called
OLeCaR, with the learning rate set as determined by the theoretical derivation
presented here to minimize regret for EXP4-DFDC. It then follows that LeCaR and
OLeCaR are theoretically guaranteed to have vanishing regret over time.
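The formulation above — exponential weighting over experts, feedback that arrives after an indeterminate delay, costs that decay inversely with the response time, and a cutoff at the history size — can be sketched as follows. This is a minimal illustration of the mechanics, not the paper's EXP4-DFDC algorithm; the function names, the importance-weighted update, and the `expert_cost`/`delay_of` callbacks are all assumptions made for the sketch.

```python
import math
import random

def exp4_dfdc_sketch(n_experts, n_rounds, eta, history_size, expert_cost, delay_of):
    """Exponential-weights sketch with delayed, decaying feedback.

    expert_cost(i, t): raw cost when expert i's advice is followed at round t
                       (e.g. 1.0 for a cache miss) -- hypothetical callback.
    delay_of(t):       rounds until the feedback for round t arrives.
    """
    weights = [1.0] * n_experts
    pending = []                  # (arrival_round, expert, cost, prob_at_choice)
    choices = []

    for t in range(n_rounds):
        total = sum(weights)
        probs = [w / total for w in weights]
        i = random.choices(range(n_experts), weights=probs)[0]
        choices.append(i)

        d = delay_of(t)
        if d <= history_size:     # feedback past the threshold is ignored ...
            cost = expert_cost(i, t) / max(d, 1)  # ... and decays with the delay
            pending.append((t + d, i, cost, probs[i]))

        # apply every piece of feedback that has arrived by this round
        arrived = [p for p in pending if p[0] <= t]
        pending = [p for p in pending if p[0] > t]
        for _, j, c, p in arrived:
            # importance-weighted multiplicative update of the chosen expert
            weights[j] *= math.exp(-eta * c / p)

    return weights, choices
```

With two experts where one always incurs a miss and the other never does, the weight mass drifts toward the better expert, which is the qualitative behavior the abstract's regret bound formalizes.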
Related papers
- RPAF: A Reinforcement Prediction-Allocation Framework for Cache Allocation in Large-Scale Recommender Systems [22.87458184871264]
This paper identifies two key challenges in cache allocation: the value-strategy dependency and the streaming allocation setting.
We propose a reinforcement prediction-allocation framework (RPAF) to address these issues.
RPAF is a reinforcement-learning-based two-stage framework containing prediction and allocation stages.
arXiv Detail & Related papers (2024-09-20T03:02:42Z)
- Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference [19.447729423696096]
Large Language Models have excelled in various fields but encounter challenges in memory and time efficiency.
Recent efforts try to reduce KV cache size to a given memory budget by evicting vast non-critical cache elements during runtime.
arXiv Detail & Related papers (2024-07-16T09:53:32Z)
- Cache-Aware Reinforcement Learning in Large-Scale Recommender Systems [10.52021139266752]
We propose a cache-aware reinforcement learning (CARL) method to jointly optimize recommendations produced by real-time computation and by the cache.
CARL can significantly improve the users' engagement when considering the result cache.
CARL has been fully launched in Kwai app, serving over 100 million users.
arXiv Detail & Related papers (2024-04-23T12:06:40Z)
- Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation [69.0695698566235]
We study reinforcement learning with linear function approximation and adversarially changing cost functions.
We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback.
arXiv Detail & Related papers (2023-01-30T17:26:39Z)
- Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback [98.7678704343537]
We focus on a class of nonsubmodular functions with special structure, and prove regret guarantees for several variants of the online and approximate online bandit gradient descent algorithms.
We derive bounds for the agent's regret in the full information and bandit feedback setting, even if the delay between choosing a decision and receiving the incurred cost is unbounded.
arXiv Detail & Related papers (2022-05-15T08:27:12Z)
- Accelerating Deep Learning Classification with Error-controlled Approximate-key Caching [72.50506500576746]
We propose a novel caching paradigm, that we named approximate-key caching.
While approximate cache hits alleviate DL inference workload and increase the system throughput, they introduce an approximation error.
We analytically model our caching system performance for classic LRU and ideal caches, we perform a trace-driven evaluation of the expected performance, and we compare the benefits of our proposed approach with the state-of-the-art similarity caching.
arXiv Detail & Related papers (2021-12-13T13:49:11Z)
- ARCH: Efficient Adversarial Regularized Training with Caching [91.74682538906691]
Adversarial regularization can improve model generalization in many natural language processing tasks.
We propose a new adversarial regularization method ARCH, where perturbations are generated and cached once every several epochs.
We evaluate our proposed method on a set of neural machine translation and natural language understanding tasks.
arXiv Detail & Related papers (2021-09-15T02:05:37Z)
- Confidence-Budget Matching for Sequential Budgeted Learning [69.77435313099366]
We formalize decision-making problems with querying budget.
We consider multi-armed bandits, linear bandits, and reinforcement learning problems.
We show that CBM based algorithms perform well in the presence of adversity.
arXiv Detail & Related papers (2021-02-05T19:56:31Z)
- Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z)
- No-Regret Caching via Online Mirror Descent [0.0]
We study an online caching problem in which requests can be served by a local cache to avoid retrieval costs from a remote server.
We show that bounds for the regret crucially depend on the diversity of the request process, provided by the diversity ratio R/h.
We also prove that, when the cache must store the entire file, rather than a fraction, OMD strategies can be coupled with a randomized rounding scheme that preserves regret guarantees.
arXiv Detail & Related papers (2021-01-29T13:56:51Z)
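The online-caching setup in the last entry — a fractional cache state updated by a mirror-descent step, with randomized rounding when whole files must be stored — can be illustrated with the Euclidean special case (online gradient ascent with projection onto the capped simplex). This is a sketch under assumed names and parameters, not the paper's algorithm.

```python
def project_capped_simplex(y, k, lo=0.0, hi=1.0, iters=60):
    """Euclidean projection of y onto {x : lo <= x_i <= hi, sum(x) = k}
    via bisection on the Lagrange multiplier mu."""
    low, high = min(y) - hi, max(y) - lo
    for _ in range(iters):
        mu = (low + high) / 2.0
        s = sum(min(hi, max(lo, v - mu)) for v in y)
        if s > k:
            low = mu          # shift mass down
        else:
            high = mu
    return [min(hi, max(lo, v - mu)) for v in y]

def omd_caching(requests, n_files, k, eta):
    """Fractional caching by online gradient steps: the reward for a request
    to file f is the cached fraction y[f] (a linear function of y)."""
    y = [k / n_files] * n_files       # start from a uniform fractional cache
    hit_mass = 0.0
    for f in requests:
        hit_mass += y[f]              # fractional hit on the requested file
        y[f] += eta                   # gradient step on the linear reward
        y = project_capped_simplex(y, k)
    return y, hit_mass
```

Under a request stream concentrated on one file, the projected updates push that file's cached fraction toward 1 while keeping the total cache occupancy fixed at k, which is the invariant the regret analysis relies on.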
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.