Cache Replacement as a MAB with Delayed Feedback and Decaying Costs
- URL: http://arxiv.org/abs/2009.11330v4
- Date: Wed, 19 May 2021 04:32:39 GMT
- Title: Cache Replacement as a MAB with Delayed Feedback and Decaying Costs
- Authors: Farzana Beente Yusuf, Vitalii Stebliankin, Giuseppe Vietri, Giri
Narasimhan
- Abstract summary: We propose and solve a new variant of the well-known multi-armed bandit (MAB) problem.
Each arm represents a distinct cache replacement policy, which advises on the page to evict from the cache when needed.
We introduce an adaptive reinforcement learning algorithm EXP4-DFDC that provides a solution to the problem.
- Score: 4.358626952482686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inspired by the cache replacement problem, we propose and solve a new variant
of the well-known multi-armed bandit (MAB), thus providing a solution for
improving existing state-of-the-art cache management methods. Each arm (or
expert) represents a distinct cache replacement policy, which advises on the
page to evict from the cache when needed. Feedback on the eviction comes in the
form of a "miss", but at an indeterminate time after the action is taken, and
the cost of the eviction is set to be inversely proportional to the response
time. The feedback is ignored if it comes after a threshold value for the
delay, which we set to be equal to the size of the page eviction history. Thus,
for delays beyond the threshold, the cost is assumed to be zero. Consequently,
we refer to this as the MAB problem with delayed feedback and decaying costs. We
introduce an
adaptive reinforcement learning algorithm EXP4-DFDC that provides a solution to
the problem. We derive an optimal learning rate for EXP4-DFDC that defines the
balance between exploration and exploitation, and prove theoretically that the
expected regret of our algorithm is a vanishing quantity as a function of time.
As an application, we show that LeCaR, a recent top-performing machine learning
algorithm for cache replacement, can be enhanced with adaptive learning using
our formulations. We present an improved adaptive version of LeCaR, called
OLeCaR, with the learning rate set as determined by the theoretical derivation
presented here to minimize regret for EXP4-DFDC. It then follows that LeCaR and
OLeCaR are theoretically guaranteed to have vanishing regret over time.
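The formulation above — exponential weighting over experts, feedback that arrives after an indeterminate delay, costs that decay inversely with the response time, and a cutoff at the history size — can be sketched as follows. This is a minimal illustration of the mechanics, not the paper's EXP4-DFDC algorithm; the function names, the importance-weighted update, and the `expert_cost`/`delay_of` callbacks are all assumptions made for the sketch.

```python
import math
import random

def exp4_dfdc_sketch(n_experts, n_rounds, eta, history_size, expert_cost, delay_of):
    """Exponential-weights sketch with delayed, decaying feedback.

    expert_cost(i, t): raw cost when expert i's advice is followed at round t
                       (e.g. 1.0 for a cache miss) -- hypothetical callback.
    delay_of(t):       rounds until the feedback for round t arrives.
    """
    weights = [1.0] * n_experts
    pending = []                  # (arrival_round, expert, cost, prob_at_choice)
    choices = []

    for t in range(n_rounds):
        total = sum(weights)
        probs = [w / total for w in weights]
        i = random.choices(range(n_experts), weights=probs)[0]
        choices.append(i)

        d = delay_of(t)
        if d <= history_size:     # feedback past the threshold is ignored ...
            cost = expert_cost(i, t) / max(d, 1)  # ... and decays with the delay
            pending.append((t + d, i, cost, probs[i]))

        # apply every piece of feedback that has arrived by this round
        arrived = [p for p in pending if p[0] <= t]
        pending = [p for p in pending if p[0] > t]
        for _, j, c, p in arrived:
            # importance-weighted multiplicative update of the chosen expert
            weights[j] *= math.exp(-eta * c / p)

    return weights, choices
```

With two experts where one always incurs a miss and the other never does, the weight mass drifts toward the better expert, which is the qualitative behavior the abstract's regret bound formalizes.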
Related papers
- RPAF: A Reinforcement Prediction-Allocation Framework for Cache Allocation in Large-Scale Recommender Systems [22.87458184871264]
This paper identifies two key challenges in cache allocation: the value-strategy dependency and the streaming allocation setting.
We propose a reinforcement prediction-allocation framework (RPAF) to address these issues.
RPAF is a reinforcement-learning-based two-stage framework containing prediction and allocation stages.
arXiv Detail & Related papers (2024-09-20T03:02:42Z)
- Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference [19.447729423696096]
Large Language Models have excelled in various fields but encounter challenges in memory and time efficiency.
Recent efforts try to reduce KV cache size to a given memory budget by evicting vast non-critical cache elements during runtime.
arXiv Detail & Related papers (2024-07-16T09:53:32Z)
- Cache-Aware Reinforcement Learning in Large-Scale Recommender Systems [10.52021139266752]
We propose a cache-aware reinforcement learning (CARL) method to jointly optimize recommendations produced by real-time computation and by the cache.
CARL can significantly improve the users' engagement when considering the result cache.
CARL has been fully launched in Kwai app, serving over 100 million users.
arXiv Detail & Related papers (2024-04-23T12:06:40Z)
- Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation [69.0695698566235]
We study reinforcement learning with linear function approximation and adversarially changing cost functions.
We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback.
arXiv Detail & Related papers (2023-01-30T17:26:39Z)
- Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback [98.7678704343537]
We focus on a class of nonsubmodular functions with special structure, and prove regret guarantees for several variants of the online and approximate online bandit gradient descent algorithms.
We derive bounds for the agent's regret in the full information and bandit feedback setting, even if the delay between choosing a decision and receiving the incurred cost is unbounded.
arXiv Detail & Related papers (2022-05-15T08:27:12Z)
- Accelerating Deep Learning Classification with Error-controlled Approximate-key Caching [72.50506500576746]
We propose a novel caching paradigm, that we named approximate-key caching.
While approximate cache hits alleviate DL inference workload and increase the system throughput, they introduce an approximation error.
We analytically model our caching system performance for classic LRU and ideal caches, we perform a trace-driven evaluation of the expected performance, and we compare the benefits of our proposed approach with the state-of-the-art similarity caching.
arXiv Detail & Related papers (2021-12-13T13:49:11Z)
- ARCH: Efficient Adversarial Regularized Training with Caching [91.74682538906691]
Adversarial regularization can improve model generalization in many natural language processing tasks.
We propose a new adversarial regularization method ARCH, where perturbations are generated and cached once every several epochs.
We evaluate our proposed method on a set of neural machine translation and natural language understanding tasks.
arXiv Detail & Related papers (2021-09-15T02:05:37Z)
- Confidence-Budget Matching for Sequential Budgeted Learning [69.77435313099366]
We formalize decision-making problems with querying budget.
We consider multi-armed bandits, linear bandits, and reinforcement learning problems.
We show that CBM based algorithms perform well in the presence of adversity.
arXiv Detail & Related papers (2021-02-05T19:56:31Z)
- Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z)
- No-Regret Caching via Online Mirror Descent [0.0]
We study an online caching problem in which requests can be served by a local cache to avoid retrieval costs from a remote server.
We show that bounds for the regret crucially depend on the diversity of the request process, provided by the diversity ratio R/h.
We also prove that, when the cache must store the entire file, rather than a fraction, OMD strategies can be coupled with a randomized rounding scheme that preserves regret guarantees.
arXiv Detail & Related papers (2021-01-29T13:56:51Z)
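The online-caching setup in the last entry — a fractional cache state updated by a mirror-descent step, with randomized rounding when whole files must be stored — can be illustrated with the Euclidean special case (online gradient ascent with projection onto the capped simplex). This is a sketch under assumed names and parameters, not the paper's algorithm.

```python
def project_capped_simplex(y, k, lo=0.0, hi=1.0, iters=60):
    """Euclidean projection of y onto {x : lo <= x_i <= hi, sum(x) = k}
    via bisection on the Lagrange multiplier mu."""
    low, high = min(y) - hi, max(y) - lo
    for _ in range(iters):
        mu = (low + high) / 2.0
        s = sum(min(hi, max(lo, v - mu)) for v in y)
        if s > k:
            low = mu          # shift mass down
        else:
            high = mu
    return [min(hi, max(lo, v - mu)) for v in y]

def omd_caching(requests, n_files, k, eta):
    """Fractional caching by online gradient steps: the reward for a request
    to file f is the cached fraction y[f] (a linear function of y)."""
    y = [k / n_files] * n_files       # start from a uniform fractional cache
    hit_mass = 0.0
    for f in requests:
        hit_mass += y[f]              # fractional hit on the requested file
        y[f] += eta                   # gradient step on the linear reward
        y = project_capped_simplex(y, k)
    return y, hit_mass
```

Under a request stream concentrated on one file, the projected updates push that file's cached fraction toward 1 while keeping the total cache occupancy fixed at k, which is the invariant the regret analysis relies on.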
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.