An Imitation Learning Approach for Cache Replacement
- URL: http://arxiv.org/abs/2006.16239v2
- Date: Thu, 9 Jul 2020 21:13:34 GMT
- Title: An Imitation Learning Approach for Cache Replacement
- Authors: Evan Zheran Liu, Milad Hashemi, Kevin Swersky, Parthasarathy
Ranganathan, Junwhan Ahn
- Abstract summary: We propose an imitation learning approach to automatically learn cache access patterns.
We train a policy conditioned only on past accesses that accurately approximates Belady's even on diverse and complex access patterns.
Parrot increases cache hit rates by 20% over the current state of the art.
- Score: 23.03767134871606
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Program execution speed critically depends on increasing cache hits, as cache
hits are orders of magnitude faster than misses. To increase cache hits, we
focus on the problem of cache replacement: choosing which cache line to evict
upon inserting a new line. This is challenging because it requires planning far
ahead and currently there is no known practical solution. As a result, current
replacement policies typically resort to heuristics designed for specific
common access patterns, which fail on more diverse and complex access patterns.
In contrast, we propose an imitation learning approach to automatically learn
cache access patterns by leveraging Belady's, an oracle policy that computes
the optimal eviction decision given the future cache accesses. While directly
applying Belady's is infeasible since the future is unknown, we train a policy
conditioned only on past accesses that accurately approximates Belady's even on
diverse and complex access patterns, and call this approach Parrot. When
evaluated on 13 of the most memory-intensive SPEC applications, Parrot
increases cache hit rates by 20% over the current state of the art. In
addition, on a large-scale web search benchmark, Parrot increases cache hit
rates by 61% over a conventional LRU policy. We release a Gym environment to
facilitate research in this area, as data is plentiful, and further
advancements can have significant real-world impact.
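Belady's policy itself is simple to state: on a miss with a full cache, evict the line whose next access lies furthest in the future. The sketch below is a minimal Python illustration of that oracle on a complete, known access trace (it is not the authors' released code; the function name and trace format are chosen here for illustration). Parrot is then trained, by imitation, to approximate these oracle decisions while seeing only past accesses.

```python
def belady_hit_rate(trace, cache_size):
    """Simulate Belady's oracle policy on a complete access trace.

    On a miss with a full cache, evict the resident line whose next use
    lies furthest in the future (infinity if it is never accessed again).
    """
    # Precompute, for each access, the position of the next access to the same line.
    next_use = [float("inf")] * len(trace)
    last_seen = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last_seen.get(trace[i], float("inf"))
        last_seen[trace[i]] = i

    cache = {}  # resident line -> position of its next use
    hits = 0
    for i, line in enumerate(trace):
        if line in cache:
            hits += 1
        elif len(cache) >= cache_size:
            victim = max(cache, key=cache.get)  # line reused furthest in the future
            del cache[victim]
        cache[line] = next_use[i]  # refresh next-use for a hit or a newly inserted line
    return hits / len(trace)


# Example: a cyclic pattern larger than the cache, on which plain LRU misses every time
# but Belady's still hits half of the accesses.
print(belady_hit_rate([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3], cache_size=3))
```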
Related papers
- On the Regret of Coded Caching with Adversarial Requests [7.171698704686835]
We study the well-known coded caching problem in an online learning framework, wherein requests arrive sequentially.
We introduce a caching policy based on the Follow-The-Perturbed-Leader principle and show that for any time horizon T, it achieves a sub-linear regret of $\mathcal{O}(\sqrt{T})$.
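As a rough illustration only (not the paper's construction), a Follow-The-Perturbed-Leader cache for unit-sized files can be sketched as follows: keep cumulative request counts, add a Gaussian perturbation scaled by sqrt(t), and cache the top files at every step. The function name and the eta parameter are illustrative assumptions.

```python
import numpy as np

def ftpl_cache_hit_rate(requests, num_files, cache_size, eta=1.0, seed=0):
    """Follow-The-Perturbed-Leader caching sketch (unit-sized files assumed).

    Before each request, cache the `cache_size` files with the highest
    perturbed cumulative request counts; the perturbation grows as sqrt(t),
    the standard anytime FTPL scaling behind O(sqrt(T)) regret bounds.
    """
    assert cache_size < num_files
    rng = np.random.default_rng(seed)
    gamma = rng.normal(size=num_files)  # perturbation drawn once and reused
    counts = np.zeros(num_files)
    hits = 0
    for t, req in enumerate(requests, start=1):
        perturbed = counts + eta * np.sqrt(t) * gamma
        cached = set(np.argpartition(-perturbed, cache_size)[:cache_size])
        hits += req in cached
        counts[req] += 1
    return hits / len(requests)
```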
arXiv Detail & Related papers (2024-09-19T01:13:03Z)
- Efficient Inference of Vision Instruction-Following Models with Elastic Cache [76.44955111634545]
We introduce Elastic Cache, a novel strategy for efficient deployment of instruction-following large vision-language models.
We propose an importance-driven cache merging strategy to prune redundant caches.
For instruction encoding, frequency is used to evaluate the importance of cached entries.
Results on a range of LVLMs demonstrate that Elastic Cache not only boosts efficiency but also notably outperforms existing pruning methods in language generation.
arXiv Detail & Related papers (2024-07-25T15:29:05Z)
- Training-Free Exponential Context Extension via Cascading KV Cache [49.608367376911694]
We introduce a novel mechanism that leverages cascading sub-cache buffers to selectively retain the most relevant tokens.
Our method reduces prefill stage latency by a factor of 6.8 when compared to flash attention on 1M tokens.
arXiv Detail & Related papers (2024-06-24T03:59:17Z)
- CORM: Cache Optimization with Recent Message for Large Language Model Inference [57.109354287786154]
We introduce an innovative method for optimizing the KV cache, which considerably minimizes its memory footprint.
CORM, a KV cache eviction policy, dynamically retains essential key-value pairs for inference without the need for model fine-tuning.
Our validation shows that CORM reduces the inference memory usage of KV cache by up to 70% with negligible performance degradation across six tasks in LongBench.
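CORM's precise retention rule is defined in the paper; the sketch below only shows the general shape of score-based KV cache eviction, using the peak attention each cached token received from recent queries as a stand-in importance score (the function name, shapes, and scoring choice are assumptions made here for illustration).

```python
import numpy as np

def evict_kv(keys, values, recent_attn, budget):
    """Keep at most `budget` cached key/value pairs, ranked by how much
    attention recent queries paid to each cached token.

    keys, values:  arrays of shape (num_tokens, head_dim)
    recent_attn:   array of shape (num_recent_queries, num_tokens) holding
                   attention weights from the last few decoding steps
    """
    scores = recent_attn.max(axis=0)        # peak recent attention per cached token
    keep = np.argsort(-scores)[:budget]     # highest-scoring tokens survive
    keep = np.sort(keep)                    # preserve positional order
    return keys[keep], values[keep], keep
```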
arXiv Detail & Related papers (2024-04-24T16:11:54Z)
- A Learning-Based Caching Mechanism for Edge Content Delivery [2.412158290827225]
5G networks and the rise of the Internet of Things (IoT) are increasingly pushing content delivery to the network edge.
This shift introduces unique challenges, particularly due to the limited cache storage and the diverse request patterns at the edge.
We introduce HR-Cache, a learning-based caching framework grounded in the principles of Hazard Rate (HR) ordering.
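HR-Cache's full pipeline (hazard-rate labeling plus a learned model) is more involved than a few lines, but the ordering idea can be hinted at with a toy sketch: assuming exponential inter-request times, an object's hazard rate is the reciprocal of its mean inter-arrival time, and the cached object with the lowest estimated hazard rate is the least attractive to keep. Everything below (class name, the exponential assumption, the eviction rule) is an illustrative simplification, not HR-Cache itself.

```python
from collections import defaultdict

class HazardRateTracker:
    """Toy hazard-rate estimator under an exponential inter-request assumption."""

    def __init__(self):
        self.last_seen = {}
        self.mean_gap = defaultdict(lambda: float("inf"))
        self.count = defaultdict(int)

    def record(self, obj, now):
        # Update the running mean inter-arrival time for this object.
        if obj in self.last_seen:
            gap = now - self.last_seen[obj]
            n = self.count[obj]
            prev = self.mean_gap[obj] if n else 0.0
            self.mean_gap[obj] = (prev * n + gap) / (n + 1)
            self.count[obj] += 1
        self.last_seen[obj] = now

    def hazard_rate(self, obj):
        # Exponential model: hazard rate = 1 / mean inter-arrival time.
        return 1.0 / self.mean_gap[obj] if self.mean_gap[obj] > 0 else float("inf")

    def pick_victim(self, cached_objects):
        # Evict the cached object least likely to be re-requested soon.
        return min(cached_objects, key=self.hazard_rate)
```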
arXiv Detail & Related papers (2024-02-05T08:06:03Z)
- Optimistic No-regret Algorithms for Discrete Caching [6.182368229968862]
We take a systematic look at the problem of storing whole files in a cache with limited capacity in the context of optimistic learning.
We provide a universal lower bound for prediction-assisted online caching and design a suite of policies with a range of performance-complexity trade-offs.
Our results substantially improve upon all recently-proposed online caching policies, which, being unable to exploit the oracle predictions, offer only $O(\sqrt{T})$ regret.
arXiv Detail & Related papers (2022-08-15T09:18:41Z)
- Accelerating Deep Learning Classification with Error-controlled Approximate-key Caching [72.50506500576746]
We propose a novel caching paradigm that we name approximate-key caching.
While approximate cache hits alleviate the DL inference workload and increase system throughput, they also introduce an approximation error.
We analytically model our caching system performance for classic LRU and ideal caches, we perform a trace-driven evaluation of the expected performance, and we compare the benefits of our proposed approach with the state-of-the-art similarity caching.
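A toy sketch of the approximate-key idea (not the paper's system): a lookup hits whenever a stored key, treated as a feature vector, lies within a tolerance of the query, trading a bounded approximation error for skipped DL inferences. The class name, brute-force search, and Euclidean tolerance are illustrative assumptions; a real deployment would use an approximate nearest-neighbour index.

```python
import numpy as np

class ApproximateKeyCache:
    """Toy approximate-key cache: a lookup hits if a stored key (feature
    vector) lies within `tol` of the query, returning the cached result."""

    def __init__(self, tol):
        self.tol = tol
        self.keys = []      # list of 1-D numpy feature vectors
        self.results = []   # cached inference outputs

    def get(self, query):
        if not self.keys:
            return None
        dists = np.linalg.norm(np.stack(self.keys) - np.asarray(query, float), axis=1)
        i = int(np.argmin(dists))
        return self.results[i] if dists[i] <= self.tol else None

    def put(self, key, result):
        self.keys.append(np.asarray(key, dtype=float))
        self.results.append(result)
```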
arXiv Detail & Related papers (2021-12-13T13:49:11Z)
- ARCH: Efficient Adversarial Regularized Training with Caching [91.74682538906691]
Adversarial regularization can improve model generalization in many natural language processing tasks.
We propose a new adversarial regularization method ARCH, where perturbations are generated and cached once every several epochs.
We evaluate our proposed method on a set of neural machine translation and natural language understanding tasks.
arXiv Detail & Related papers (2021-09-15T02:05:37Z)
- Learning Forward Reuse Distance [1.8777512961936749]
Recent advancement of deep learning techniques enables the design of novel intelligent cache replacement policies.
We find that a powerful LSTM-based recurrent neural network model can provide high prediction accuracy based on only a cache trace as input.
Results demonstrate that the new cache policy improves state-of-the-art practical policies by up to 19.2% and incurs only 2.3% higher miss ratio than OPT on average.
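The paper's exact features and network are not reproduced here; the sketch below only shows the general pattern, assuming a hypothetical PyTorch LSTM that maps a window of recent line ids to a predicted forward reuse distance, with eviction of the resident line predicted to be reused furthest in the future (mirroring how OPT uses the true reuse distances). Class and function names, the input encoding, and the dimensions are assumptions.

```python
import torch
import torch.nn as nn

class ReuseDistanceLSTM(nn.Module):
    """Toy LSTM mapping a window of recent accesses (line ids) to a
    predicted forward reuse distance for the most recent line."""

    def __init__(self, num_lines, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(num_lines, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, access_window):            # (batch, window_len) of line ids
        h, _ = self.lstm(self.embed(access_window))
        return self.head(h[:, -1]).squeeze(-1)   # predicted reuse distance per sequence

def choose_victim(model, resident_lines, history):
    """Evict the resident line with the largest predicted forward reuse distance."""
    windows = torch.tensor([history + [line] for line in resident_lines])
    with torch.no_grad():
        preds = model(windows)
    return resident_lines[int(torch.argmax(preds))]
```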
arXiv Detail & Related papers (2020-07-31T05:57:50Z)
- Reinforcement Learning for Caching with Space-Time Popularity Dynamics [61.55827760294755]
Caching is envisioned to play a critical role in next-generation networks.
To intelligently prefetch and store contents, a cache node should be able to learn what and when to cache.
This chapter presents a versatile reinforcement learning based approach for near-optimal caching policy design.
arXiv Detail & Related papers (2020-05-19T01:23:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.