LeadCache: Regret-Optimal Caching in Networks
- URL: http://arxiv.org/abs/2009.08228v4
- Date: Tue, 26 Oct 2021 13:59:48 GMT
- Title: LeadCache: Regret-Optimal Caching in Networks
- Authors: Debjit Paria, Abhishek Sinha
- Abstract summary: We propose an efficient online caching policy based on the Follow-the-Perturbed-Leader paradigm.
We show that $textttLeadCache$ is regret-optimal up to a factor $tildeO(n3/8), where $n$ is the number of users.
- Score: 8.208569626646034
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We consider an online prediction problem in the context of network caching.
Assume that multiple users are connected to several caches via a bipartite
network. At any time slot, each user may request an arbitrary file chosen from
a large catalog. A user's request at a slot is met if the requested file is
cached in at least one of the caches connected to the user. Our objective is to
predict, prefetch, and optimally distribute the files on the caches at each
slot to maximize the total number of cache hits. The problem is non-trivial due
to the non-convex and non-smooth nature of the objective function. In this
paper, we propose $\texttt{LeadCache}$ - an efficient online caching policy
based on the Follow-the-Perturbed-Leader paradigm. We show that
$\texttt{LeadCache}$ is regret-optimal up to a factor of $\tilde{O}(n^{3/8}),$
where $n$ is the number of users. We design two efficient implementations of
the $\texttt{LeadCache}$ policy, one based on Pipage rounding and the other
based on Madow's sampling, each of which makes precisely one call to an
LP-solver per iteration. Furthermore, with a Strong-Law-type assumption, we
show that the total number of file fetches under $\texttt{LeadCache}$ remains
almost surely finite over an infinite horizon. Finally, we derive an
approximately tight regret lower bound using results from graph coloring. We
conclude that the learning-based $\texttt{LeadCache}$ policy decisively
outperforms the state-of-the-art caching policies both theoretically and
empirically.
Related papers
- InstCache: A Predictive Cache for LLM Serving [9.878166964839512]
We propose to predict user-instructions by an instruction-aligned LLM and store them in a predictive cache, so-called InstCache.
Experimental results show that InstCache can achieve up to 51.34% hit rate on LMSys dataset, which corresponds to a 2x speedup, at a memory cost of only 4.5GB.
arXiv Detail & Related papers (2024-11-21T03:52:41Z) - Efficient Inference of Vision Instruction-Following Models with Elastic Cache [76.44955111634545]
We introduce Elastic Cache, a novel strategy for efficient deployment of instruction-following large vision-language models.
We propose an importance-driven cache merging strategy to prune redundancy caches.
For instruction encoding, we utilize the frequency to evaluate the importance of caches.
Results on a range of LVLMs demonstrate that Elastic Cache not only boosts efficiency but also notably outperforms existing pruning methods in language generation.
arXiv Detail & Related papers (2024-07-25T15:29:05Z) - MeanCache: User-Centric Semantic Cache for Large Language Model Based Web Services [8.350378532274405]
Caching is a natural solution to reduce inference costs on repeated queries.
This paper introduces MeanCache, a user-centric semantic cache for LLM-based services.
MeanCache identifies semantically similar queries to determine cache hit or miss.
arXiv Detail & Related papers (2024-03-05T06:23:50Z) - Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference [78.65321721142624]
We focus on a memory bottleneck imposed by the key-value ( KV) cache.
Existing KV cache methods approach this problem by pruning or evicting large swaths of relatively less important KV pairs.
We propose LESS, a simple integration of a constant sized cache with eviction-based cache methods.
arXiv Detail & Related papers (2024-02-14T18:54:56Z) - On Optimal Caching and Model Multiplexing for Large Model Inference [66.50550915522551]
Large Language Models (LLMs) and other large foundation models have achieved noteworthy success, but their size exacerbates existing resource consumption and latency challenges.
We study two approaches for mitigating these challenges: employing a cache to store previous queries and learning a model multiplexer to choose from an ensemble of models for query processing.
arXiv Detail & Related papers (2023-06-03T05:01:51Z) - Optimistic No-regret Algorithms for Discrete Caching [6.182368229968862]
We take a systematic look at the problem of storing whole files in a cache with limited capacity in the context of optimistic learning.
We provide a universal lower bound for prediction-assisted online caching and design a suite of policies with a range of performance-complexity trade-offs.
Our results substantially improve upon all recently-proposed online caching policies, which, being unable to exploit the oracle predictions, offer only $O(sqrtT)$ regret.
arXiv Detail & Related papers (2022-08-15T09:18:41Z) - Accelerating Deep Learning Classification with Error-controlled
Approximate-key Caching [72.50506500576746]
We propose a novel caching paradigm, that we named approximate-key caching.
While approximate cache hits alleviate DL inference workload and increase the system throughput, they however introduce an approximation error.
We analytically model our caching system performance for classic LRU and ideal caches, we perform a trace-driven evaluation of the expected performance, and we compare the benefits of our proposed approach with the state-of-the-art similarity caching.
arXiv Detail & Related papers (2021-12-13T13:49:11Z) - Online Caching with Optimal Switching Regret [10.447270433913134]
A cache of limited storage capacity can hold $C$ files at a time from a large catalog.
In the case of a cache-hit, the policy receives a unit reward and zero rewards otherwise.
The objective is to design a caching policy that incurs minimal regret while considering both the rewards due to cache-hits and the switching cost due to the file fetches.
arXiv Detail & Related papers (2021-01-18T12:47:22Z) - Quantum Differentially Private Sparse Regression Learning [132.1981461292324]
We devise an efficient quantum differentially private (QDP) Lasso estimator to solve sparse regression tasks.
Last, we exhibit that the QDP Lasso attains a near-optimal utility bound $tildeO(N-2/3)$ with privacy guarantees.
arXiv Detail & Related papers (2020-07-23T10:50:42Z) - Reinforcement Learning for Caching with Space-Time Popularity Dynamics [61.55827760294755]
caching is envisioned to play a critical role in next-generation networks.
To intelligently prefetch and store contents, a cache node should be able to learn what and when to cache.
This chapter presents a versatile reinforcement learning based approach for near-optimal caching policy design.
arXiv Detail & Related papers (2020-05-19T01:23:51Z) - PA-Cache: Evolving Learning-Based Popularity-Aware Content Caching in
Edge Networks [14.939950326112045]
We propose an evolving learning-based content caching policy, named PA-Cache in edge networks.
It adaptively learns time-varying content popularity and determines which contents should be replaced when the cache is full.
We extensively evaluate the performance of our proposed PA-Cache on real-world traces from a large online video-on-demand service provider.
arXiv Detail & Related papers (2020-02-20T15:38:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.