Related papers: Cache-Aware Reinforcement Learning in Large-Scale Recommender Systems

Cache-Aware Reinforcement Learning in Large-Scale Recommender Systems

URL: http://arxiv.org/abs/2404.14961v1
Date: Tue, 23 Apr 2024 12:06:40 GMT
Title: Cache-Aware Reinforcement Learning in Large-Scale Recommender Systems
Authors: Xiaoshuang Chen, Gengrui Zhang, Yao Wang, Yulin Wu, Shuo Su, Kaiqiao Zhan, Ben Wang,
Abstract summary: cache-aware reinforcement learning (CARL) method to jointly optimize the recommendation by real-time computation and by the cache. CARL can significantly improve the users' engagement when considering the result cache. CARL has been fully launched in Kwai app, serving over 100 million users.
Score: 10.52021139266752
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Modern large-scale recommender systems are built upon computation-intensive infrastructure and usually suffer from a huge difference in traffic between peak and off-peak periods. In peak periods, it is challenging to perform real-time computation for each request due to the limited budget of computational resources. The recommendation with a cache is a solution to this problem, where a user-wise result cache is used to provide recommendations when the recommender system cannot afford a real-time computation. However, the cached recommendations are usually suboptimal compared to real-time computation, and it is challenging to determine the items in the cache for each user. In this paper, we provide a cache-aware reinforcement learning (CARL) method to jointly optimize the recommendation by real-time computation and by the cache. We formulate the problem as a Markov decision process with user states and a cache state, where the cache state represents whether the recommender system performs recommendations by real-time computation or by the cache. The computational load of the recommender system determines the cache state. We perform reinforcement learning based on such a model to improve user engagement over multiple requests. Moreover, we show that the cache will introduce a challenge called critic dependency, which deteriorates the performance of reinforcement learning. To tackle this challenge, we propose an eigenfunction learning (EL) method to learn independent critics for CARL. Experiments show that CARL can significantly improve the users' engagement when considering the result cache. CARL has been fully launched in Kwai app, serving over 100 million users.

Related papers

A Generative Caching System for Large Language Models [1.2132389187658934]
Caching has the potential to be of significant benefit for accessing large language models (LLMs) This paper presents a new caching system for improving user experiences with LLMs. A key feature we provide is generative caching, wherein multiple cached responses can be synthesized to provide answers to queries which have never been seen before.
arXiv Detail & Related papers (2025-03-22T01:17:56Z)
Compute Or Load KV Cache? Why Not Both? [6.982874528357836]
Cake is a novel KV cache loading system that optimally utilizes both computational and I/O resources in parallel. Cake achieves on average 2.6x reduction in Time to First Token (TTFT) compared to compute-only and I/O-only methods.
arXiv Detail & Related papers (2024-10-04T01:11:09Z)
RPAF: A Reinforcement Prediction-Allocation Framework for Cache Allocation in Large-Scale Recommender Systems [22.87458184871264]
This paper shows two key challenges to cache allocation, i.e., the value-strategy dependency and the streaming allocation. We propose a reinforcement prediction-allocation framework (RPAF) to address these issues. RPAF is a reinforcement-learning-based two-stage framework containing prediction and allocation stages.
arXiv Detail & Related papers (2024-09-20T03:02:42Z)
Efficient Inference of Vision Instruction-Following Models with Elastic Cache [76.44955111634545]
We introduce Elastic Cache, a novel strategy for efficient deployment of instruction-following large vision-language models. We propose an importance-driven cache merging strategy to prune redundancy caches. For instruction encoding, we utilize the frequency to evaluate the importance of caches. Results on a range of LVLMs demonstrate that Elastic Cache not only boosts efficiency but also notably outperforms existing pruning methods in language generation.
arXiv Detail & Related papers (2024-07-25T15:29:05Z)
SCALM: Towards Semantic Caching for Automated Chat Services with Large Language Models [15.742472622602557]
We propose SCALM, a new cache architecture that emphasizes semantic analysis and identifies significant cache entries and patterns. Our evaluations show that SCALM increases cache hit ratios and reduces operational costs for LLMChat services.
arXiv Detail & Related papers (2024-05-24T08:16:22Z)
Recommenadation aided Caching using Combinatorial Multi-armed Bandits [0.06554326244334867]
We study content caching with recommendations in a wireless network where the users are connected through a base station equipped with a finite-capacity cache. We propose a UCB-based algorithm to decide which contents to cache and recommend and provide an upper bound on the regret of this algorithm.
arXiv Detail & Related papers (2024-04-30T16:35:08Z)
CORM: Cache Optimization with Recent Message for Large Language Model Inference [57.109354287786154]
We introduce an innovative method for optimizing the KV cache, which considerably minimizes its memory footprint. CORM, a KV cache eviction policy, dynamically retains essential key-value pairs for inference without the need for model fine-tuning. Our validation shows that CORM reduces the inference memory usage of KV cache by up to 70% with negligible performance degradation across six tasks in LongBench.
arXiv Detail & Related papers (2024-04-24T16:11:54Z)
Improving information retention in large scale online continual learning [99.73847522194549]
Online continual learning aims to adapt efficiently to new data while retaining existing knowledge. Recent work suggests that information retention remains a problem in large scale OCL even when the replay buffer is unlimited. We propose using a moving average family of methods to improve optimization for non-stationary objectives.
arXiv Detail & Related papers (2022-10-12T16:59:43Z)
Accelerating Deep Learning Classification with Error-controlled Approximate-key Caching [72.50506500576746]
We propose a novel caching paradigm, that we named approximate-key caching. While approximate cache hits alleviate DL inference workload and increase the system throughput, they however introduce an approximation error. We analytically model our caching system performance for classic LRU and ideal caches, we perform a trace-driven evaluation of the expected performance, and we compare the benefits of our proposed approach with the state-of-the-art similarity caching.
arXiv Detail & Related papers (2021-12-13T13:49:11Z)
ARCH: Efficient Adversarial Regularized Training with Caching [91.74682538906691]
Adversarial regularization can improve model generalization in many natural language processing tasks. We propose a new adversarial regularization method ARCH, where perturbations are generated and cached once every several epochs. We evaluate our proposed method on a set of neural machine translation and natural language understanding tasks.
arXiv Detail & Related papers (2021-09-15T02:05:37Z)
Cache Replacement as a MAB with Delayed Feedback and Decaying Costs [4.358626952482686]
We propose and solve a new variant of the well-known multi-armed bandit (MAB) Each arm represents a distinct cache replacement policy, which advises on the page to evict from the cache when needed. We introduce an adaptive reinforcement learning algorithm EXP4-DFDC that provides a solution to the problem.
arXiv Detail & Related papers (2020-09-23T18:26:48Z)
Reinforcement Learning for Caching with Space-Time Popularity Dynamics [61.55827760294755]
caching is envisioned to play a critical role in next-generation networks. To intelligently prefetch and store contents, a cache node should be able to learn what and when to cache. This chapter presents a versatile reinforcement learning based approach for near-optimal caching policy design.
arXiv Detail & Related papers (2020-05-19T01:23:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.