RPAF: A Reinforcement Prediction-Allocation Framework for Cache Allocation in Large-Scale Recommender Systems
- URL: http://arxiv.org/abs/2409.13175v1
- Date: Fri, 20 Sep 2024 03:02:42 GMT
- Title: RPAF: A Reinforcement Prediction-Allocation Framework for Cache Allocation in Large-Scale Recommender Systems
- Authors: Shuo Su, Xiaoshuang Chen, Yao Wang, Yulin Wu, Ziqiang Zhang, Kaiqiao Zhan, Ben Wang, Kun Gai,
- Abstract summary: This paper shows two key challenges to cache allocation, i.e., the value-strategy dependency and the streaming allocation.
We propose a reinforcement prediction-allocation framework (RPAF) to address these issues.
RPAF is a reinforcement-learning-based two-stage framework containing prediction and allocation stages.
- Score: 22.87458184871264
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern recommender systems are built upon computation-intensive infrastructure, and it is challenging to perform real-time computation for each request, especially in peak periods, due to the limited computational resources. Recommending by user-wise result caches is widely used when the system cannot afford a real-time recommendation. However, it is challenging to allocate real-time and cached recommendations to maximize the users' overall engagement. This paper shows two key challenges to cache allocation, i.e., the value-strategy dependency and the streaming allocation. Then, we propose a reinforcement prediction-allocation framework (RPAF) to address these issues. RPAF is a reinforcement-learning-based two-stage framework containing prediction and allocation stages. The prediction stage estimates the values of the cache choices considering the value-strategy dependency, and the allocation stage determines the cache choices for each individual request while satisfying the global budget constraint. We show that the challenge of training RPAF includes globality and the strictness of budget constraints, and a relaxed local allocator (RLA) is proposed to address this issue. Moreover, a PoolRank algorithm is used in the allocation stage to deal with the streaming allocation problem. Experiments show that RPAF significantly improves users' engagement under computational budget constraints.
Related papers
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-ite convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z) - Cache-Aware Reinforcement Learning in Large-Scale Recommender Systems [10.52021139266752]
cache-aware reinforcement learning (CARL) method to jointly optimize the recommendation by real-time computation and by the cache.
CARL can significantly improve the users' engagement when considering the result cache.
CARL has been fully launched in Kwai app, serving over 100 million users.
arXiv Detail & Related papers (2024-04-23T12:06:40Z) - Client Orchestration and Cost-Efficient Joint Optimization for
NOMA-Enabled Hierarchical Federated Learning [55.49099125128281]
We propose a non-orthogonal multiple access (NOMA) enabled HFL system under semi-synchronous cloud model aggregation.
We show that the proposed scheme outperforms the considered benchmarks regarding HFL performance improvement and total cost reduction.
arXiv Detail & Related papers (2023-11-03T13:34:44Z) - Joint Service Caching, Communication and Computing Resource Allocation in Collaborative MEC Systems: A DRL-based Two-timescale Approach [15.16859210403316]
Meeting the strict Quality of Service (QoS) requirements of terminals has imposed a challenge on Multiaccess Edge Computing (MEC) systems.
We propose a collaborative framework that facilitates resource sharing between the edge servers.
We show that our proposed algorithm outperforms the baseline algorithms in terms of the average switching and cache cost.
arXiv Detail & Related papers (2023-07-19T00:27:49Z) - A Bandit Approach to Online Pricing for Heterogeneous Edge Resource
Allocation [8.089950414444115]
Two novel online pricing mechanisms are proposed for heterogeneous edge resource allocation.
The mechanisms operate in real-time and do not require prior knowledge of demand distribution.
The proposed posted pricing schemes allow users to select and pay for their preferred resources, with the platform dynamically adjusting resource prices based on observed historical data.
arXiv Detail & Related papers (2023-02-14T10:21:14Z) - Online Caching with no Regret: Optimistic Learning via Recommendations [15.877673959068458]
We build upon the Follow-the-Regularized-Leader (FTRL) framework to include predictions for the file requests.
We extend the framework to learn and utilize the best request predictor in cases where many are available.
We prove that the proposed optimistic learning caching policies can achieve sub-zero performance loss (regret) for perfect predictions.
arXiv Detail & Related papers (2022-04-20T09:29:47Z) - Accelerating Deep Learning Classification with Error-controlled
Approximate-key Caching [72.50506500576746]
We propose a novel caching paradigm, that we named approximate-key caching.
While approximate cache hits alleviate DL inference workload and increase the system throughput, they however introduce an approximation error.
We analytically model our caching system performance for classic LRU and ideal caches, we perform a trace-driven evaluation of the expected performance, and we compare the benefits of our proposed approach with the state-of-the-art similarity caching.
arXiv Detail & Related papers (2021-12-13T13:49:11Z) - Distributed Reinforcement Learning for Privacy-Preserving Dynamic Edge
Caching [91.50631418179331]
A privacy-preserving distributed deep policy gradient (P2D3PG) is proposed to maximize the cache hit rates of devices in the MEC networks.
We convert the distributed optimizations into model-free Markov decision process problems and then introduce a privacy-preserving federated learning method for popularity prediction.
arXiv Detail & Related papers (2021-10-20T02:48:27Z) - Combining Deep Learning and Optimization for Security-Constrained
Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z) - Hierarchical Adaptive Contextual Bandits for Resource Constraint based
Recommendation [49.69139684065241]
Contextual multi-armed bandit (MAB) achieves cutting-edge performance on a variety of problems.
In this paper, we propose a hierarchical adaptive contextual bandit method (HATCH) to conduct the policy learning of contextual bandits with a budget constraint.
arXiv Detail & Related papers (2020-04-02T17:04:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.