An Enhanced Batch Query Architecture in Real-time Recommendation
- URL: http://arxiv.org/abs/2409.00400v1
- Date: Sat, 31 Aug 2024 09:19:41 GMT
- Title: An Enhanced Batch Query Architecture in Real-time Recommendation
- Authors: Qiang Zhang, Zhipeng Teng, Disheng Wu, Jiayin Wang,
- Abstract summary: In industrial recommendation systems on websites and apps, it is essential to recall and predict top-n results relevant to user interests.
We have designed and implemented a high-performance batch query architecture for real-time recommendation systems.
This architecture has been deployed and in use in the bilibili recommendation system for over a year, supporting 10x increase in model with minimal resource growth.
- Score: 9.073405491915198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In industrial recommendation systems on websites and apps, it is essential to recall and predict top-n results relevant to user interests from a content pool of billions within milliseconds. To cope with continuous data growth and improve real-time recommendation performance, we have designed and implemented a high-performance batch query architecture for real-time recommendation systems. Our contributions include optimizing hash structures with a cacheline-aware probing method to enhance coalesced hashing, as well as the implementation of a hybrid storage key-value service built upon it. Our experiments indicate this approach significantly surpasses conventional hash tables in batch query throughput, achieving up to 90% of the query throughput of random memory access when incorporating parallel optimization. The support for NVMe, integrating two-tier storage for hot and cold data, notably reduces resource consumption. Additionally, the system facilitates dynamic updates, automated sharding of attributes and feature embedding tables, and introduces innovative protocols for consistency in batch queries, thereby enhancing the effectiveness of real-time incremental learning updates. This architecture has been deployed and in use in the bilibili recommendation system for over a year, a video content community with hundreds of millions of users, supporting 10x increase in model computation with minimal resource growth, improving outcomes while preserving the system's real-time performance.
Related papers
- The Efficiency vs. Accuracy Trade-off: Optimizing RAG-Enhanced LLM Recommender Systems Using Multi-Head Early Exit [46.37267466656765]
This paper presents an optimization framework that combines Retrieval-Augmented Generation (RAG) with an innovative multi-head early exit architecture.
Our experiments demonstrate how this architecture effectively decreases time without sacrificing the accuracy needed for reliable recommendation delivery.
arXiv Detail & Related papers (2025-01-04T03:26:46Z) - Dynamic Optimization of Storage Systems Using Reinforcement Learning Techniques [40.13303683102544]
This paper introduces RL-Storage, a reinforcement learning-based framework designed to dynamically optimize storage system configurations.
RL-Storage learns from real-time I/O patterns and predicts optimal storage parameters, such as cache size, queue depths, and readahead settings.
It achieves throughput gains of up to 2.6x and latency reductions of 43% compared to baselines.
arXiv Detail & Related papers (2024-12-29T17:41:40Z) - Novel Architecture for Distributed Travel Data Integration and Service Provision Using Microservices [1.03590082373586]
This paper introduces an architecture for enhancing the flexibility and performance of an airline reservation system.
The design incorporates Redis cache technologies, two different messaging systems (Kafka and RabbitMQ), two types of architectural storages (MongoDB, and Docker)
The architecture provides an impressive level of data consistency at 99.5% and a latency of data propagation of less than 75 ms.
arXiv Detail & Related papers (2024-10-31T17:41:14Z) - Revisiting BPR: A Replicability Study of a Common Recommender System Baseline [78.00363373925758]
We study the features of the BPR model, indicating their impact on its performance, and investigate open-source BPR implementations.
Our analysis reveals inconsistencies between these implementations and the original BPR paper, leading to a significant decrease in performance of up to 50% for specific implementations.
We show that the BPR model can achieve performance levels close to state-of-the-art methods on the top-n recommendation tasks and even outperform them on specific datasets.
arXiv Detail & Related papers (2024-09-21T18:39:53Z) - DNS-Rec: Data-aware Neural Architecture Search for Recommender Systems [79.76519917171261]
This paper addresses the computational overhead and resource inefficiency prevalent in Sequential Recommender Systems (SRSs)
We introduce an innovative approach combining pruning methods with advanced model designs.
Our principal contribution is the development of a Data-aware Neural Architecture Search for Recommender System (DNS-Rec)
arXiv Detail & Related papers (2024-02-01T07:22:52Z) - Efficient Architecture Search via Bi-level Data Pruning [70.29970746807882]
This work pioneers an exploration into the critical role of dataset characteristics for DARTS bi-level optimization.
We introduce a new progressive data pruning strategy that utilizes supernet prediction dynamics as the metric.
Comprehensive evaluations on the NAS-Bench-201 search space, DARTS search space, and MobileNet-like search space validate that BDP reduces search costs by over 50%.
arXiv Detail & Related papers (2023-12-21T02:48:44Z) - Hybrid-RACA: Hybrid Retrieval-Augmented Composition Assistance for Real-time Text Prediction [17.94189417448127]
We propose Hybrid Retrieval-Augmented Composition Assistance (Hybrid-RACA) for real-time text prediction.
It efficiently combines a cloud-based large language model with a smaller client-side model through retrieval augmented memory.
Our experiments on five datasets demonstrate that Hybrid-RACA offers strong performance while maintaining low latency.
arXiv Detail & Related papers (2023-08-08T12:27:20Z) - Data-Driven Offline Optimization For Architecting Hardware Accelerators [89.68870139177785]
We develop a data-driven offline optimization method for designing hardware accelerators, dubbed PRIME.
PRIME improves performance upon state-of-the-art simulation-driven methods by about 1.54x and 1.20x, while considerably reducing the required total simulation time by 93% and 99%, respectively.
In addition, PRIME also architects effective accelerators for unseen applications in a zero-shot setting, outperforming simulation-based methods by 1.26x.
arXiv Detail & Related papers (2021-10-20T17:06:09Z) - DHA: End-to-End Joint Optimization of Data Augmentation Policy,
Hyper-parameter and Architecture [81.82173855071312]
We propose an end-to-end solution that integrates the AutoML components and returns a ready-to-use model at the end of the search.
Dha achieves state-of-the-art (SOTA) results on various datasets, especially 77.4% accuracy on ImageNet with cell based search space.
arXiv Detail & Related papers (2021-09-13T08:12:50Z) - Fast Class-wise Updating for Online Hashing [196.14748396106955]
This paper presents a novel supervised online hashing scheme, termed Fast Class-wise Updating for Online Hashing (FCOH)
A class-wise updating method is developed to decompose the binary code learning and alternatively renew the hash functions in a class-wise fashion, which well addresses the burden on large amounts of training batches.
To further achieve online efficiency, we propose a semi-relaxation optimization, which accelerates the online training by treating different binary constraints independently.
arXiv Detail & Related papers (2020-12-01T07:41:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.