An Enhanced Batch Query Architecture in Real-time Recommendation
- URL: http://arxiv.org/abs/2409.00400v1
- Date: Sat, 31 Aug 2024 09:19:41 GMT
- Title: An Enhanced Batch Query Architecture in Real-time Recommendation
- Authors: Qiang Zhang, Zhipeng Teng, Disheng Wu, Jiayin Wang
- Abstract summary: In industrial recommendation systems on websites and apps, it is essential to recall and predict top-n results relevant to user interests.
We have designed and implemented a high-performance batch query architecture for real-time recommendation systems.
This architecture has been deployed in the bilibili recommendation system for over a year, supporting a 10x increase in model computation with minimal resource growth.
- Score: 9.073405491915198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In industrial recommendation systems on websites and apps, it is essential to recall and predict top-n results relevant to user interests from a content pool of billions within milliseconds. To cope with continuous data growth and improve real-time recommendation performance, we have designed and implemented a high-performance batch query architecture for real-time recommendation systems. Our contributions include optimizing hash structures with a cacheline-aware probing method to enhance coalesced hashing, as well as the implementation of a hybrid storage key-value service built upon it. Our experiments indicate that this approach significantly surpasses conventional hash tables in batch query throughput, achieving up to 90% of the query throughput of random memory access when incorporating parallel optimization. Support for NVMe, integrating two-tier storage for hot and cold data, notably reduces resource consumption. Additionally, the system facilitates dynamic updates and automated sharding of attributes and feature embedding tables, and introduces innovative protocols for consistency in batch queries, thereby enhancing the effectiveness of real-time incremental learning updates. This architecture has been deployed for over a year in the recommendation system of bilibili, a video content community with hundreds of millions of users, supporting a 10x increase in model computation with minimal resource growth and improving outcomes while preserving the system's real-time performance.
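The abstract builds on coalesced hashing, so a minimal sketch of the plain textbook technique may help orient readers. It is not the paper's method: the cacheline-aware probing, the NVMe tier, and the consistency protocols are not described in this listing and are not reproduced here. The class name `CoalescedHashTable` and its methods are hypothetical, chosen only for illustration.

```python
from typing import Hashable, Iterable, List, Optional


class CoalescedHashTable:
    """Fixed-capacity coalesced hash table without deletion (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.keys: List[Optional[Hashable]] = [None] * capacity
        self.values: List[object] = [None] * capacity
        self.next: List[int] = [-1] * capacity  # -1 marks the end of a chain
        self.used: List[bool] = [False] * capacity
        self.free = capacity - 1  # empty slots are searched from the top down

    def _home(self, key: Hashable) -> int:
        return hash(key) % self.capacity

    def _store(self, slot: int, key: Hashable, value: object) -> None:
        self.keys[slot] = key
        self.values[slot] = value
        self.used[slot] = True

    def put(self, key: Hashable, value: object) -> None:
        i = self._home(key)
        if not self.used[i]:
            self._store(i, key, value)
            return
        # Walk the chain that passes through the home slot; update on a match.
        while True:
            if self.keys[i] == key:
                self.values[i] = value
                return
            if self.next[i] == -1:
                break
            i = self.next[i]
        # Collision: take the highest empty slot and link it to the chain tail.
        while self.free >= 0 and self.used[self.free]:
            self.free -= 1
        if self.free < 0:
            raise RuntimeError("table is full")
        self._store(self.free, key, value)
        self.next[i] = self.free

    def get(self, key: Hashable, default: object = None) -> object:
        i = self._home(key)
        if not self.used[i]:
            return default
        while i != -1:
            if self.keys[i] == key:
                return self.values[i]
            i = self.next[i]
        return default

    def batch_get(self, keys: Iterable[Hashable]) -> List[object]:
        # A batch query is a sequence of independent lookups; a production
        # system would overlap them to hide memory latency.
        return [self.get(k) for k in keys]


if __name__ == "__main__":
    table = CoalescedHashTable(8)
    for k, v in [(1, "a"), (9, "b"), (17, "c")]:  # all three share home slot 1
        table.put(k, v)
    print(table.batch_get([1, 9, 17, 3]))
```

Because collided entries live inside the table itself rather than in separately allocated nodes, every probe stays within one contiguous array, which is the property a cacheline-aware probing scheme would presumably exploit.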
Related papers
- Novel Architecture for Distributed Travel Data Integration and Service Provision Using Microservices [1.03590082373586]
This paper introduces an architecture for enhancing the flexibility and performance of an airline reservation system.
The design incorporates Redis cache technologies, two messaging systems (Kafka and RabbitMQ), and two types of architectural storage (MongoDB and Docker).
The architecture provides an impressive level of data consistency at 99.5% and a latency of data propagation of less than 75 ms.
arXiv Detail & Related papers (2024-10-31T17:41:14Z)
- Revisiting BPR: A Replicability Study of a Common Recommender System Baseline [78.00363373925758]
We study the features of the BPR model, indicating their impact on its performance, and investigate open-source BPR implementations.
Our analysis reveals inconsistencies between these implementations and the original BPR paper, leading to a significant decrease in performance of up to 50% for specific implementations.
We show that the BPR model can achieve performance levels close to state-of-the-art methods on the top-n recommendation tasks and even outperform them on specific datasets.
arXiv Detail & Related papers (2024-09-21T18:39:53Z)
- EASRec: Elastic Architecture Search for Efficient Long-term Sequential Recommender Systems [82.76483989905961]
Current Sequential Recommender Systems (SRSs) suffer from computational and resource inefficiencies.
We develop the Elastic Architecture Search for Efficient Long-term Sequential Recommender Systems (EASRec).
EASRec introduces data-aware gates that leverage historical information from input data batch to improve the performance of the recommendation network.
arXiv Detail & Related papers (2024-02-01T07:22:52Z)
- Efficient Architecture Search via Bi-level Data Pruning [70.29970746807882]
This work pioneers an exploration into the critical role of dataset characteristics for DARTS bi-level optimization.
We introduce a new progressive data pruning strategy that utilizes supernet prediction dynamics as the metric.
Comprehensive evaluations on the NAS-Bench-201 search space, DARTS search space, and MobileNet-like search space validate that Bi-level Data Pruning (BDP) reduces search costs by over 50%.
arXiv Detail & Related papers (2023-12-21T02:48:44Z)
- Hybrid-RACA: Hybrid Retrieval-Augmented Composition Assistance for Real-time Text Prediction [17.94189417448127]
We propose Hybrid Retrieval-Augmented Composition Assistance (Hybrid-RACA) for real-time text prediction.
It efficiently combines a cloud-based large language model with a smaller client-side model through retrieval augmented memory.
Our experiments on five datasets demonstrate that Hybrid-RACA offers strong performance while maintaining low latency.
arXiv Detail & Related papers (2023-08-08T12:27:20Z)
- HPC Storage Service Autotuning Using Variational-Autoencoder-Guided Asynchronous Bayesian Optimization [3.153934519625761]
We develop a novel variational-autoencoder-guided asynchronous Bayesian optimization method to tune HPC storage service parameters.
We implement our approach within the DeepHyper open-source framework, and apply it to the autotuning of a high-energy physics workflow on Argonne's Theta supercomputer.
Our approach is on par with state-of-the-art autotuning frameworks in speed and outperforms them in resource utilization and parallelization capabilities.
arXiv Detail & Related papers (2022-10-03T10:12:57Z)
- BagPipe: Accelerating Deep Recommendation Model Training [9.911467752221863]
Bagpipe is a system for training deep recommendation models that uses caching and prefetching to overlap remote embedding accesses with the computation.
We design an Oracle Cacher, a new component that uses a lookahead algorithm to generate optimal cache update decisions.
arXiv Detail & Related papers (2022-02-24T23:54:12Z)
- Data-Driven Offline Optimization For Architecting Hardware Accelerators [89.68870139177785]
We develop a data-driven offline optimization method for designing hardware accelerators, dubbed PRIME.
PRIME improves performance upon state-of-the-art simulation-driven methods by about 1.54x and 1.20x, while considerably reducing the required total simulation time by 93% and 99%, respectively.
In addition, PRIME also architects effective accelerators for unseen applications in a zero-shot setting, outperforming simulation-based methods by 1.26x.
arXiv Detail & Related papers (2021-10-20T17:06:09Z)
- DHA: End-to-End Joint Optimization of Data Augmentation Policy, Hyper-parameter and Architecture [81.82173855071312]
We propose an end-to-end solution that integrates the AutoML components and returns a ready-to-use model at the end of the search.
DHA achieves state-of-the-art (SOTA) results on various datasets, notably 77.4% accuracy on ImageNet with a cell-based search space.
arXiv Detail & Related papers (2021-09-13T08:12:50Z)
- Fast Class-wise Updating for Online Hashing [196.14748396106955]
This paper presents a novel supervised online hashing scheme, termed Fast Class-wise Updating for Online Hashing (FCOH).
A class-wise updating method is developed to decompose the binary code learning and alternately renew the hash functions in a class-wise fashion, which eases the burden of processing large numbers of training batches.
To further achieve online efficiency, we propose a semi-relaxation optimization, which accelerates the online training by treating different binary constraints independently.
arXiv Detail & Related papers (2020-12-01T07:41:54Z)