Efficient Recommendation with Millions of Items by Dynamic Pruning of Sub-Item Embeddings
- URL: http://arxiv.org/abs/2505.00560v1
- Date: Thu, 01 May 2025 14:36:33 GMT
- Title: Efficient Recommendation with Millions of Items by Dynamic Pruning of Sub-Item Embeddings
- Authors: Aleksandr V. Petrov, Craig Macdonald, Nicola Tonellotto
- Abstract summary: We propose a dynamic pruning algorithm to efficiently find the top-K highest-scored items in a large item catalogue. Our RecJPQPrune algorithm is safe-up-to-rank-K, since it theoretically guarantees that no potentially high-scored item is excluded from the final top-K recommendation list. Our experiments on two large datasets and three recommendation models demonstrate the efficiency achievable using RecJPQPrune.
- Score: 63.117573355917465
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A large item catalogue is a major challenge for deploying modern sequential recommender models, since it makes the memory footprint of the model large and increases inference latency. One promising approach to address this is RecJPQ, which replaces item embeddings with sub-item embeddings. However, slow inference remains problematic because finding the top-K highest-scored items usually requires scoring all items in the catalogue, which may not be feasible for large catalogues. By adapting dynamic pruning concepts from document retrieval, we propose the RecJPQPrune dynamic pruning algorithm to efficiently find the top-K highest-scored items without computing the scores of all items in the catalogue. Our RecJPQPrune algorithm is safe-up-to-rank-K, since it theoretically guarantees that no potentially high-scored item is excluded from the final top-K recommendation list, thereby ensuring no impact on effectiveness. Our experiments on two large datasets and three recommendation models demonstrate the efficiency achievable using RecJPQPrune: for instance, on the Tmall dataset with 2.2M items, we reduce the median model scoring time by a factor of 64 compared to the Transformer Default baseline, and by a factor of 5.3 compared to a recent scoring approach called PQTopK. Overall, this paper demonstrates the effective and efficient inference of Transformer-based recommendation models at catalogue scales not previously reported in the literature. Indeed, our RecJPQPrune algorithm can score 2 million items in under 10 milliseconds without GPUs, and without relying on Approximate Nearest Neighbour (ANN) techniques.
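To make the pruning idea concrete, below is a minimal split-at-a-time sketch in numpy: sub-item contributions are accumulated one split at a time, and items whose optimistic score bound can no longer reach the current top-K threshold are dropped. The bounds keep it safe up to rank K, but the traversal is deliberately simplified; RecJPQPrune's actual algorithm processes high-impact sub-item scores first, and every array name here is illustrative rather than taken from the paper's code.

```python
import numpy as np

def prune_topk(S, codes, K):
    """Split-at-a-time pruning sketch, safe up to rank K.

    S:     (G, V) sub-item score table for the current user state,
           S[g, v] = dot(state_slice_g, sub_embedding_{g, v})
    codes: (N, G) sub-item ids assigned to each of the N catalogue items
    """
    G, V = S.shape
    N, _ = codes.shape
    # Suffix sums of per-split maxima/minima: optimistic and pessimistic
    # totals of the splits that have not been accumulated yet.
    hi = np.concatenate([np.cumsum(S.max(axis=1)[::-1])[::-1], [0.0]])
    lo = np.concatenate([np.cumsum(S.min(axis=1)[::-1])[::-1], [0.0]])
    alive = np.arange(N)        # surviving candidate items
    partial = np.zeros(N)       # accumulated partial scores
    for g in range(G):
        partial[alive] += S[g, codes[alive, g]]
        if alive.size > K:
            # At least K survivors are guaranteed a final score >= theta ...
            theta = np.partition(partial[alive] + lo[g + 1], -K)[-K]
            # ... so items whose best case falls below theta are safe to drop.
            alive = alive[partial[alive] + hi[g + 1] >= theta]
    return alive[np.argsort(-partial[alive])[:K]]

# Illustrative scale: 2M items, 8 splits, 256 sub-item ids per split.
rng = np.random.default_rng(0)
S = rng.standard_normal((8, 256))
codes = rng.integers(0, 256, size=(2_000_000, 8))
top10 = prune_topk(S, codes, K=10)
```

Because the per-split maxima and minima are precomputed once, each pruning test is a single vectorised comparison over the surviving candidates.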
Related papers
- Scalable Cross-Entropy Loss for Sequential Recommendations with Large Item Catalogs [4.165917157093442]
This paper introduces a novel Scalable Cross-Entropy (SCE) loss function in the sequential learning setup. It approximates the CE loss for datasets with large-size catalogs, enhancing both time efficiency and memory usage without compromising recommendation quality. Experimental results on multiple datasets demonstrate the effectiveness of SCE in reducing peak memory usage by a factor of up to 100 compared to the alternatives.
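As context for why sampling helps, here is a generic sampled cross-entropy sketch; SCE's actual candidate selection is more sophisticated than the uniform negative sampling shown here, and all names are illustrative.

```python
import numpy as np

def sampled_ce_loss(state, item_embs, pos_item, n_neg, rng):
    """Cross-entropy over a sampled candidate set instead of the full
    catalogue: memory and time scale with n_neg, not with |catalogue|."""
    neg = rng.integers(0, item_embs.shape[0], size=n_neg)  # uniform negatives
    cand = np.concatenate(([pos_item], neg))               # positive at index 0
    logits = item_embs[cand] @ state                       # (1 + n_neg,)
    logits -= logits.max()                                 # numerical stability
    return -(logits[0] - np.log(np.exp(logits).sum()))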
arXiv Detail & Related papers (2024-09-27T13:17:59Z)
- Efficient Inference of Sub-Item Id-based Sequential Recommendation Models with Millions of Items [63.117573355917465]
We show that it is possible to improve RecJPQ-based models' inference efficiency using the PQTopK algorithm.
We speed up RecJPQ-enhanced SASRec by a factor of 4.5 x compared to the original SASRec's inference method and by a factor of 1.56 x compared to the method implemented in RecJPQ code.
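The key observation, sketched below, is that with sub-item embeddings every sub-item needs to be scored only once; each item's score is then just a sum of G table lookups. This is a hedged sketch with illustrative names, not the authors' implementation:

```python
import numpy as np

def pqtopk_sketch(state, sub_embs, codes, K):
    """Exhaustive PQTopK-style scoring sketch.

    state:    (G, D)    per-split slices of the sequence representation
    sub_embs: (G, V, D) sub-item embedding table per split
    codes:    (N, G)    sub-item ids of each catalogue item
    """
    # Score each of the G*V sub-items once against the user state.
    S = np.einsum('gd,gvd->gv', state, sub_embs)
    # Item scores are sums of G lookups into S -- no item embeddings needed.
    scores = S[np.arange(codes.shape[1]), codes].sum(axis=1)
    top = np.argpartition(-scores, K)[:K]
    return top[np.argsort(-scores[top])]
```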
arXiv Detail & Related papers (2024-08-19T13:43:48Z)
- Optimizing Novelty of Top-k Recommendations using Large Language Models and Reinforcement Learning [16.287067991245962]
In real-world systems, an important consideration for a new model is the novelty of its top-k recommendations.
We propose a reinforcement learning (RL) formulation where large language models provide feedback for the novel items.
We evaluate the proposed algorithm on improving novelty for a query-ad recommendation task on a large-scale search engine.
arXiv Detail & Related papers (2024-06-20T10:20:02Z)
- Top-Down Partitioning for Efficient List-Wise Ranking [24.600506147325717]
We propose a novel algorithm that partitions a ranking to depth k and processes documents top-down.
Our algorithm is inherently parallelizable due to the use of a pivot element, which can be compared to documents down to an arbitrary depth concurrently.
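A quickselect-style sketch of the recursion structure (the paper's algorithm batches comparisons through a listwise ranker and chooses pivots differently; `prefers` is a hypothetical pairwise-preference callback):

```python
def topdown_topk(docs, prefers, k):
    """Pivot-based top-k partitioning sketch. prefers(a, b) -> True if the
    ranker prefers document a over b; every comparison against the pivot
    is independent of the others, hence parallelizable."""
    if k <= 0:
        return []
    if len(docs) <= k:
        return docs
    pivot, rest = docs[0], docs[1:]
    above = [d for d in rest if prefers(d, pivot)]      # beat the pivot
    below = [d for d in rest if not prefers(d, pivot)]  # did not
    if len(above) >= k:
        return topdown_topk(above, prefers, k)
    return above + [pivot] + topdown_topk(below, prefers, k - len(above) - 1)
```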
arXiv Detail & Related papers (2024-05-23T14:00:26Z)
- Adaptive Retrieval and Scalable Indexing for k-NN Search with Cross-Encoders [77.84801537608651]
Cross-encoder (CE) models, which compute similarity by jointly encoding a query-item pair, perform better than embedding-based models (dual-encoders) at estimating query-item relevance.
We propose a sparse-matrix factorization based method that efficiently computes latent query and item embeddings to approximate CE scores and performs k-NN search with the approximate CE similarity.
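One way to picture the approximation step, as a hedged sketch rather than the paper's exact routine: fit a latent query vector from a handful of true cross-encoder scores on anchor items, then run k-NN with that vector over precomputed item embeddings.

```python
import numpy as np

def query_vec_from_ce(anchor_embs, ce_scores, reg=1e-3):
    """Ridge regression: find q such that <q, e_i> ~= CE(query, item_i)
    on the anchor items; q then scores the full catalogue with cheap
    dot products instead of further cross-encoder calls."""
    d = anchor_embs.shape[1]
    lhs = anchor_embs.T @ anchor_embs + reg * np.eye(d)
    return np.linalg.solve(lhs, anchor_embs.T @ ce_scores)
```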
arXiv Detail & Related papers (2024-05-06T17:14:34Z)
- Cascading Reinforcement Learning [21.267106043818792]
In the cascading bandit model, at each timestep, an agent recommends an ordered subset of items from a pool of items, each associated with an unknown attraction probability.
We propose a generalized cascading RL framework, which incorporates user states and state transitions into the decisions.
In cascading RL, we need to select items that not only have large attraction probabilities but also lead to good successor states.
We develop two algorithms CascadingVI and CascadingBPI, which are both computationally-efficient and sample-efficient.
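For context, the attraction probabilities enter through the standard cascade click model, in which the user scans the list top-down and clicks the first attractive item; a one-line sketch:

```python
import math

def cascade_click_prob(attraction, ranked_items):
    """Probability the ordered list earns a click: the complement of
    every item in the list failing to attract the user."""
    return 1.0 - math.prod(1.0 - attraction[i] for i in ranked_items)
```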
arXiv Detail & Related papers (2024-01-17T04:20:26Z)
- Injecting Domain Adaptation with Learning-to-hash for Effective and Efficient Zero-shot Dense Retrieval [49.98615945702959]
We evaluate LTH and vector compression techniques for improving the downstream zero-shot retrieval accuracy of the TAS-B dense retriever.
Our results demonstrate that, unlike prior work, LTH strategies, when applied naively, can underperform the zero-shot TAS-B dense retriever by up to 14% nDCG@10 on average.
arXiv Detail & Related papers (2022-05-23T17:53:44Z)
- Learning over No-Preferred and Preferred Sequence of Items for Robust Recommendation (Extended Abstract) [69.50145858681951]
We propose a theoretically supported sequential strategy for training a large-scale Recommender System (RS) over implicit feedback.
We present two variants of this strategy where model parameters are updated using either the momentum method or a gradient-based approach.
arXiv Detail & Related papers (2022-02-26T22:29:43Z)
- When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
To achieve better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the decoder queries from the inputs, enabling the model to match the accuracy of models with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z)
- IRLI: Iterative Re-partitioning for Learning to Index [104.72641345738425]
Methods must trade off high accuracy against load balance and scalability in distributed settings.
We propose a novel approach called IRLI, which iteratively partitions the items by learning the relevant buckets directly from the query-item relevance data.
We mathematically show that IRLI retrieves the correct item with high probability under very natural assumptions and provides superior load balancing.
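A rough sketch of one re-partitioning step under my reading of the abstract (IRLI itself learns a neural router and repeats the process over multiple independent partitions; all names here are assumptions):

```python
import numpy as np

def reassign_items(query_bucket_probs, relevance_pairs, n_items, n_buckets):
    """Move each item to the bucket where its relevant queries are most
    likely to be routed, accumulating router probabilities as votes.

    query_bucket_probs: (Q, B) routing distribution per training query
    relevance_pairs:    iterable of (query_id, item_id) relevance data
    """
    votes = np.zeros((n_items, n_buckets))
    for q, i in relevance_pairs:
        votes[i] += query_bucket_probs[q]
    return votes.argmax(axis=1)  # new bucket assignment per item
```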
arXiv Detail & Related papers (2021-03-17T23:13:25Z)