Generating Long Semantic IDs in Parallel for Recommendation
- URL: http://arxiv.org/abs/2506.05781v1
- Date: Fri, 06 Jun 2025 06:20:37 GMT
- Title: Generating Long Semantic IDs in Parallel for Recommendation
- Authors: Yupeng Hou, Jiacheng Li, Ashley Shin, Jinsung Jeon, Abhishek Santhanam, Wei Shao, Kaveh Hassani, Ning Yao, Julian McAuley,
- Abstract summary: We propose RPG, a lightweight framework for semantic ID-based recommendation.<n>We train the model to predict each token independently using a multi-token prediction loss.<n> Experiments show that scaling up semantic ID length to 64 enables RPG to outperform generative baselines.
- Score: 29.97624755406803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semantic ID-based recommendation models tokenize each item into a small number of discrete tokens that preserve specific semantics, leading to better performance, scalability, and memory efficiency. While recent models adopt a generative approach, they often suffer from inefficient inference due to the reliance on resource-intensive beam search and multiple forward passes through the neural sequence model. As a result, the length of semantic IDs is typically restricted (e.g. to just 4 tokens), limiting their expressiveness. To address these challenges, we propose RPG, a lightweight framework for semantic ID-based recommendation. The key idea is to produce unordered, long semantic IDs, allowing the model to predict all tokens in parallel. We train the model to predict each token independently using a multi-token prediction loss, directly integrating semantics into the learning objective. During inference, we construct a graph connecting similar semantic IDs and guide decoding to avoid generating invalid IDs. Experiments show that scaling up semantic ID length to 64 enables RPG to outperform generative baselines by an average of 12.6% on the NDCG@10, while also improving inference efficiency. Code is available at: https://github.com/facebookresearch/RPG_KDD2025.
Related papers
- Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations [83.93566096400723]
We find that instruction-tuned models retain up to 93.4% of their original performance when given a randomly sampled tokenization.<n>Character-level segmentation improves string manipulation and code understanding tasks by up to +14%.<n>Right-aligned digit grouping enhances large-number arithmetic by +33%.
arXiv Detail & Related papers (2025-06-23T18:02:26Z) - Traversal Verification for Speculative Tree Decoding [9.534492618180085]
Speculative decoding is a promising approach for accelerating large language models.<n>This paper introduces Traversal Verification, a novel speculative decoding algorithm.<n>We show that our method consistently improves acceptance length and throughput over existing methods.
arXiv Detail & Related papers (2025-05-18T12:51:55Z) - Order-agnostic Identifier for Large Language Model-based Generative Recommendation [94.37662915542603]
Items are assigned identifiers for Large Language Models (LLMs) to encode user history and generate the next item.<n>Existing approaches leverage either token-sequence identifiers, representing items as discrete token sequences, or single-token identifiers, using ID or semantic embeddings.<n>We propose SETRec, which leverages semantic tokenizers to obtain order-agnostic multi-dimensional tokens.
arXiv Detail & Related papers (2025-02-15T15:25:38Z) - Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely textithidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z) - Multi-Candidate Speculative Decoding [82.05519287513444]
Large language models have shown impressive capabilities across a variety of NLP tasks, yet their generating text autoregressively is time-consuming.
One way to speed them up is speculative decoding, which generates candidate segments from a fast draft model that is then verified in parallel by the target model.
This paper proposes sampling multiple candidates from a draft model and then organising them in batches for verification.
We design algorithms for efficient multi-candidate verification while maintaining the distribution of the target model.
arXiv Detail & Related papers (2024-01-12T17:15:23Z) - Object Recognition as Next Token Prediction [99.40793702627396]
We present an approach to pose object recognition as next token prediction.
The idea is to apply a language decoder that auto-regressively predicts the text tokens from image embeddings to form labels.
arXiv Detail & Related papers (2023-12-04T18:58:40Z) - Language Models As Semantic Indexers [78.83425357657026]
We introduce LMIndexer, a self-supervised framework to learn semantic IDs with a generative language model.
We show the high quality of the learned IDs and demonstrate their effectiveness on three tasks including recommendation, product search, and document retrieval.
arXiv Detail & Related papers (2023-10-11T18:56:15Z) - Better Generalization with Semantic IDs: A Case Study in Ranking for Recommendations [24.952222114424146]
We propose using content-derived features as a replacement for random ids.
We show that simply replacing ID features with content-based embeddings can cause a drop in quality due to reduced memorization capability.
Similar to content embeddings, the compactness of Semantic IDs poses a problem of easy adaption in recommendation models.
arXiv Detail & Related papers (2023-06-13T20:34:15Z) - Recommender Systems with Generative Retrieval [58.454606442670034]
We propose a novel generative retrieval approach, where the retrieval model autoregressively decodes the identifiers of the target candidates.
To that end, we create semantically meaningful of codewords to serve as a Semantic ID for each item.
We show that recommender systems trained with the proposed paradigm significantly outperform the current SOTA models on various datasets.
arXiv Detail & Related papers (2023-05-08T21:48:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.