Understanding Generative Recommendation with Semantic IDs from a Model-scaling View
- URL: http://arxiv.org/abs/2509.25522v2
- Date: Fri, 03 Oct 2025 01:21:43 GMT
- Title: Understanding Generative Recommendation with Semantic IDs from a Model-scaling View
- Authors: Jingzhe Liu, Liam Collins, Jiliang Tang, Tong Zhao, Neil Shah, Clark Mingxuan Ju
- Abstract summary: Generative Recommendation (GR) tries to unify rich item semantics and collaborative filtering signals. One popular modern approach is to use semantic IDs (SIDs) to represent items in an autoregressive user interaction sequence modeling setup. We show that SID-based GR shows significant bottlenecks while scaling up the model. We revisit another GR paradigm that directly uses large language models (LLMs) as recommenders.
- Score: 57.471604518714535
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in generative models have allowed the emergence of a promising paradigm for recommender systems (RS), known as Generative Recommendation (GR), which tries to unify rich item semantics and collaborative filtering signals. One popular modern approach is to use semantic IDs (SIDs), which are discrete codes quantized from the embeddings of modality encoders (e.g., large language or vision models), to represent items in an autoregressive user interaction sequence modeling setup (henceforth, SID-based GR). While generative models in other domains exhibit well-established scaling laws, our work reveals that SID-based GR shows significant bottlenecks while scaling up the model. In particular, the performance of SID-based GR quickly saturates as we enlarge each component: the modality encoder, the quantization tokenizer, and the RS itself. In this work, we identify the limited capacity of SIDs to encode item semantic information as one of the fundamental bottlenecks. Motivated by this observation, as an initial effort to obtain GR models with better scaling behaviors, we revisit another GR paradigm that directly uses large language models (LLMs) as recommenders (henceforth, LLM-as-RS). Our experiments show that the LLM-as-RS paradigm has superior model scaling properties and achieves up to 20 percent improvement over the best achievable performance of SID-based GR through scaling. We also challenge the prevailing belief that LLMs struggle to capture collaborative filtering information, showing that their ability to model user-item interactions improves as LLMs scale up. Our analyses on both SID-based GR and LLMs across model sizes from 44M to 14B parameters underscore the intrinsic scaling limits of SID-based GR and position LLM-as-RS as a promising path toward foundation models for GR.
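To make the notion of a semantic ID concrete, here is a minimal sketch of how an item embedding could be turned into a short tuple of discrete codes. It assumes a residual-quantization tokenizer with randomly initialized codebooks; the abstract does not say which quantizer the authors use, and a real tokenizer would learn its codebooks. The names `nearest`, `semantic_id`, and `sid_capacity_bits` are hypothetical. The last helper illustrates one way to read the capacity limit mentioned above: an SID with L levels and codebook size K carries at most L * log2(K) bits, no matter how large the encoder embedding is.

```python
import math
import random


def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def semantic_id(embedding, codebooks):
    """Residual quantization (an assumed tokenizer): one code per level.

    Each level quantizes what the previous levels failed to explain.
    """
    residual = list(embedding)
    sid = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        sid.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(sid)


def sid_capacity_bits(num_levels, codebook_size):
    """Upper bound on the information a single SID can carry, in bits."""
    return num_levels * math.log2(codebook_size)


# Illustrative sizes only; none of these values come from the paper.
rng = random.Random(0)
DIM, LEVELS, SIZE = 8, 3, 256
codebooks = [
    [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(SIZE)]
    for _ in range(LEVELS)
]
item_embedding = [rng.gauss(0, 1) for _ in range(DIM)]

sid = semantic_id(item_embedding, codebooks)
print("semantic ID:", sid)
print("capacity (bits):", sid_capacity_bits(LEVELS, SIZE))
```

In an SID-based GR setup, the recommender would then model user histories as sequences of such code tuples instead of raw item IDs.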
Related papers
- QARM V2: Quantitative Alignment Multi-Modal Recommendation for Reasoning User Sequence Modeling [43.14172197611297]
Traditional RecSys relies on ID-based embeddings for user sequence modeling in the General Search Unit (GSU) and Exact Search Unit (ESU) paradigm. We present QARM V2, a unified framework that bridges LLM semantic understanding with RecSys business requirements for user sequence modeling.
arXiv Detail & Related papers (2026-02-09T11:57:28Z)
- GLASS: A Generative Recommender for Long-sequence Modeling via SID-Tier and Semantic Search [51.44490997013772]
GLASS is a novel framework that integrates long-term user interests into the generative process via SID-Tier and Semantic Search. We show that GLASS outperforms state-of-the-art baselines in experiments on two large-scale real-world datasets.
arXiv Detail & Related papers (2026-02-05T13:48:33Z)
- MMGRid: Navigating Temporal-aware and Cross-domain Generative Recommendation via Model Merging [22.681048070167765]
Generative Recommendation (GR) has emerged as a new paradigm in recommender systems (RSs). We focus on a fundamental yet underexplored real-world challenge: how to merge generative recommenders specialized to different real-world contexts. We propose a unified framework, MMGRid, a structured contextual grid of GR checkpoints that organizes models trained under diverse contexts.
arXiv Detail & Related papers (2026-01-22T13:09:16Z)
- GRPO-RM: Fine-Tuning Representation Models via GRPO-Driven Reinforcement Learning [52.16150076582931]
We propose Group Relative Policy Optimization for Representation Model (GRPO-RM). Our method establishes a predefined output set to functionally replace token sequence sampling in large language models (LLMs). A specialized reward function is designed to accommodate the properties of representation models.
arXiv Detail & Related papers (2025-11-19T09:19:39Z)
- Generative Recommendation with Semantic IDs: A Practitioner's Handbook [34.25784373770595]
Generative Recommendation (GR) has gained increasing attention for its promising performance compared to traditional models. A key factor contributing to the success of GR is the semantic ID (SID), which converts continuous semantic representations into discrete ID sequences. Our work introduces and open-sources a framework for Generative Recommendation with semantic ID, namely GRID, specifically designed for modularity.
arXiv Detail & Related papers (2025-07-29T20:41:51Z)
- LlamaRec-LKG-RAG: A Single-Pass, Learnable Knowledge Graph-RAG Framework for LLM-Based Ranking [0.0]
We introduce LlamaRec-LKG-RAG, a novel single-pass, end-to-end trainable framework that integrates personalized knowledge graph context into recommendation ranking. Our approach extends the LlamaRec architecture by incorporating a lightweight user preference module that dynamically identifies salient relation paths. Experiments on ML-100K and Amazon Beauty datasets demonstrate consistent and significant improvements over LlamaRec across key ranking metrics.
arXiv Detail & Related papers (2025-06-09T05:52:03Z)
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
- Exploring User Retrieval Integration towards Large Language Models for Cross-Domain Sequential Recommendation [66.72195610471624]
Cross-Domain Sequential Recommendation aims to mine and transfer users' sequential preferences across different domains.
We propose a novel framework named URLLM, which aims to improve the CDSR performance by exploring the User Retrieval approach.
arXiv Detail & Related papers (2024-06-05T09:19:54Z)
- GSVA: Generalized Segmentation via Multimodal Large Language Models [72.57095903188922]
Generalized Referring Expression Segmentation (GRES) extends the scope of classic RES to refer to multiple objects in one expression or identify the empty targets absent in the image.
Current solutions to GRES remain unsatisfactory since segmentation MLLMs cannot correctly handle the cases where users might reference multiple subjects in a singular prompt.
We propose Generalized Segmentation Vision Assistant (GSVA) to address this gap.
arXiv Detail & Related papers (2023-12-15T02:54:31Z)
- Compositional Chain-of-Thought Prompting for Large Multimodal Models [46.721769077885966]
Compositional Chain-of-Thought (CCoT) is a novel zero-shot Chain-of-Thought prompting method.
We first generate a scene graph (SG) using the Large Language Model (LLM) and then use that SG in the prompt to produce a response.
We find that the proposed CCoT approach improves the performance of several popular LMMs on general multimodal benchmarks.
arXiv Detail & Related papers (2023-11-27T22:23:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.