End-to-End Semantic ID Generation for Generative Advertisement Recommendation
- URL: http://arxiv.org/abs/2602.10445v2
- Date: Thu, 12 Feb 2026 09:56:10 GMT
- Title: End-to-End Semantic ID Generation for Generative Advertisement Recommendation
- Authors: Jie Jiang, Xinxun Zhang, Enming Zhang, Yuling Xiong, Jun Zhang, Jingwen Wang, Huan Yu, Yuxiang Wang, Hao Wang, Xiao Yan, Jiawei Jiang
- Abstract summary: We propose a Unified SID generation framework for generative advertisement recommendation. Specifically, we jointly optimize embeddings and SIDs in an end-to-end manner from raw advertising data. Experiments demonstrate that UniSID consistently outperforms state-of-the-art SID generation methods.
- Score: 33.453121305193434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative Recommendation (GR) has excelled by framing recommendation as next-token prediction. This paradigm relies on Semantic IDs (SIDs) to tokenize large-scale items into discrete sequences. Existing GR approaches predominantly generate SIDs via Residual Quantization (RQ), where items are encoded into embeddings and then quantized to discrete SIDs. However, this paradigm suffers from inherent limitations: 1) Objective misalignment and semantic degradation stemming from the two-stage compression; 2) Error accumulation inherent in the structure of RQ. To address these limitations, we propose UniSID, a Unified SID generation framework for generative advertisement recommendation. Specifically, we jointly optimize embeddings and SIDs in an end-to-end manner from raw advertising data, enabling semantic information to flow directly into the SID space and thus addressing the inherent limitations of the two-stage cascading compression paradigm. To capture fine-grained semantics, a multi-granularity contrastive learning strategy is introduced to align distinct items across SID levels. Finally, a summary-based ad reconstruction mechanism is proposed to encourage SIDs to capture high-level semantic information that is not explicitly present in advertising contexts. Experiments demonstrate that UniSID consistently outperforms state-of-the-art SID generation methods, yielding up to a 4.62% improvement in Hit Rate metrics across downstream advertising scenarios compared to the strongest baseline.
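The Residual Quantization pipeline the abstract critiques can be sketched as follows; this is a minimal toy illustration (the embedding dimension, number of levels, codebook sizes, and random codebooks are assumptions, not the paper's setup). Because each level quantizes the residual left by the previous one, any error made at an early level is carried forward, which is the error accumulation the paper describes:

```python
import numpy as np

def rq_encode(emb, codebooks):
    """Residual Quantization: at each level, snap the current residual to
    its nearest codeword, then pass the leftover residual to the next level."""
    sid, residual = [], emb.copy()
    for cb in codebooks:                      # cb has shape (num_codes, dim)
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        sid.append(idx)
        residual = residual - cb[idx]         # unquantized error carried forward
    return sid, residual

rng = np.random.default_rng(0)
dim, levels, num_codes = 8, 3, 16
codebooks = [rng.normal(size=(num_codes, dim)) for _ in range(levels)]
item_emb = rng.normal(size=dim)               # stand-in for a pre-trained ad embedding

sid, residual = rq_encode(item_emb, codebooks)
print(sid)                                    # one code index per level: the item's SID
print(float(np.linalg.norm(residual)))        # error remaining after the last level
```

In the two-stage paradigm these codebooks are fit to reconstruct frozen embeddings, so nothing in this loop sees the recommendation objective; UniSID's end-to-end training is positioned as removing exactly that disconnect.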
Related papers
- Stop Treating Collisions Equally: Qualification-Aware Semantic ID Learning for Recommendation at Industrial Scale [24.395492499196063]
QuaSID is an end-to-end framework that learns collision-qualified SIDs by selectively qualifying conflict pairs and scaling the repulsion strength by collision severity. Experiments on public benchmarks and industrial data validate QuaSID.
arXiv Detail & Related papers (2026-02-28T12:55:49Z) - Fine-grained Semantics Integration for Large Language Model-based Recommendation [35.75224379727093]
We propose TS-Rec, which can integrate Token-level Semantics into LLM-based Recommenders. Extensive experiments on two real-world benchmarks demonstrate that TS-Rec consistently outperforms traditional and generative baselines.
arXiv Detail & Related papers (2026-02-26T05:17:24Z) - IntRR: A Framework for Integrating SID Redistribution and Length Reduction [14.327886721362647]
We propose IntRR, a novel framework that integrates objective-aligned SID Redistribution and structural Length Reduction. IntRR yields substantial improvements over representative generative baselines, achieving superior performance in both recommendation accuracy and efficiency.
arXiv Detail & Related papers (2026-02-24T09:09:40Z) - GLASS: A Generative Recommender for Long-sequence Modeling via SID-Tier and Semantic Search [51.44490997013772]
GLASS is a novel framework that integrates long-term user interests into the generative process via SID-Tier and Semantic Search. We show that GLASS outperforms state-of-the-art baselines in experiments on two large-scale real-world datasets.
arXiv Detail & Related papers (2026-02-05T13:48:33Z) - Rethinking Generative Recommender Tokenizer: Recsys-Native Encoding and Semantic Quantization Beyond LLMs [17.944727019161878]
ReSID is a principled SID framework that approaches recommendation learning from the perspective of information preservation and sequential predictability. It consistently outperforms strong sequential and SID-based generative baselines by an average of over 10%, while reducing tokenization cost by up to 122x.
arXiv Detail & Related papers (2026-02-02T17:00:04Z) - Differentiable Semantic ID for Generative Recommendation [65.83703273297492]
Generative recommendation provides a novel paradigm in which each item is represented by a discrete semantic ID (SID) learned from rich content. In practice, SIDs are typically optimized only for content reconstruction rather than recommendation accuracy. A natural approach is to make semantic indexing differentiable so that recommendation gradients can directly influence SID learning. We propose DIGER, a first step toward effective differentiable semantic IDs for generative recommendation.
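The differentiability problem DIGER targets is commonly handled with a straight-through style estimator; the sketch below illustrates that general trick only (the codebook sizes, names, and shapes are illustrative assumptions, not DIGER's actual method):

```python
import numpy as np

def straight_through_select(logits, codebook):
    """Straight-through code selection: emit the hard (discrete) codeword in
    the forward pass, while a framework's autodiff would differentiate through
    the soft mixture. NumPy has no autograd, so the stop-gradient placement
    is only indicated by the expression structure and the comment below."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                      # softmax over code logits
    soft = probs @ codebook                   # differentiable mixture of codes
    hard = codebook[int(np.argmax(probs))]    # discrete code actually emitted
    # In PyTorch this would be: soft + (hard - soft).detach()
    return soft + (hard - soft)

rng = np.random.default_rng(1)
codebook = rng.normal(size=(4, 2))            # 4 candidate codes, dim 2
logits = rng.normal(size=4)                   # scores produced by some encoder
out = straight_through_select(logits, codebook)
```

The forward output is exactly the hard codeword, yet a tracing autodiff engine would see the soft term, letting recommendation gradients reach the index-selection step.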
arXiv Detail & Related papers (2026-01-27T15:34:11Z) - S$^2$GR: Stepwise Semantic-Guided Reasoning in Latent Space for Generative Recommendation [15.69884243417431]
Generative Recommendation (GR) has emerged as a transformative paradigm with its end-to-end generation advantages. Existing GR methods primarily focus on direct Semantic ID (SID) generation from interaction sequences. We propose stepwise semantic-guided reasoning in latent space (S$^2$GR), a novel reasoning-enhanced GR framework.
arXiv Detail & Related papers (2026-01-26T16:40:37Z) - PRISM: Purified Representation and Integrated Semantic Modeling for Generative Sequential Recommendation [28.629759086187352]
We propose a novel generative recommendation framework, PRISM, with Purified Representation and Integrated Semantic Modeling. PRISM consistently outperforms state-of-the-art baselines across four real-world datasets.
arXiv Detail & Related papers (2026-01-23T08:50:16Z) - DiffGRM: Diffusion-based Generative Recommendation Model [63.35379395455103]
Generative recommendation (GR) is an emerging paradigm that represents each item via a tokenizer as an n-digit semantic ID (SID). We propose DiffGRM, a diffusion-based GR model that replaces the autoregressive decoder with a masked discrete diffusion model (MDM). Experiments show consistent gains over strong generative and discriminative recommendation baselines on multiple datasets.
arXiv Detail & Related papers (2025-10-21T03:23:32Z) - Understanding Generative Recommendation with Semantic IDs from a Model-scaling View [57.471604518714535]
Generative Recommendation (GR) tries to unify rich item semantics and collaborative filtering signals. One popular modern approach is to use semantic IDs (SIDs) to represent items in an autoregressive user interaction sequence modeling setup. We show that SID-based GR exhibits significant bottlenecks when scaling up the model. We revisit another GR paradigm that directly uses large language models (LLMs) as recommenders.
arXiv Detail & Related papers (2025-09-29T21:24:17Z) - FORGE: Forming Semantic Identifiers for Generative Retrieval in Industrial Datasets [64.51403245281547]
FORGE is a benchmark for FOrming semantic identifieRs in Generative rEtrieval with industrial datasets. For real-world applications, FORGE introduces an offline pretraining schema that reduces online convergence time by half.
arXiv Detail & Related papers (2025-09-25T08:44:22Z) - SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs [70.79124435220695]
We propose a novel unified Semantic-enhanced generative Cross-mOdal REtrieval framework (SemCORE). We first construct a Structured natural language IDentifier (SID) that effectively aligns target identifiers with generative models optimized for natural language comprehension and generation. We then introduce a Generative Semantic Verification (GSV) strategy enabling fine-grained target discrimination.
arXiv Detail & Related papers (2025-04-17T17:59:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.