Universal Item Tokenization for Transferable Generative Recommendation
- URL: http://arxiv.org/abs/2504.04405v2
- Date: Mon, 14 Apr 2025 02:50:20 GMT
- Title: Universal Item Tokenization for Transferable Generative Recommendation
- Authors: Bowen Zheng, Hongyu Lu, Yu Chen, Wayne Xin Zhao, Ji-Rong Wen,
- Abstract summary: We propose UTGRec, a universal item tokenization approach for transferable Generative Recommendation.<n>By devising tree-structured codebooks, we discretize content representations into corresponding codes for item tokenization.<n>For raw content reconstruction, we employ dual lightweight decoders to reconstruct item text and images from discrete representations.<n>For collaborative knowledge integration, we assume that co-occurring items are similar and integrate collaborative signals through co-occurrence alignment and reconstruction.
- Score: 89.42584009980676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, generative recommendation has emerged as a promising paradigm, attracting significant research attention. The basic framework involves an item tokenizer, which represents each item as a sequence of codes serving as its identifier, and a generative recommender that predicts the next item by autoregressively generating the target item identifier. However, in existing methods, both the tokenizer and the recommender are typically domain-specific, limiting their ability for effective transfer or adaptation to new domains. To this end, we propose UTGRec, a Universal item Tokenization approach for transferable Generative Recommendation. Specifically, we design a universal item tokenizer for encoding rich item semantics by adapting a multimodal large language model (MLLM). By devising tree-structured codebooks, we discretize content representations into corresponding codes for item tokenization. To effectively learn the universal item tokenizer on multiple domains, we introduce two key techniques in our approach. For raw content reconstruction, we employ dual lightweight decoders to reconstruct item text and images from discrete representations to capture general knowledge embedded in the content. For collaborative knowledge integration, we assume that co-occurring items are similar and integrate collaborative signals through co-occurrence alignment and reconstruction. Finally, we present a joint learning framework to pre-train and adapt the transferable generative recommender across multiple domains. Extensive experiments on four public datasets demonstrate the superiority of UTGRec compared to both traditional and generative recommendation baselines.
Related papers
- Pre-training Generative Recommender with Multi-Identifier Item Tokenization [78.87007819266957]
We propose MTGRec to augment token sequence data for Generative Recommender pre-training.<n>Our approach involves two key innovations: multi-identifier item tokenization and curriculum recommender pre-training.<n>Extensive experiments on three public benchmark datasets demonstrate that MTGRec significantly outperforms both traditional and generative recommendation baselines.
arXiv Detail & Related papers (2025-04-06T08:03:03Z) - Bridging Textual-Collaborative Gap through Semantic Codes for Sequential Recommendation [91.13055384151897]
CoCoRec is a novel Code-based textual and Collaborative semantic fusion method for sequential Recommendation.<n>We generate fine-grained semantic codes from multi-view text embeddings through vector quantization techniques.<n>In order to further enhance the fusion of textual and collaborative semantics, we introduce an optimization strategy.
arXiv Detail & Related papers (2025-03-15T15:54:44Z) - STORE: Streamlining Semantic Tokenization and Generative Recommendation with A Single LLM [59.08493154172207]
We propose a unified framework to streamline the semantic tokenization and generative recommendation process.
We formulate semantic tokenization as a text-to-token task and generative recommendation as a token-to-token task, supplemented by a token-to-text reconstruction task and a text-to-token auxiliary task.
All these tasks are framed in a generative manner and trained using a single large language model (LLM) backbone.
arXiv Detail & Related papers (2024-09-11T13:49:48Z) - EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration [63.112790050749695]
We introduce EAGER, a novel generative recommendation framework that seamlessly integrates both behavioral and semantic information.
We validate the effectiveness of EAGER on four public benchmarks, demonstrating its superior performance compared to existing methods.
arXiv Detail & Related papers (2024-06-20T06:21:56Z) - MMGRec: Multimodal Generative Recommendation with Transformer Model [81.61896141495144]
MMGRec aims to introduce a generative paradigm into multimodal recommendation.
We first devise a hierarchical quantization method Graph CF-RQVAE to assign Rec-ID for each item from its multimodal information.
We then train a Transformer-based recommender to generate the Rec-IDs of user-preferred items based on historical interaction sequences.
arXiv Detail & Related papers (2024-04-25T12:11:27Z) - Learning Vector-Quantized Item Representation for Transferable
Sequential Recommenders [33.406897794088515]
VQ-Rec is a novel approach to learning Vector-Quantized item representations for transferable sequential Recommender.
We propose an enhanced contrastive pre-training approach, using semi-synthetic and mixed-domain code representations as hard negatives.
arXiv Detail & Related papers (2022-10-22T00:43:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.