Related papers: How to Index Item IDs for Recommendation Foundation Models

URL: http://arxiv.org/abs/2305.06569v6
Date: Tue, 26 Sep 2023 01:40:11 GMT
Title: How to Index Item IDs for Recommendation Foundation Models
Authors: Wenyue Hua, Shuyuan Xu, Yingqiang Ge, Yongfeng Zhang
Abstract summary: Recommendation foundation model utilizes large language models (LLM) for recommendation by converting recommendation tasks into natural language tasks. To avoid generating excessively long text and hallucinated recommendations, creating LLM-compatible item IDs is essential. We propose four simple yet effective solutions, including sequential indexing, collaborative indexing, semantic (content-based) indexing, and hybrid indexing.
Score: 49.425959632372425
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recommendation foundation model utilizes large language models (LLM) for recommendation by converting recommendation tasks into natural language tasks. It enables generative recommendation which directly generates the item(s) to recommend rather than calculating a ranking score for each and every candidate item as in traditional recommendation models, simplifying the recommendation pipeline from multi-stage filtering to single-stage filtering. To avoid generating excessively long text and hallucinated recommendations when deciding which item(s) to recommend, creating LLM-compatible item IDs to uniquely identify each item is essential for recommendation foundation models. In this study, we systematically examine the item ID creation and indexing problem for recommendation foundation models, using P5 as an example of the backbone LLM. To emphasize the importance of item indexing, we first discuss the issues of several trivial item indexing methods, such as random indexing, title indexing, and independent indexing. We then propose four simple yet effective solutions, including sequential indexing, collaborative indexing, semantic (content-based) indexing, and hybrid indexing. Our study highlights the significant influence of item indexing methods on the performance of LLM-based recommendation, and our results on real-world datasets validate the effectiveness of our proposed solutions. The research also demonstrates how recent advances on language modeling and traditional IR principles such as indexing can help each other for better learning and inference. Source code and data are available at https://github.com/Wenyueh/LLM-RecSys-ID.
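
Illustration (not from the paper's code): of the four proposed methods, sequential indexing is the simplest to sketch. The idea, as assumed here, is to give items consecutive numeric IDs in the order they first appear in users' interaction histories, so that items consumed close together tend to share ID prefixes once tokenized. The function name `sequential_index` and the starting offset `1001` are hypothetical choices for this sketch; the authors' repository linked above is the authoritative implementation.

```python
from typing import Dict, List


def sequential_index(user_sequences: List[List[str]], start: int = 1001) -> Dict[str, int]:
    """Assign consecutive integer IDs to items in order of first appearance.

    `user_sequences` holds one chronologically ordered item list per user.
    `start` is a hypothetical offset chosen only for this illustration.
    """
    item_to_id: Dict[str, int] = {}
    next_id = start
    for sequence in user_sequences:
        for item in sequence:
            # An item keeps the ID it received the first time it was seen.
            if item not in item_to_id:
                item_to_id[item] = next_id
                next_id += 1
    return item_to_id


# Two users; item "b" appears in both histories and keeps its first ID.
example_ids = sequential_index([["a", "b", "c"], ["b", "d"]])
```

In this toy input, the items of the first user receive adjacent IDs, and the item that only the second user introduces continues the count. In practice the mapping would presumably be built from training interactions only, so that held-out items do not leak into the ID assignment.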

Related papers

LLMIdxAdvis: Resource-Efficient Index Advisor Utilizing Large Language Model [24.579793425796193]
We propose a resource-efficient index advisor that uses large language models (LLMs) without extensive fine-tuning. It frames index recommendation as a sequence-to-sequence task, taking the target workload, storage constraint, and corresponding database environment as input. Experiments on 3 OLAP and 2 real-world benchmarks reveal that LLMIdxAdvis delivers competitive index recommendation with reduced runtime.
arXiv Detail & Related papers (2025-03-10T22:01:24Z)
Order-agnostic Identifier for Large Language Model-based Generative Recommendation [94.37662915542603]
Items are assigned identifiers for Large Language Models (LLMs) to encode user history and generate the next item. Existing approaches leverage either token-sequence identifiers, representing items as discrete token sequences, or single-token identifiers, using ID or semantic embeddings. We propose SETRec, which leverages semantic tokenizers to obtain order-agnostic multi-dimensional tokens.
arXiv Detail & Related papers (2025-02-15T15:25:38Z)
ULMRec: User-centric Large Language Model for Sequential Recommendation [16.494996929730927]
We propose ULMRec, a framework that integrates user personalized preferences into Large Language Models. Extensive experiments on two public datasets demonstrate that ULMRec significantly outperforms existing methods.
arXiv Detail & Related papers (2024-12-07T05:37:00Z)
Unleashing the Power of Large Language Models for Group POI Recommendations [39.49785677738477]
Group Point-of-Interest (POI) recommendations aim to predict the next POI that satisfies the diverse preferences of a group of users. Existing methods for group POI recommendations rely on single ID-based features from check-in data. We propose a framework that unleashes the power of Large Language Models (LLMs) for context-aware group POI recommendations.
arXiv Detail & Related papers (2024-11-20T16:02:14Z)
Generative Retrieval with Few-shot Indexing [32.19543023080197]
Training-based indexing has three limitations: high training overhead, under-utilization of the pre-trained knowledge of large language models, and challenges in adapting to a dynamic document corpus. Few-Shot GR relies solely on prompting an LLM without requiring any training, making it more efficient. Experiments show that Few-Shot GR achieves superior performance to state-of-the-art GR methods that require heavy training.
arXiv Detail & Related papers (2024-08-04T22:00:34Z)
Reindex-Then-Adapt: Improving Large Language Models for Conversational Recommendation [50.19602159938368]
Large language models (LLMs) are revolutionizing conversational recommender systems. We propose a Reindex-Then-Adapt (RTA) framework, which converts multi-token item titles into single tokens within LLMs. Our framework demonstrates improved accuracy metrics across three different conversational recommendation datasets.
arXiv Detail & Related papers (2024-05-20T15:37:55Z)
MMGRec: Multimodal Generative Recommendation with Transformer Model [81.61896141495144]
MMGRec aims to introduce a generative paradigm into multimodal recommendation. We first devise a hierarchical quantization method, Graph CF-RQVAE, to assign a Rec-ID to each item from its multimodal information. We then train a Transformer-based recommender to generate the Rec-IDs of user-preferred items based on historical interaction sequences.
arXiv Detail & Related papers (2024-04-25T12:11:27Z)
IDGenRec: LLM-RecSys Alignment with Textual ID Learning [48.018397048791115]
We propose IDGen, representing each item as a unique, concise, semantically rich, platform-agnostic textual ID. We show that IDGen consistently surpasses existing models in sequential recommendation under the standard experimental setting. Results show that the zero-shot performance of the pre-trained foundation model is comparable to or even better than some traditional recommendation models.
arXiv Detail & Related papers (2024-03-27T21:22:37Z)
LIST: Learning to Index Spatio-Textual Data for Embedding based Spatial Keyword Queries [53.843367588870585]
Top-k kNN spatial keyword queries (TkQs) return a list of objects based on a ranking function that considers both spatial and textual relevance. There are two key challenges in building an effective and efficient index, i.e., the absence of high-quality labels and the unbalanced results. We develop a novel pseudolabel generation technique to address the two challenges.
arXiv Detail & Related papers (2024-03-12T05:32:33Z)
LlamaRec: Two-Stage Recommendation using Large Language Models for Ranking [10.671747198171136]
We propose a two-stage framework using large language models for ranking-based recommendation (LlamaRec). In particular, we use small-scale sequential recommenders to retrieve candidates based on the user interaction history. LlamaRec consistently achieves superior performance in both recommendation performance and efficiency.
arXiv Detail & Related papers (2023-10-25T06:23:48Z)
Recommender Systems with Generative Retrieval [58.454606442670034]
We propose a novel generative retrieval approach, where the retrieval model autoregressively decodes the identifiers of the target candidates. To that end, we create a semantically meaningful tuple of codewords to serve as a Semantic ID for each item. We show that recommender systems trained with the proposed paradigm significantly outperform the current SOTA models on various datasets.
arXiv Detail & Related papers (2023-05-08T21:48:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.