How to Index Item IDs for Recommendation Foundation Models
- URL: http://arxiv.org/abs/2305.06569v6
- Date: Tue, 26 Sep 2023 01:40:11 GMT
- Title: How to Index Item IDs for Recommendation Foundation Models
- Authors: Wenyue Hua, Shuyuan Xu, Yingqiang Ge, Yongfeng Zhang
- Abstract summary: Recommendation foundation models utilize large language models (LLMs) for recommendation by converting recommendation tasks into natural language tasks.
To avoid generating excessively long text and hallucinated recommendations, creating LLM-compatible item IDs is essential.
We propose four simple yet effective solutions, including sequential indexing, collaborative indexing, semantic (content-based) indexing, and hybrid indexing.
- Score: 49.425959632372425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recommendation foundation models utilize large language models (LLMs) for
recommendation by converting recommendation tasks into natural language tasks.
This enables generative recommendation, which directly generates the item(s) to
recommend rather than calculating a ranking score for each and every candidate
item as in traditional recommendation models, simplifying the recommendation
pipeline from multi-stage filtering to single-stage filtering. To avoid
generating excessively long text and hallucinated recommendations when deciding
which item(s) to recommend, creating LLM-compatible item IDs to uniquely
identify each item is essential for recommendation foundation models. In this
study, we systematically examine the item ID creation and indexing problem for
recommendation foundation models, using P5 as an example of the backbone LLM.
To emphasize the importance of item indexing, we first discuss the issues of
several trivial item indexing methods, such as random indexing, title indexing,
and independent indexing. We then propose four simple yet effective solutions,
including sequential indexing, collaborative indexing, semantic (content-based)
indexing, and hybrid indexing. Our study highlights the significant influence
of item indexing methods on the performance of LLM-based recommendation, and
our results on real-world datasets validate the effectiveness of our proposed
solutions. The research also demonstrates how recent advances in language
modeling and traditional IR principles such as indexing can help each other for
better learning and inference. Source code and data are available at
https://github.com/Wenyueh/LLM-RecSys-ID.
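The abstract names the indexing methods without describing them, so the following is only a minimal sketch of two of them under stated assumptions, not the authors' implementation. It assumes that sequential indexing assigns consecutive integers to items in order of first appearance across user histories, and that semantic indexing builds an ID from an item's category path plus a counter to keep IDs unique. The function names, the starting offset 1001, the `<token>` format, and the toy data are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def sequential_index(user_histories: Sequence[Sequence[str]]) -> Dict[str, str]:
    """Assign consecutive integer IDs to items in order of first appearance.

    Assumption: items that co-occur in a history end up with nearby numbers,
    so their ID strings tend to share leading digits.
    """
    item_to_id: Dict[str, str] = {}
    next_id = 1001  # hypothetical starting offset
    for history in user_histories:
        for item in history:
            if item not in item_to_id:
                item_to_id[item] = str(next_id)
                next_id += 1
    return item_to_id


def semantic_index(category_paths: Dict[str, List[str]]) -> Dict[str, str]:
    """Build an ID from each item's category path plus a per-path counter.

    Assumption: one token per category level, then a final token that
    distinguishes items sharing the same path.
    """
    path_counts: Dict[Tuple[str, ...], int] = {}
    item_to_id: Dict[str, str] = {}
    for item, path in category_paths.items():
        key = tuple(path)
        path_counts[key] = path_counts.get(key, 0) + 1
        tokens = [f"<{level}>" for level in path] + [f"<{path_counts[key]}>"]
        item_to_id[item] = "".join(tokens)
    return item_to_id


# Toy data (hypothetical): two user histories and three category paths.
histories = [["A", "B", "C"], ["B", "D"]]
seq_ids = sequential_index(histories)

paths = {"A": ["Makeup", "Lips"], "B": ["Makeup", "Lips"], "D": ["Hair"]}
sem_ids = semantic_index(paths)
```

In this sketch every item gets exactly one ID string, and items that are related (by co-occurrence or by category) share a prefix, which is one plausible way an ID scheme could stay short and unique while still carrying information an LLM can use.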
Related papers
- Unleashing the Power of Large Language Models for Group POI Recommendations [39.49785677738477]
Group Point-of-Interest (POI) recommendations aim to predict the next POI that satisfies the diverse preferences of a group of users.
Existing methods for group POI recommendations rely on single ID-based features from check-in data.
We propose a framework that unleashes the power of the Large Language Model (LLM) for context-aware group POI recommendations.
(2024-11-20T16:02:14Z)
- Generative Retrieval with Few-shot Indexing [32.19543023080197]
Training-based indexing has three limitations: high training overhead, under-utilization of the pre-trained knowledge of large language models, and challenges in adapting to a dynamic document corpus.
Few-Shot GR relies solely on prompting an LLM without requiring any training, making it more efficient.
Experiments show that Few-Shot GR achieves superior performance to state-of-the-art GR methods that require heavy training.
(2024-08-04T22:00:34Z)
- Reindex-Then-Adapt: Improving Large Language Models for Conversational Recommendation [50.19602159938368]
Large language models (LLMs) are revolutionizing conversational recommender systems.
We propose a Reindex-Then-Adapt (RTA) framework, which converts multi-token item titles into single tokens within LLMs.
Our framework demonstrates improved accuracy metrics across three different conversational recommendation datasets.
(2024-05-20T15:37:55Z)
- MMGRec: Multimodal Generative Recommendation with Transformer Model [81.61896141495144]
MMGRec aims to introduce a generative paradigm into multimodal recommendation.
We first devise a hierarchical quantization method, Graph CF-RQVAE, to assign a Rec-ID to each item from its multimodal information.
We then train a Transformer-based recommender to generate the Rec-IDs of user-preferred items based on historical interaction sequences.
(2024-04-25T12:11:27Z)
- IDGenRec: LLM-RecSys Alignment with Textual ID Learning [48.018397048791115]
We propose IDGen, representing each item as a unique, concise, semantically rich, platform-agnostic textual ID.
We show that IDGen consistently surpasses existing models in sequential recommendation under the standard experimental setting.
Results show that the zero-shot performance of the pre-trained foundation model is comparable to or even better than some traditional recommendation models.
(2024-03-27T21:22:37Z)
- LIST: Learning to Index Spatio-Textual Data for Embedding based Spatial Keyword Queries [53.843367588870585]
Top-k KNN spatial keyword queries (TkQs) return a list of objects based on a ranking function that considers both spatial and textual relevance.
There are two key challenges in building an effective and efficient index, i.e., the absence of high-quality labels and the unbalanced results.
We develop a novel pseudolabel generation technique to address the two challenges.
(2024-03-12T05:32:33Z)
- LlamaRec: Two-Stage Recommendation using Large Language Models for Ranking [10.671747198171136]
We propose a two-stage framework using large language models for ranking-based recommendation (LlamaRec).
In particular, we use small-scale sequential recommenders to retrieve candidates based on the user interaction history.
LlamaRec consistently achieves superior results across datasets in both recommendation performance and efficiency.
(2023-10-25T06:23:48Z)
- Recommender Systems with Generative Retrieval [58.454606442670034]
We propose a novel generative retrieval approach, where the retrieval model autoregressively decodes the identifiers of the target candidates.
To that end, we create a semantically meaningful tuple of codewords to serve as a Semantic ID for each item.
We show that recommender systems trained with the proposed paradigm significantly outperform the current SOTA models on various datasets.
(2023-05-08T21:48:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.