How to Index Item IDs for Recommendation Foundation Models
- URL: http://arxiv.org/abs/2305.06569v6
- Date: Tue, 26 Sep 2023 01:40:11 GMT
- Title: How to Index Item IDs for Recommendation Foundation Models
- Authors: Wenyue Hua, Shuyuan Xu, Yingqiang Ge, Yongfeng Zhang
- Abstract summary: Recommendation foundation models utilize large language models (LLMs) for recommendation by converting recommendation tasks into natural language tasks.
To avoid generating excessively long text and hallucinated recommendations, creating LLM-compatible item IDs is essential.
We propose four simple yet effective solutions, including sequential indexing, collaborative indexing, semantic (content-based) indexing, and hybrid indexing.
- Score: 49.425959632372425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recommendation foundation models utilize large language models (LLMs) for
recommendation by converting recommendation tasks into natural language tasks.
It enables generative recommendation which directly generates the item(s) to
recommend rather than calculating a ranking score for each and every candidate
item as in traditional recommendation models, simplifying the recommendation
pipeline from multi-stage filtering to single-stage filtering. To avoid
generating excessively long text and hallucinated recommendations when deciding
which item(s) to recommend, creating LLM-compatible item IDs to uniquely
identify each item is essential for recommendation foundation models. In this
study, we systematically examine the item ID creation and indexing problem for
recommendation foundation models, using P5 as an example of the backbone LLM.
To emphasize the importance of item indexing, we first discuss the issues of
several trivial item indexing methods, such as random indexing, title indexing,
and independent indexing. We then propose four simple yet effective solutions,
including sequential indexing, collaborative indexing, semantic (content-based)
indexing, and hybrid indexing. Our study highlights the significant influence
of item indexing methods on the performance of LLM-based recommendation, and
our results on real-world datasets validate the effectiveness of our proposed
solutions. The research also demonstrates how recent advances in language
modeling and traditional IR principles such as indexing can help each other for
better learning and inference. Source code and data are available at
https://github.com/Wenyueh/LLM-RecSys-ID.
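The four proposed schemes all aim to produce short, LLM-tokenizable IDs. As a toy illustration of the simplest one, sequential indexing, the sketch below assigns consecutive integer IDs in order of first appearance across user interaction histories, so items consumed by the same user tend to receive numerically close IDs (and thus overlapping sub-tokens after tokenization). The function name, starting offset, and data are illustrative, not taken from the paper's released code.

```python
def sequential_index(user_histories):
    """Map raw item keys to consecutive integer IDs by first appearance."""
    id_map = {}
    next_id = 1000  # starting offset is arbitrary, chosen for illustration
    for history in user_histories:
        for item in history:
            if item not in id_map:
                id_map[item] = next_id
                next_id += 1
    return id_map

histories = [
    ["itemA", "itemB", "itemC"],   # user 1
    ["itemB", "itemD"],            # user 2
]
print(sequential_index(histories))
# → {'itemA': 1000, 'itemB': 1001, 'itemC': 1002, 'itemD': 1003}
```

Because itemB appears in both histories, it receives an ID adjacent to the other items those users interacted with, which is the collaborative signal this scheme tries to encode.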
Related papers
- Generative Retrieval with Few-shot Indexing [32.19543023080197]
Training-based indexing has three limitations: high training overhead, under-utilization of the pre-trained knowledge of large language models, and challenges in adapting to a dynamic document corpus.
Few-Shot GR relies solely on prompting an LLM without requiring any training, making it more efficient.
Experiments show that Few-Shot GR achieves superior performance to state-of-the-art GR methods that require heavy training.
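A training-free, prompt-only indexing step of this kind might look like the sketch below: a few-shot prompt asks the model to emit a short free-text identifier for each document, and those identifiers later serve as generation targets at retrieval time. The prompt wording and the example data are invented for illustration; the actual Few-Shot GR prompts are not reproduced here.

```python
def build_docid_prompt(document, examples):
    """Compose a few-shot prompt asking an LLM for a short doc identifier."""
    lines = []
    for doc, docid in examples:
        lines.append(f"Document: {doc}\nIdentifier: {docid}")
    # The final document is left without an identifier for the model to fill in.
    lines.append(f"Document: {document}\nIdentifier:")
    return "\n\n".join(lines)

prompt = build_docid_prompt(
    "A paper on item ID indexing for recommendation.",
    [("A study of neural ranking.", "neural-ranking")],
)
print(prompt)
```

The returned string would be passed to any LLM completion API; no model parameters are updated, which is the efficiency argument the summary makes.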
arXiv Detail & Related papers (2024-08-04T22:00:34Z)
- Reindex-Then-Adapt: Improving Large Language Models for Conversational Recommendation [50.19602159938368]
Large language models (LLMs) are revolutionizing conversational recommender systems.
We propose a Reindex-Then-Adapt (RTA) framework, which converts multi-token item titles into single tokens within LLMs.
Our framework demonstrates improved accuracy metrics across three different conversational recommendation datasets.
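The core "reindex" idea, collapsing a multi-token title into one token, can be sketched as follows: reserve one new vocabulary token per item and substitute it into the text before it reaches the model. All names here are invented; the actual RTA framework also adapts the embedding of each new token, which this sketch omits.

```python
def reindex_titles(titles):
    """Return a mapping from each multi-token item title to one reserved token."""
    return {title: f"<item_{i}>" for i, title in enumerate(titles)}

vocab = reindex_titles(["The Matrix (1999)", "Spirited Away (2001)"])
text = "User watched The Matrix (1999) and liked it."
for title, token in vocab.items():
    text = text.replace(title, token)
print(text)  # → "User watched <item_0> and liked it."
```

In a real system the reserved tokens would be registered with the model's tokenizer so each item is generated in a single decoding step.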
arXiv Detail & Related papers (2024-05-20T15:37:55Z)
- IDGenRec: LLM-RecSys Alignment with Textual ID Learning [48.018397048791115]
We propose IDGen, representing each item as a unique, concise, semantically rich, platform-agnostic textual ID.
We show that IDGen consistently surpasses existing models in sequential recommendation under the standard experimental setting.
Results show that the zero-shot performance of the pre-trained foundation model is comparable to or even better than some traditional recommendation models.
arXiv Detail & Related papers (2024-03-27T21:22:37Z)
- Hierarchical Indexing for Retrieval-Augmented Opinion Summarization [60.5923941324953]
We propose a method for unsupervised abstractive opinion summarization that combines the attributability and scalability of extractive approaches with the coherence and fluency of Large Language Models (LLMs).
Our method, HIRO, learns an index structure that maps sentences to a path through a semantically organized discrete hierarchy.
At inference time, we populate the index and use it to identify and retrieve clusters of sentences containing popular opinions from input reviews.
arXiv Detail & Related papers (2024-03-01T10:38:07Z)
- LlamaRec: Two-Stage Recommendation using Large Language Models for Ranking [10.671747198171136]
We propose a two-stage framework using large language models for ranking-based recommendation (LlamaRec).
In particular, we use small-scale sequential recommenders to retrieve candidates based on the user interaction history.
LlamaRec consistently achieves superior results in both recommendation performance and efficiency across datasets.
arXiv Detail & Related papers (2023-10-25T06:23:48Z)
- Recommender Systems with Generative Retrieval [58.454606442670034]
We propose a novel generative retrieval approach, where the retrieval model autoregressively decodes the identifiers of the target candidates.
To that end, we create semantically meaningful tuples of codewords to serve as a Semantic ID for each item.
We show that recommender systems trained with the proposed paradigm significantly outperform the current SOTA models on various datasets.
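One common way to derive such a tuple of codewords is residual quantization: each level quantizes the residual of the item embedding against a small codebook, and the sequence of chosen codeword indices becomes the Semantic ID. The sketch below uses tiny hand-written codebooks purely for illustration; real systems learn the codebooks (e.g. with an RQ-VAE), and all names here are assumptions, not the paper's code.

```python
def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    def dist(codeword):
        return sum((a - b) ** 2 for a, b in zip(codeword, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def semantic_id(embedding, codebooks):
    """Quantize an embedding level by level; the index tuple is the ID."""
    residual = list(embedding)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen codeword so the next level refines the remainder.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes

codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],   # level-1 codewords (toy values)
    [[0.0, 0.1], [0.1, 0.0]],   # level-2 codewords (toy values)
]
print(semantic_id([1.0, 1.1], codebooks))  # → [1, 0]
```

Items with similar embeddings share ID prefixes, which is what lets an autoregressive decoder generalize across semantically related items.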
arXiv Detail & Related papers (2023-05-08T21:48:17Z)
- Zero-Shot Listwise Document Reranking with a Large Language Model [58.64141622176841]
We propose Listwise Reranker with a Large Language Model (LRL), which achieves strong reranking effectiveness without using any task-specific training data.
Experiments on three TREC web search datasets demonstrate that LRL not only outperforms zero-shot pointwise methods when reranking first-stage retrieval results, but can also act as a final-stage reranker.
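A listwise reranker of this kind places all candidates into a single prompt and asks the model to output an ordering of their identifiers, rather than scoring each passage independently. The template below is invented for illustration and is not the wording used in the LRL paper.

```python
def build_listwise_prompt(query, passages):
    """Assemble one prompt containing the query and all numbered candidates."""
    lines = [f"Query: {query}", "Rank the following passages by relevance."]
    for i, passage in enumerate(passages, start=1):
        lines.append(f"[{i}] {passage}")
    lines.append("Output the passage numbers in order, most relevant first.")
    return "\n".join(lines)

prompt = build_listwise_prompt(
    "how to index item IDs",
    ["Sequential indexing assigns consecutive IDs.", "Unrelated passage."],
)
print(prompt)
```

The model's reply (e.g. "1, 2") is then parsed back into a permutation of the candidate list; no task-specific training data is involved, which is the zero-shot claim above.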
arXiv Detail & Related papers (2023-05-03T14:45:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.