Large Language Models are Zero-Shot Rankers for Recommender Systems
- URL: http://arxiv.org/abs/2305.08845v2
- Date: Wed, 24 Jan 2024 04:41:01 GMT
- Title: Large Language Models are Zero-Shot Rankers for Recommender Systems
- Authors: Yupeng Hou, Junjie Zhang, Zihan Lin, Hongyu Lu, Ruobing Xie, Julian
McAuley, Wayne Xin Zhao
- Abstract summary: This work aims to investigate the capacity of large language models (LLMs) to act as the ranking model for recommender systems.
We show that LLMs have promising zero-shot ranking abilities but struggle to perceive the order of historical interactions.
We demonstrate that these issues can be alleviated using specially designed prompting and bootstrapping strategies.
- Score: 76.02500186203929
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, large language models (LLMs) (e.g., GPT-4) have demonstrated
impressive general-purpose task-solving abilities, including the potential to
approach recommendation tasks. Along this line of research, this work aims to
investigate the capacity of LLMs to act as the ranking model for recommender
systems. We first formalize the recommendation problem as a conditional ranking
task, considering sequential interaction histories as conditions and the items
retrieved by other candidate generation models as candidates. To solve the
ranking task by LLMs, we carefully design the prompting template and conduct
extensive experiments on two widely-used datasets. We show that LLMs have
promising zero-shot ranking abilities but (1) struggle to perceive the order of
historical interactions, and (2) can be biased by popularity or item positions
in the prompts. We demonstrate that these issues can be alleviated using
specially designed prompting and bootstrapping strategies. Equipped with these
insights, zero-shot LLMs can even challenge conventional recommendation models
when ranking candidates are retrieved by multiple candidate generators. The
code and processed datasets are available at
https://github.com/RUCAIBox/LLMRank.
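The repository linked above contains the authors' actual templates; purely as an illustration of the conditional-ranking setup described in this abstract, a zero-shot ranking prompt might be assembled along the following lines (the wording and function names here are our own assumptions, not the paper's exact template):

```python
# Hedged sketch: format an interaction history (the condition) and retrieved
# candidates into one zero-shot ranking instruction. Wording is illustrative;
# see https://github.com/RUCAIBox/LLMRank for the authors' real templates.
def build_ranking_prompt(history: list[str], candidates: list[str]) -> str:
    history_block = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(history))
    candidate_block = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(candidates))
    return (
        "I've interacted with the following items, in chronological order:\n"
        f"{history_block}\n\n"
        f"There are now {len(candidates)} candidate items:\n"
        f"{candidate_block}\n\n"
        "Rank all candidates by how likely I am to interact with each one "
        "next, most likely first. Answer with the candidate titles only."
    )

print(build_ranking_prompt(
    history=["The Matrix", "Inception", "Interstellar"],
    candidates=["Tenet", "Titanic", "Dunkirk", "Avatar"],
))
```

The position bias noted in the abstract is commonly countered by a bootstrapping step: rank several independently shuffled copies of the candidate list and aggregate the results. The sketch below averages per-item ranks, which may differ from the paper's exact aggregation; `llm_rank` stands in for any call that returns the candidates in the model's ranked order:

```python
import random
from collections import defaultdict

def bootstrapped_rank(llm_rank, candidates, rounds=3, seed=0):
    # Shuffle the candidate order each round so no item is always first,
    # then average the rank each item received across rounds.
    rng = random.Random(seed)
    rank_sums = defaultdict(float)
    for _ in range(rounds):
        shuffled = list(candidates)
        rng.shuffle(shuffled)
        for position, item in enumerate(llm_rank(shuffled)):
            rank_sums[item] += position
    return sorted(candidates, key=lambda item: rank_sums[item])
```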
Related papers
- Beyond Utility: Evaluating LLM as Recommender [47.97889161958022]
We explore four new evaluation dimensions and propose a multidimensional evaluation framework.
New evaluation dimensions include: 1) history length sensitivity, 2) candidate position bias, 3) generation-involved performance, and 4) hallucinations.
Using this multidimensional evaluation framework, along with traditional aspects, we evaluate the performance of seven LLM-based recommenders.
arXiv Detail & Related papers (2024-11-01T03:09:28Z)
- Keyword-driven Retrieval-Augmented Large Language Models for Cold-start User Recommendations [5.374800961359305]
We introduce KALM4Rec, a framework to address the problem of cold-start user restaurant recommendations.
KALM4Rec operates in two main stages: candidate retrieval and LLM-based candidate re-ranking.
Our evaluation, using a Yelp restaurant dataset with user reviews from three English-speaking cities, shows that our proposed framework significantly improves recommendation quality.
arXiv Detail & Related papers (2024-05-30T02:00:03Z)
- Can Small Language Models be Good Reasoners for Sequential Recommendation? [34.098264212413305]
We propose SLIM, a Step-by-step knowLedge dIstillation fraMework for recommendation.
We introduce chain-of-thought (CoT) prompting based on user behavior sequences for the larger teacher model.
The rationales generated by the teacher model are then utilized as labels to distill the downstream smaller student model.
arXiv Detail & Related papers (2024-03-07T06:49:37Z)
- ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation [43.270424225285105]
We focus on adapting and empowering a pure large language model for zero-shot and few-shot recommendation tasks.
We propose Retrieval-enhanced Large Language models (ReLLa) for recommendation tasks in both zero-shot and few-shot settings.
arXiv Detail & Related papers (2023-08-22T02:25:04Z)
- A Survey on Large Language Models for Recommendation [77.91673633328148]
Large Language Models (LLMs) have emerged as powerful tools in the field of Natural Language Processing (NLP).
This survey presents a taxonomy that categorizes these models into two major paradigms: Discriminative LLMs for Recommendation (DLLM4Rec) and Generative LLMs for Recommendation (GLLM4Rec).
arXiv Detail & Related papers (2023-05-31T13:51:26Z)
- PALR: Personalization Aware LLMs for Recommendation [7.407353565043918]
PALR aims to combine user history behaviors (such as clicks, purchases, and ratings) with large language models (LLMs) to generate user-preferred items.
Our solution outperforms state-of-the-art models on various sequential recommendation tasks.
arXiv Detail & Related papers (2023-05-12T17:21:33Z)
- Zero-Shot Listwise Document Reranking with a Large Language Model [58.64141622176841]
We propose Listwise Reranker with a Large Language Model (LRL), which achieves strong reranking effectiveness without using any task-specific training data.
Experiments on three TREC web search datasets demonstrate that LRL not only outperforms zero-shot pointwise methods when reranking first-stage retrieval results, but can also act as a final-stage reranker.
arXiv Detail & Related papers (2023-05-03T14:45:34Z)
- Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents [56.104476412839944]
Large Language Models (LLMs) have demonstrated remarkable zero-shot generalization across various language-related tasks.
This paper investigates generative LLMs for relevance ranking in Information Retrieval (IR).
To address concerns about data contamination of LLMs, we collect a new test set called NovelEval.
To improve efficiency in real-world applications, we delve into the potential for distilling the ranking capabilities of ChatGPT into small specialized models.
arXiv Detail & Related papers (2023-04-19T10:16:03Z)
- Zero-Shot Next-Item Recommendation using Large Pretrained Language Models [16.14557830316297]
We propose a prompting strategy called Zero-Shot Next-Item Recommendation (NIR) prompting that directs LLMs to make next-item recommendations.
Our strategy incorporates a 3-step prompting that guides GPT-3 to carry out subtasks that capture the user's preferences.
We evaluate the proposed approach using GPT-3 on the MovieLens 100K dataset and show that it achieves strong zero-shot performance.
arXiv Detail & Related papers (2023-04-06T15:35:11Z)
- AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot abilities across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z)
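To make the explain-then-annotate recipe from the entry above concrete, here is a minimal sketch under our own assumptions; the `complete` helper is a placeholder for any LLM completion call, and the prompt wording is illustrative rather than the paper's:

```python
def complete(prompt: str) -> str:
    """Placeholder: wire this to the LLM completion API of your choice."""
    raise NotImplementedError

def explain(example: str, gold_label: str) -> str:
    # Step 1 (explain): ask the model to justify a known (text, label) pair.
    return complete(
        f"Text: {example}\nLabel: {gold_label}\n"
        "Explain briefly why this label is correct."
    )

def annotate(demos: list[tuple[str, str]], new_example: str) -> str:
    # Step 2 (annotate): reuse the self-generated explanations as enriched
    # few-shot demonstrations when labeling new, unlabeled text.
    shots = "\n\n".join(
        f"Text: {x}\nExplanation: {explain(x, y)}\nLabel: {y}"
        for x, y in demos
    )
    return complete(f"{shots}\n\nText: {new_example}\nLabel:")
```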
This list is automatically generated from the titles and abstracts of the papers on this site.