LLMRec: Benchmarking Large Language Models on Recommendation Task
- URL: http://arxiv.org/abs/2308.12241v1
- Date: Wed, 23 Aug 2023 16:32:54 GMT
- Title: LLMRec: Benchmarking Large Language Models on Recommendation Task
- Authors: Junling Liu, Chao Liu, Peilin Zhou, Qichen Ye, Dading Chong, Kang Zhou, Yueqi Xie, Yuwei Cao, Shoujin Wang, Chenyu You, Philip S. Yu
- Abstract summary: The application of Large Language Models (LLMs) in the recommendation domain has not been thoroughly investigated.
We benchmark several popular off-the-shelf LLMs on five recommendation tasks, including rating prediction, sequential recommendation, direct recommendation, explanation generation, and review summarization.
The benchmark results indicate that LLMs displayed only moderate proficiency in accuracy-based tasks such as sequential and direct recommendation.
- Score: 54.48899723591296
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the fast development of Large Language Models (LLMs) such as
ChatGPT has significantly advanced NLP tasks by enhancing the capabilities of
conversational models. However, the application of LLMs in the recommendation
domain has not been thoroughly investigated. To bridge this gap, we propose
LLMRec, an LLM-based recommender system designed for benchmarking LLMs on
various recommendation tasks. Specifically, we benchmark several popular
off-the-shelf LLMs, such as ChatGPT, LLaMA, ChatGLM, on five recommendation
tasks, including rating prediction, sequential recommendation, direct
recommendation, explanation generation, and review summarization. Furthermore,
we investigate the effectiveness of supervised finetuning to improve LLMs'
instruction compliance ability. The benchmark results indicate that LLMs
displayed only moderate proficiency in accuracy-based tasks such as sequential
and direct recommendation. However, they demonstrated comparable performance to
state-of-the-art methods in explainability-based tasks. We also conduct
qualitative evaluations to further evaluate the quality of contents generated
by different models, and the results show that LLMs can truly understand the
provided information and generate clearer and more reasonable results. We
aspire that this benchmark will serve as an inspiration for researchers to
delve deeper into the potential of LLMs in enhancing recommendation
performance. Our code, processed data, and benchmark results are available at
https://github.com/williamliujl/LLMRec.
Related papers
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism [39.392450788666814]
Current evaluations of large language models (LLMs) often overlook non-determinism.
Greedy decoding generally outperforms sampling methods for most evaluated tasks.
Smaller LLMs can match or surpass larger models such as GPT-4-Turbo.
arXiv Detail & Related papers (2024-07-15T06:12:17Z)
- MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization [73.7779735046424]
We show that different prompts should be adapted to different Large Language Models (LLM) to enhance their capabilities across various downstream tasks in NLP.
We then propose a model-adaptive prompt optimization (MAPO) method that optimizes the original prompts for each specific LLM in downstream tasks.
arXiv Detail & Related papers (2024-07-04T18:39:59Z)
- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding LLMs' decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z)
- Automated Commit Message Generation with Large Language Models: An Empirical Study and Beyond [24.151927600694066]
Commit Message Generation (CMG) approaches aim to automatically generate commit messages based on given code diffs.
This paper conducts the first comprehensive experiment to investigate how far we have come in applying Large Language Models (LLMs) to generate high-quality commit messages.
arXiv Detail & Related papers (2024-04-23T08:24:43Z)
- Evaluating Large Language Models at Evaluating Instruction Following [54.49567482594617]
We introduce a challenging meta-evaluation benchmark, LLMBar, designed to test the ability of an LLM evaluator in discerning instruction-following outputs.
We discover that different evaluators exhibit distinct performance on LLMBar and even the highest-scoring ones have substantial room for improvement.
arXiv Detail & Related papers (2023-10-11T16:38:11Z)
- A Survey on Large Language Models for Recommendation [77.91673633328148]
Large Language Models (LLMs) have emerged as powerful tools in the field of Natural Language Processing (NLP).
This survey presents a taxonomy that categorizes these models into two major paradigms: Discriminative LLM for Recommendation (DLLM4Rec) and Generative LLM for Recommendation (GLLM4Rec).
arXiv Detail & Related papers (2023-05-31T13:51:26Z)
- On Learning to Summarize with Large Language Models as References [101.79795027550959]
Large language models (LLMs) are favored by human annotators over the original reference summaries in commonly used summarization datasets.
We study an LLM-as-reference learning setting for smaller text summarization models to investigate whether their performance can be substantially improved.
arXiv Detail & Related papers (2023-05-23T16:56:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.