Large Language Models are Not Stable Recommender Systems
- URL: http://arxiv.org/abs/2312.15746v1
- Date: Mon, 25 Dec 2023 14:54:33 GMT
- Authors: Tianhui Ma, Yuan Cheng, Hengshu Zhu, Hui Xiong
- Abstract summary: Through exploratory research, we find consistent patterns of positional bias in large language models (LLMs).
We propose a Bayesian probabilistic framework, STELLA (Stable LLM for Recommendation), built on a two-stage pipeline.
Our framework capitalizes on the detected pattern information to calibrate the instability of LLMs and enhance recommendation performance.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the significant successes of large language models (LLMs) in many
natural language processing tasks, there is growing interest among researchers
in exploring LLMs for novel recommender systems. However, we have observed that
directly using LLMs as recommender systems is usually unstable due to their
inherent position bias. To this end, we conduct exploratory research and find
consistent patterns of positional bias in LLMs that influence recommendation
performance across a range of scenarios. We then propose a Bayesian
probabilistic framework, STELLA (Stable LLM for Recommendation), which involves
a two-stage pipeline. In the first, probing stage, we identify bias patterns in a
transition matrix using a probing detection dataset. In the second,
recommendation stage, a Bayesian strategy is employed to adjust the biased
output of LLMs, guided by an entropy indicator. Our framework can thus
capitalize on the detected pattern information to calibrate the instability of LLMs
and enhance recommendation performance. Finally, extensive experiments clearly
validate the effectiveness of our framework.
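The two-stage idea in the abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes a probing-stage transition matrix `T` where `T[i][j]` is the estimated probability that the model picks candidate position `j` when the true preferred item sits at position `i`, and applies Bayes' rule to the model's raw output distribution to recover a posterior over true positions. The matrix values, the `entropy` indicator usage, and all names here are hypothetical.

```python
import math

def posterior_over_true_positions(observed_probs, transition, prior=None):
    """Bayesian debiasing sketch: combine the LLM's raw distribution over
    candidate positions with a probing-stage transition matrix
    transition[i][j] ~= P(model picks position j | true item at position i)
    to produce a posterior over the true position."""
    n = len(transition)
    prior = prior or [1.0 / n] * n  # uniform prior over candidate positions
    # Soft-evidence likelihood: score(i) = prior(i) * sum_j P(pick j | true i) * obs(j)
    scores = [
        prior[i] * sum(transition[i][j] * observed_probs[j] for j in range(n))
        for i in range(n)
    ]
    z = sum(scores)
    return [s / z for s in scores]

def entropy(p):
    """Shannon entropy (nats) of a distribution; high values flag unstable output."""
    return -sum(q * math.log(q) for q in p if q > 0)

# Hypothetical probing result: the model favors the first slot regardless of content.
T = [[0.60, 0.25, 0.15],
     [0.50, 0.35, 0.15],
     [0.50, 0.20, 0.30]]
obs = [0.7, 0.2, 0.1]  # raw LLM output distribution over 3 candidates
post = posterior_over_true_positions(obs, T)
# A high entropy(obs) would signal instability, so the calibrated posterior
# would be trusted over the raw output in that case.
```

Because every row of `T` is skewed toward position 0, the posterior discounts the model's strong first-position vote and flattens the distribution, which is the calibration effect the abstract describes at a sketch level.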
Related papers
- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to producing errors, hallucinations, and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding LLMs' decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z) - Self-Exploring Language Models: Active Preference Elicitation for Online Alignment [90.4820014819937]
We propose a bilevel objective optimistically biased towards potentially high-reward responses to actively explore out-of-distribution regions.
Our experimental results demonstrate that when finetuned on Zephyr-7B-SFT and Llama-3-8B-Instruct models, SELM significantly boosts the performance on instruction-following benchmarks.
arXiv Detail & Related papers (2024-05-29T17:59:07Z) - SLMRec: Empowering Small Language Models for Sequential Recommendation [25.920216777752]
Sequential Recommendation task involves predicting the next item a user is likely to interact with.
Recent research demonstrates the great impact of LLMs on sequential recommendation systems.
Due to the huge size of LLMs, it is inefficient and impractical to apply an LLM-based model on real-world platforms.
arXiv Detail & Related papers (2024-05-28T07:12:06Z) - CALRec: Contrastive Alignment of Generative LLMs For Sequential Recommendation [18.986613405565514]
We propose a two-stage LLM finetuning framework that finetunes a pretrained LLM in a two-tower fashion.
Our model significantly outperforms many state-of-the-art baselines.
arXiv Detail & Related papers (2024-05-03T18:51:19Z) - Empowering Few-Shot Recommender Systems with Large Language Models --
Enhanced Representations [0.0]
Large language models (LLMs) offer novel insights into tackling the few-shot scenarios encountered by explicit feedback-based recommender systems.
Our study can inspire researchers to delve deeper into the multifaceted dimensions of LLMs' involvement in recommender systems.
arXiv Detail & Related papers (2023-12-21T03:50:09Z) - LLMRec: Benchmarking Large Language Models on Recommendation Task [54.48899723591296]
The application of Large Language Models (LLMs) in the recommendation domain has not been thoroughly investigated.
We benchmark several popular off-the-shelf LLMs on five recommendation tasks, including rating prediction, sequential recommendation, direct recommendation, explanation generation, and review summarization.
The benchmark results indicate that LLMs displayed only moderate proficiency in accuracy-based tasks such as sequential and direct recommendation.
arXiv Detail & Related papers (2023-08-23T16:32:54Z) - A Survey on Large Language Models for Recommendation [77.91673633328148]
Large Language Models (LLMs) have emerged as powerful tools in the field of Natural Language Processing (NLP).
This survey presents a taxonomy that categorizes these models into two major paradigms: Discriminative LLM for Recommendation (DLLM4Rec) and Generative LLM for Recommendation (GLLM4Rec).
arXiv Detail & Related papers (2023-05-31T13:51:26Z) - On Learning to Summarize with Large Language Models as References [101.79795027550959]
Large language models (LLMs) are favored by human annotators over the original reference summaries in commonly used summarization datasets.
We study an LLM-as-reference learning setting for smaller text summarization models to investigate whether their performance can be substantially improved.
arXiv Detail & Related papers (2023-05-23T16:56:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.