Evaluating Position Bias in Large Language Model Recommendations
- URL: http://arxiv.org/abs/2508.02020v1
- Date: Mon, 04 Aug 2025 03:30:26 GMT
- Title: Evaluating Position Bias in Large Language Model Recommendations
- Authors: Ethan Bito, Yongli Ren, Estrid He
- Abstract summary: Large Language Models (LLMs) are being increasingly explored as general-purpose tools for recommendation tasks. We show that LLM-based recommendation models suffer from position bias, where the order of candidate items in a prompt can disproportionately influence the recommendations produced by LLMs. We introduce a new prompting strategy to mitigate the position bias of LLM recommendation models called Ranking via Iterative SElection.
- Score: 3.430780143519032
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are being increasingly explored as general-purpose tools for recommendation tasks, enabling zero-shot and instruction-following capabilities without the need for task-specific training. While the research community is enthusiastically embracing LLMs, there are important caveats to directly adapting them for recommendation tasks. In this paper, we show that LLM-based recommendation models suffer from position bias, where the order of candidate items in a prompt can disproportionately influence the recommendations produced by LLMs. First, we analyse the position bias of LLM-based recommendations on real-world datasets, where results uncover systemic biases of LLMs with high sensitivity to input orders. Furthermore, we introduce a new prompting strategy to mitigate the position bias of LLM recommendation models called Ranking via Iterative SElection (RISE). We compare our proposed method against various baselines on key benchmark datasets. Experiment results show that our method reduces sensitivity to input ordering and improves stability without requiring model fine-tuning or post-processing.
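One way to make "sensitivity to input order" concrete is to feed the same candidate set to a ranker in every possible order and check how often its top pick stays the same. The sketch below is illustrative only: it is not the paper's evaluation protocol and not RISE. The names `make_toy_ranker` and `top1_consistency` are hypothetical, and the toy ranker is a deterministic stand-in for an LLM call, with an artificial `position_weight` that rewards items listed earlier.

```python
from collections import Counter
from itertools import permutations
from typing import Callable, Dict, List, Sequence

# A ranker takes candidates in prompt order and returns them best-first.
Ranker = Callable[[Sequence[str]], List[str]]


def make_toy_ranker(relevance: Dict[str, float], position_weight: float) -> Ranker:
    """Build a hypothetical ranker (a stand-in for an LLM, not a real model).

    Each item's score is its true relevance plus a bonus that grows the
    earlier the item appears in the input. position_weight = 0 means the
    ranker ignores input order entirely.
    """

    def rank(candidates: Sequence[str]) -> List[str]:
        n = len(candidates)
        scores = {
            item: relevance[item] + position_weight * (n - idx)
            for idx, item in enumerate(candidates)
        }
        return sorted(candidates, key=lambda item: scores[item], reverse=True)

    return rank


def top1_consistency(ranker: Ranker, candidates: Sequence[str]) -> float:
    """Fraction of input orderings that yield the most frequent top-1 item.

    1.0 means the top recommendation never changes with input order;
    lower values mean the ranker is more sensitive to ordering.
    """
    picks = Counter(ranker(list(order))[0] for order in permutations(candidates))
    total = sum(picks.values())
    return picks.most_common(1)[0][1] / total


# Assumed toy data: three items with fixed "true" relevance.
relevance = {"A": 3.0, "B": 2.0, "C": 1.0}
items = list(relevance)

order_insensitive = make_toy_ranker(relevance, position_weight=0.0)
order_sensitive = make_toy_ranker(relevance, position_weight=10.0)

print(top1_consistency(order_insensitive, items))
print(top1_consistency(order_sensitive, items))
```

Enumerating all permutations is only feasible for small candidate lists; with a real LLM and longer lists, a random sample of orderings would be the practical substitute.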
Related papers
- DeepRec: Towards a Deep Dive Into the Item Space with Large Language Model Based Recommendation [83.21140655248624]
Large language models (LLMs) have been introduced into recommender systems (RSs). We propose DeepRec, a novel LLM-based RS that enables autonomous multi-turn interactions between LLMs and TRMs for deep exploration of the item space. Experiments on public datasets demonstrate that DeepRec significantly outperforms both traditional and LLM-based baselines.
arXiv Detail & Related papers (2025-05-22T15:49:38Z)
- Prompt-Based LLMs for Position Bias-Aware Reranking in Personalized Recommendations [0.0]
Large language models (LLMs) have been adopted for prompt-based recommendation. LLMs face limitations such as limited context window size, inefficient pointwise and pairwise prompting, and difficulty handling listwise ranking. We propose a hybrid framework that combines a traditional recommendation model with an LLM for reranking top-k items using structured prompts.
arXiv Detail & Related papers (2025-05-08T05:01:44Z)
- Direct Preference Optimization for LLM-Enhanced Recommendation Systems [33.54698201942643]
Large Language Models (LLMs) have exhibited remarkable performance across a wide range of domains. We propose DPO4Rec, a framework that integrates DPO into LLM-enhanced recommendation systems. Extensive experiments show that DPO4Rec significantly improves re-ranking performance over strong baselines.
arXiv Detail & Related papers (2024-10-08T11:42:37Z)
- On Softmax Direct Preference Optimization for Recommendation [50.896117978746]
We propose Softmax-DPO (S-DPO) to instill ranking information into the LM to help LM-based recommenders distinguish preferred items from negatives.
Specifically, we incorporate multiple negatives in user preference data and devise an alternative version of DPO loss tailored for LM-based recommenders.
arXiv Detail & Related papers (2024-06-13T15:16:11Z)
- Improve Temporal Awareness of LLMs for Sequential Recommendation [61.723928508200196]
Large language models (LLMs) have demonstrated impressive zero-shot abilities in solving a wide range of general-purpose tasks.
LLMs fall short in recognizing and utilizing temporal information, rendering poor performance in tasks that require an understanding of sequential data.
We propose three prompting strategies to exploit temporal information within historical interactions for LLM-based sequential recommendation.
arXiv Detail & Related papers (2024-05-05T00:21:26Z)
- Make Large Language Model a Better Ranker [20.532118635672763]
This paper introduces the large language model framework with Aligned Listwise Ranking Objectives (ALRO).
ALRO is designed to bridge the gap between the capabilities of LLMs and nuanced requirements of ranking tasks.
Our evaluative studies reveal that ALRO outperforms both existing embedding-based recommendation methods and LLM-based recommendation baselines.
arXiv Detail & Related papers (2024-03-28T07:22:16Z)
- Large Language Models are Not Stable Recommender Systems [45.941176155464824]
We introduce exploratory research and find consistent patterns of positional bias in large language models (LLMs).
We propose a Bayesian probabilistic framework, STELLA (Stable LLM for Recommendation), which involves a two-stage pipeline.
Our framework can capitalize on existing pattern information to calibrate instability of LLMs, and enhance recommendation performance.
arXiv Detail & Related papers (2023-12-25T14:54:33Z)
- LLMRec: Benchmarking Large Language Models on Recommendation Task [54.48899723591296]
The application of Large Language Models (LLMs) in the recommendation domain has not been thoroughly investigated.
We benchmark several popular off-the-shelf LLMs on five recommendation tasks, including rating prediction, sequential recommendation, direct recommendation, explanation generation, and review summarization.
The benchmark results indicate that LLMs displayed only moderate proficiency in accuracy-based tasks such as sequential and direct recommendation.
arXiv Detail & Related papers (2023-08-23T16:32:54Z)
- A Survey on Large Language Models for Recommendation [77.91673633328148]
Large Language Models (LLMs) have emerged as powerful tools in the field of Natural Language Processing (NLP).
This survey presents a taxonomy that categorizes these models into two major paradigms: Discriminative LLM for Recommendation (DLLM4Rec) and Generative LLM for Recommendation (GLLM4Rec).
arXiv Detail & Related papers (2023-05-31T13:51:26Z)
- On Learning to Summarize with Large Language Models as References [101.79795027550959]
Large language models (LLMs) are favored by human annotators over the original reference summaries in commonly used summarization datasets.
We study an LLM-as-reference learning setting for smaller text summarization models to investigate whether their performance can be substantially improved.
arXiv Detail & Related papers (2023-05-23T16:56:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.