Related papers: Benchmarking LLMs in Recommendation Tasks: A Comparative Evaluation with Conventional Recommenders

Benchmarking LLMs in Recommendation Tasks: A Comparative Evaluation with Conventional Recommenders

URL: http://arxiv.org/abs/2503.05493v1
Date: Fri, 07 Mar 2025 15:05:23 GMT
Title: Benchmarking LLMs in Recommendation Tasks: A Comparative Evaluation with Conventional Recommenders
Authors: Qijiong Liu, Jieming Zhu, Lu Fan, Kun Wang, Hengchang Hu, Wei Guo, Yong Liu, Xiao-Ming Wu,
Abstract summary: We introduce RecBench, which evaluates two primary recommendation tasks, i.e., click-through rate prediction (CTR) and sequential recommendation (SeqRec)<n>Our experiments cover up to 17 large models and are conducted across five diverse datasets from fashion, news, video, books, and music domains.<n>Our findings indicate that LLM-based recommenders outperform conventional recommenders, achieving up to a 5% AUC improvement in the CTR scenario and up to a 170% NDCG@10 improvement in the SeqRec scenario.
Score: 27.273217543282215
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In recent years, integrating large language models (LLMs) into recommender systems has created new opportunities for improving recommendation quality. However, a comprehensive benchmark is needed to thoroughly evaluate and compare the recommendation capabilities of LLMs with traditional recommender systems. In this paper, we introduce RecBench, which systematically investigates various item representation forms (including unique identifier, text, semantic embedding, and semantic identifier) and evaluates two primary recommendation tasks, i.e., click-through rate prediction (CTR) and sequential recommendation (SeqRec). Our extensive experiments cover up to 17 large models and are conducted across five diverse datasets from fashion, news, video, books, and music domains. Our findings indicate that LLM-based recommenders outperform conventional recommenders, achieving up to a 5% AUC improvement in the CTR scenario and up to a 170% NDCG@10 improvement in the SeqRec scenario. However, these substantial performance gains come at the expense of significantly reduced inference efficiency, rendering the LLM-as-RS paradigm impractical for real-time recommendation environments. We aim for our findings to inspire future research, including recommendation-specific model acceleration methods. We will release our code, data, configurations, and platform to enable other researchers to reproduce and build upon our experimental results.

Related papers

ExplainRec: Towards Explainable Multi-Modal Zero-Shot Recommendation with Preference Attribution and Large Language Models [1.7062863339678127]
We present ExplainRec, a framework that extends Large Language Models (LLMs)-based recommendation capabilities.<n>The framework incorporates preference attribution tuning for explainable recommendations, zero-shot preference transfer for cold-start users and items, multi-modal enhancement leveraging visual and textual content, and multi-task collaborative optimization.
arXiv Detail & Related papers (2025-10-03T02:09:41Z)
OneRec Technical Report [75.69742949746596]
We propose OneRec, which reshapes the recommendation system through an end-to-end generative approach.<n>Firstly, we have enhanced the computational FLOPs of the current recommendation model by 10 $times$ and have identified the scaling laws for recommendations within certain boundaries.<n> Secondly, reinforcement learning techniques, previously difficult to apply for optimizing recommendations, show significant potential in this framework.
arXiv Detail & Related papers (2025-06-16T16:58:55Z)
$\ ext{R}^2\ ext{ec}$: Towards Large Recommender Models with Reasoning [50.291998724376654]
We propose name, a unified large recommender model with intrinsic reasoning capabilities.<n> RecPO is a corresponding reinforcement learning framework that optimize name both the reasoning and recommendation capabilities simultaneously in a single policy update.<n> Experiments on three datasets with various baselines verify the effectiveness of name, showing relative improvements of 68.67% in Hit@5 and 45.21% in NDCG@20.
arXiv Detail & Related papers (2025-05-22T17:55:43Z)
DeepRec: Towards a Deep Dive Into the Item Space with Large Language Model Based Recommendation [83.21140655248624]
Large language models (LLMs) have been introduced into recommender systems (RSs)<n>We propose DeepRec, a novel LLM-based RS that enables autonomous multi-turn interactions between LLMs and TRMs for deep exploration of the item space.<n> Experiments on public datasets demonstrate that DeepRec significantly outperforms both traditional and LLM-based baselines.
arXiv Detail & Related papers (2025-05-22T15:49:38Z)
Scenario-Wise Rec: A Multi-Scenario Recommendation Benchmark [54.93461228053298]
We introduce our benchmark, textbfScenario-Wise Rec, which comprises 6 public datasets and 12 benchmark models, along with a training and evaluation pipeline.<n>We aim for this benchmark to offer researchers valuable insights from prior work, enabling the development of novel models.
arXiv Detail & Related papers (2024-12-23T08:15:34Z)
Towards Scalable Semantic Representation for Recommendation [65.06144407288127]
Mixture-of-Codes is proposed to construct semantic IDs based on large language models (LLMs) Our method achieves superior discriminability and dimension robustness scalability, leading to the best scale-up performance in recommendations.
arXiv Detail & Related papers (2024-10-12T15:10:56Z)
RLRF4Rec: Reinforcement Learning from Recsys Feedback for Enhanced Recommendation Reranking [33.54698201942643]
Large Language Models (LLMs) have demonstrated remarkable performance across diverse domains. This paper introduces RLRF4Rec, a novel framework integrating Reinforcement Learning from Recsys Feedback for Enhanced Recommendation Reranking.
arXiv Detail & Related papers (2024-10-08T11:42:37Z)
Revisiting BPR: A Replicability Study of a Common Recommender System Baseline [78.00363373925758]
We study the features of the BPR model, indicating their impact on its performance, and investigate open-source BPR implementations. Our analysis reveals inconsistencies between these implementations and the original BPR paper, leading to a significant decrease in performance of up to 50% for specific implementations. We show that the BPR model can achieve performance levels close to state-of-the-art methods on the top-n recommendation tasks and even outperform them on specific datasets.
arXiv Detail & Related papers (2024-09-21T18:39:53Z)
DaRec: A Disentangled Alignment Framework for Large Language Model and Recommender System [83.34921966305804]
Large language models (LLMs) have demonstrated remarkable performance in recommender systems. We propose a novel plug-and-play alignment framework for LLMs and collaborative models. Our method is superior to existing state-of-the-art algorithms.
arXiv Detail & Related papers (2024-08-15T15:56:23Z)
LLM4MSR: An LLM-Enhanced Paradigm for Multi-Scenario Recommendation [52.55639178180821]
The study on multi-scenario recommendation (MSR) has attracted much attention, which uses the data from all scenarios to simultaneously improve their recommendation performance.<n>Existing methods tend to integrate insufficient scenario knowledge and neglect learning personalized cross-scenario preferences, thus leading to sub-optimal performance.<n>We propose a large language model (LLM)-enhanced paradigm LLM4MSR to fill these gaps.
arXiv Detail & Related papers (2024-06-18T11:59:36Z)
CherryRec: Enhancing News Recommendation Quality via LLM-driven Framework [4.4206696279087]
We propose a framework for news recommendation using Large Language Models (LLMs) named textitCherryRec. CherryRec ensures the quality of recommendations while accelerating the recommendation process. We validate the effectiveness of the proposed framework by comparing it with state-of-the-art baseline methods on benchmark datasets.
arXiv Detail & Related papers (2024-06-18T03:33:38Z)
Finetuning Large Language Model for Personalized Ranking [12.16551080986962]
Large Language Models (LLMs) have demonstrated remarkable performance across various domains. Direct Multi-Preference Optimization (DMPO) is a framework designed to bridge the gap and enhance the alignment of LLMs for recommendation tasks.
arXiv Detail & Related papers (2024-05-25T08:36:15Z)
Improving Sequential Recommendations with LLMs [8.819438328085925]
Large Language Models (LLMs) can be used to build or improve sequential recommendation approaches.<n>We conduct extensive experiments on three datasets to obtain a comprehensive picture of the performance of each approach.
arXiv Detail & Related papers (2024-02-02T11:52:07Z)
LlamaRec: Two-Stage Recommendation using Large Language Models for Ranking [10.671747198171136]
We propose a two-stage framework using large language models for ranking-based recommendation (LlamaRec) In particular, we use small-scale sequential recommenders to retrieve candidates based on the user interaction history. LlamaRec consistently achieves datasets superior performance in both recommendation performance and efficiency.
arXiv Detail & Related papers (2023-10-25T06:23:48Z)
LLMRec: Benchmarking Large Language Models on Recommendation Task [54.48899723591296]
The application of Large Language Models (LLMs) in the recommendation domain has not been thoroughly investigated. We benchmark several popular off-the-shelf LLMs on five recommendation tasks, including rating prediction, sequential recommendation, direct recommendation, explanation generation, and review summarization. The benchmark results indicate that LLMs displayed only moderate proficiency in accuracy-based tasks such as sequential and direct recommendation.
arXiv Detail & Related papers (2023-08-23T16:32:54Z)
A Survey on Large Language Models for Recommendation [77.91673633328148]
Large Language Models (LLMs) have emerged as powerful tools in the field of Natural Language Processing (NLP) This survey presents a taxonomy that categorizes these models into two major paradigms, respectively Discriminative LLM for Recommendation (DLLM4Rec) and Generative LLM for Recommendation (GLLM4Rec)
arXiv Detail & Related papers (2023-05-31T13:51:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.