Related papers: LLM-as-a-Judge: Toward World Models for Slate Recommendation Systems

URL: http://arxiv.org/abs/2511.04541v1
Date: Thu, 06 Nov 2025 16:54:54 GMT
Title: LLM-as-a-Judge: Toward World Models for Slate Recommendation Systems
Authors: Baptiste Bonin, Maxime Heuillet, Audrey Durand
Abstract summary: We investigate how Large Language Models (LLM) can act as world models of user preferences through pairwise reasoning over slates. Our results reveal relationships between task performance and properties of the preference function captured by LLMs.
Score: 5.310303349822993
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modeling user preferences across domains remains a key challenge in slate recommendation (i.e. recommending an ordered sequence of items) research. We investigate how Large Language Models (LLM) can effectively act as world models of user preferences through pairwise reasoning over slates. We conduct an empirical study involving several LLMs on three tasks spanning different datasets. Our results reveal relationships between task performance and properties of the preference function captured by LLMs, hinting towards areas for improvement and highlighting the potential of LLMs as world models in recommender systems.

Related papers

Evaluating Position Bias in Large Language Model Recommendations [3.430780143519032]
Large Language Models (LLMs) are being increasingly explored as general-purpose tools for recommendation tasks. We show that LLM-based recommendation models suffer from position bias, where the order of candidate items in a prompt can disproportionately influence the recommendations produced by LLMs. We introduce a new prompting strategy to mitigate the position bias of LLM recommendation models called Ranking via Iterative SElection.
arXiv Detail & Related papers (2025-08-04T03:30:26Z)
Rethinking LLM-Based Recommendations: A Personalized Query-Driven Parallel Integration [22.650609670923732]
We propose a parallel recommendation framework that decouples large language models from candidate pre-selection. Our framework connects LLMs and recommendation models in a parallel manner, allowing each component to independently utilize its strengths.
arXiv Detail & Related papers (2025-04-16T09:17:45Z)
From Selection to Generation: A Survey of LLM-based Active Learning [153.8110509961261]
Large Language Models (LLMs) have been employed for generating entirely new data instances and providing more cost-effective annotations. This survey aims to serve as an up-to-date resource for researchers and practitioners seeking to gain an intuitive understanding of LLM-based AL techniques.
arXiv Detail & Related papers (2025-02-17T12:58:17Z)
Diversity as a Reward: Fine-Tuning LLMs on a Mixture of Domain-Undetermined Data [54.3895971080712]
Fine-tuning large language models (LLMs) using diverse datasets is crucial for enhancing their overall performance across various domains. We propose a new method that gives the LLM a dual identity: an output model to cognitively probe and select data based on diversity reward, as well as an input model to be tuned with the selected data.
arXiv Detail & Related papers (2025-02-05T17:21:01Z)
Large Language Model as Universal Retriever in Industrial-Scale Recommender System [27.58251380192748]
We show that Large Language Models (LLMs) can function as universal retrievers, capable of handling multiple objectives within a generative retrieval framework. We also introduce matrix decomposition to boost model learnability, discriminability, and transferability. Our Universal Retrieval Model (URM) can adaptively generate a set from computation of tens of millions of candidates.
arXiv Detail & Related papers (2025-02-05T09:56:52Z)
EVOLvE: Evaluating and Optimizing LLMs For In-Context Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty. We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications. Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z)
HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling [21.495443162191332]
Large Language Models (LLMs) have achieved remarkable success in various fields, prompting several studies to explore their potential in recommendation systems. We propose a novel Hierarchical Large Language Model (HLLM) architecture designed to enhance sequential recommendation systems. HLLM achieves excellent scalability, with the largest configuration utilizing 7B parameters for both item feature extraction and user interest modeling.
arXiv Detail & Related papers (2024-09-19T13:03:07Z)
Improving Sequential Recommendations with LLMs [8.819438328085925]
Large Language Models (LLMs) can be used to build or improve sequential recommendation approaches. We conduct extensive experiments on three datasets to obtain a comprehensive picture of the performance of each approach.
arXiv Detail & Related papers (2024-02-02T11:52:07Z)
Tapping the Potential of Large Language Models as Recommender Systems: A Comprehensive Framework and Empirical Analysis [91.5632751731927]
Large Language Models such as ChatGPT have showcased remarkable abilities in solving general tasks. We propose a general framework for utilizing LLMs in recommendation tasks, focusing on the capabilities of LLMs as recommenders. We analyze the impact of public availability, tuning strategies, model architecture, parameter scale, and context length on recommendation results.
arXiv Detail & Related papers (2024-01-10T08:28:56Z)
Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations [53.76682562935373]
We introduce an efficient framework called InteRecAgent, which employs LLMs as the brain and recommender models as tools. InteRecAgent achieves satisfying performance as a conversational recommender system, outperforming general-purpose LLMs.
arXiv Detail & Related papers (2023-08-31T07:36:44Z)
A Survey on Large Language Models for Recommendation [77.91673633328148]
Large Language Models (LLMs) have emerged as powerful tools in the field of Natural Language Processing (NLP). This survey presents a taxonomy that categorizes these models into two major paradigms: Discriminative LLM for Recommendation (DLLM4Rec) and Generative LLM for Recommendation (GLLM4Rec).
arXiv Detail & Related papers (2023-05-31T13:51:26Z)
PALR: Personalization Aware LLMs for Recommendation [7.407353565043918]
PALR aims to combine users' historical behaviors (such as clicks, purchases, and ratings) with large language models (LLMs) to generate user-preferred items. Our solution outperforms state-of-the-art models on various sequential recommendation tasks.
arXiv Detail & Related papers (2023-05-12T17:21:33Z)