Related papers: Lusifer: LLM-based User SImulated Feedback Environment for online Recommender systems

URL: http://arxiv.org/abs/2405.13362v3
Date: Fri, 27 Dec 2024 14:44:30 GMT
Title: Lusifer: LLM-based User SImulated Feedback Environment for online Recommender systems
Authors: Danial Ebrat, Eli Paradalis, Luis Rueda
Abstract summary: We introduce Lusifer, a novel environment leveraging Large Language Models (LLMs) to generate simulated user feedback. Lusifer synthesizes user profiles and interaction histories to simulate responses and behaviors toward recommended items. Lusifer accurately emulates user behavior and preferences, even with reduced training data, achieving an RMSE of 1.3.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Training reinforcement learning-based recommender systems is often hindered by the lack of dynamic and realistic user interactions. To address this limitation, we introduce Lusifer, a novel environment leveraging Large Language Models (LLMs) to generate simulated user feedback. Lusifer synthesizes user profiles and interaction histories to simulate responses and behaviors toward recommended items, with profiles updated after each rating to reflect evolving user characteristics. Utilizing the MovieLens dataset as a proof of concept, we limited our implementation to the last 40 interactions for each user, representing approximately 39% and 22% of the training sets, to focus on recent user behavior. For consistency and to gain insights into the performance of traditional methods with limited data, we implemented baseline approaches using the same data subset. Our results demonstrate that Lusifer accurately emulates user behavior and preferences, even with reduced training data, achieving an RMSE of 1.3 across various test sets. This paper presents Lusifer's operational pipeline, including prompt generation and iterative user profile updates, and compares its performance against baseline methods. The findings validate Lusifer's ability to produce realistic dynamic feedback and suggest that it offers a scalable and adjustable framework for user simulation in online reinforcement learning recommender systems for future studies, particularly when training data is limited.
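
The two quantitative steps named in the abstract, keeping only each user's last 40 interactions and scoring simulated ratings by RMSE, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names `last_n_per_user` and `rmse` and the `(user_id, item_id, rating, timestamp)` tuple layout are assumptions made for this example.

```python
import math
from collections import defaultdict


def last_n_per_user(interactions, n=40):
    """Keep each user's n most recent interactions.

    `interactions` is an iterable of (user_id, item_id, rating, timestamp)
    tuples; this layout is an assumption for the sketch.
    """
    by_user = defaultdict(list)
    for user_id, item_id, rating, timestamp in interactions:
        by_user[user_id].append((timestamp, item_id, rating))

    recent = {}
    for user_id, rows in by_user.items():
        rows.sort(key=lambda row: row[0])  # oldest first
        recent[user_id] = [(item_id, rating) for _, item_id, rating in rows[-n:]]
    return recent


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))
```

On a 1 to 5 rating scale such as MovieLens, an RMSE of 1.3 means simulated ratings differ from the held-out ratings by a root-mean-square error of about 1.3 rating points.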

Related papers

Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries [13.187789731783095]
We present a novel framework that learns text-based summaries of each user's preferences, characteristics, and past conversations. These summaries condition the reward model, enabling it to make personalized predictions about the types of responses valued by each user. We show that our method is robust to new users and diverse conversation topics.
arXiv Detail & Related papers (2025-07-17T23:48:51Z)
PUB: An LLM-Enhanced Personality-Driven User Behaviour Simulator for Recommender System Evaluation [9.841963696576546]
Personality-driven User Behaviour Simulator (PUB) integrates the Big Five personality traits to model personalised user behaviour. PUB dynamically infers user personality from behavioural logs (e.g., ratings, reviews) and item metadata, then generates synthetic interactions that preserve statistical fidelity to real-world data. Experiments on the Amazon review datasets show that logs generated by PUB closely align with real user behaviour and reveal meaningful associations between personality traits and recommendation outcomes.
arXiv Detail & Related papers (2025-06-05T01:57:36Z)
Multi-agents based User Values Mining for Recommendation [52.26100802380767]
We propose a zero-shot multi-LLM collaborative framework for effective and accurate user value extraction. We apply text summarization techniques to condense item content while preserving essential meaning. To mitigate hallucinations, we introduce two specialized agent roles: evaluators and supervisors.
arXiv Detail & Related papers (2025-05-02T04:01:31Z)
Large Language Model Empowered Recommendation Meets All-domain Continual Pre-Training [60.38082979765664]
CPRec is an All-domain Continual Pre-Training framework for Recommendation. It holistically aligns LLMs with universal user behaviors through the continual pre-training paradigm. We conduct experiments on five real-world datasets from two distinct platforms.
arXiv Detail & Related papers (2025-04-11T20:01:25Z)
FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users [111.56469697145519]
We propose Few-Shot Preference Optimization, which reframes reward modeling as a meta-learning problem. Under this framework, an LLM learns to quickly adapt to a user via a few labeled preferences from that user, constructing a personalized reward function for them. We generate over 1M synthetic personalized preferences using publicly available LLMs. We evaluate FSPO on personalized open-ended generation for up to 1,500 synthetic users across three domains: movie reviews, pedagogical adaptation based on educational background, and general question answering, along with a controlled human study.
arXiv Detail & Related papers (2025-02-26T17:08:46Z)
Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications. Ensuring their alignment with the diverse preferences of individual users has become a critical challenge. We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z)
Adaptive Self-Supervised Learning Strategies for Dynamic On-Device LLM Personalization [3.1944843830667766]
Large language models (LLMs) have revolutionized how we interact with technology, but their personalization to individual user preferences remains a significant challenge. We present Adaptive Self-Supervised Learning Strategies (ASLS), which utilize self-supervised learning techniques to personalize LLMs dynamically.
arXiv Detail & Related papers (2024-09-25T14:35:06Z)
WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback [28.317315761271804]
We introduce WildFeedback, a novel framework that leverages real-time, in-situ user interactions to create preference datasets that more accurately reflect authentic human values. We apply this framework to a large corpus of user-LLM conversations, resulting in a rich preference dataset that reflects genuine user preferences. Our experiments demonstrate that LLMs fine-tuned on WildFeedback exhibit significantly improved alignment with user preferences.
arXiv Detail & Related papers (2024-08-28T05:53:46Z)
Flexible Generation of Preference Data for Recommendation Analysis [1.384948712833979]
HYDRA is a novel data generation model driven by three main factors: user-item interaction level, item popularity, and user engagement level. We demonstrate the effectiveness of HYDRA through extensive experiments on well-known benchmark datasets.
arXiv Detail & Related papers (2024-07-23T15:53:17Z)
Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment [104.18002641195442]
We introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data. Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation.
arXiv Detail & Related papers (2024-05-31T14:21:04Z)
Look into the Future: Deep Contextualized Sequential Recommendation [28.726897673576865]
We propose a novel framework of sequential recommendation called Look into the Future (LIFT). LIFT builds and leverages the contexts of sequential recommendation. In our experiments, LIFT achieves significant performance improvement on click-through rate prediction and rating prediction tasks.
arXiv Detail & Related papers (2024-05-23T09:34:28Z)
A LLM-based Controllable, Scalable, Human-Involved User Simulator Framework for Conversational Recommender Systems [14.646529557978512]
Conversational Recommender System (CRS) leverages real-time feedback from users to dynamically model their preferences. Large Language Models (LLMs) have marked the onset of a new epoch in computational capabilities. We introduce a Controllable, Scalable, and Human-Involved (CSHI) simulator framework that manages the behavior of user simulators.
arXiv Detail & Related papers (2024-05-13T03:02:56Z)
Learning Social Graph for Inactive User Recommendation [50.090904659803854]
LSIR learns an optimal social graph structure for social recommendation, especially for inactive users. Experiments on real-world datasets demonstrate that LSIR achieves significant improvements of up to 129.58% on NDCG in inactive user recommendation.
arXiv Detail & Related papers (2024-05-08T03:40:36Z)
How Reliable is Your Simulator? Analysis on the Limitations of Current LLM-based User Simulators for Conversational Recommendation [14.646529557978512]
We analyze the limitations of using Large Language Models in constructing user simulators for Conversational Recommender System. Data leakage, which occurs in conversational history and the user simulator's replies, results in inflated evaluation results. We propose SimpleUserSim, employing a straightforward strategy to guide the topic toward the target items.
arXiv Detail & Related papers (2024-03-25T04:21:06Z)
Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models [115.7508325840751]
The recent success of large language models (LLMs) has shown great potential to develop more powerful conversational recommender systems (CRSs). In this paper, we embark on an investigation into the utilization of ChatGPT for conversational recommendation, revealing the inadequacy of the existing evaluation protocol. We propose an interactive Evaluation approach based on LLMs named iEvaLM that harnesses LLM-based user simulators.
arXiv Detail & Related papers (2023-05-22T15:12:43Z)
Sim2Rec: A Simulator-based Decision-making Approach to Optimize Real-World Long-term User Engagement in Sequential Recommender Systems [43.31078296862647]
Long-term user engagement (LTE) optimization in sequential recommender systems (SRS) is well suited to reinforcement learning (RL). RL has its shortcomings, particularly requiring a large number of online samples for exploration. We present a simulator-based recommender policy training approach, Simulation-to-Recommendation (Sim2Rec).
arXiv Detail & Related papers (2023-05-03T19:21:25Z)
PUNR: Pre-training with User Behavior Modeling for News Recommendation [26.349183393252115]
News recommendation aims to predict click behaviors based on user behaviors. How to effectively model the user representations is the key to recommending preferred news. We propose an unsupervised pre-training paradigm with two tasks, i.e. user behavior masking and user behavior generation.
arXiv Detail & Related papers (2023-04-25T08:03:52Z)
Latent User Intent Modeling for Sequential Recommenders [92.66888409973495]
Sequential recommender models learn to predict the next items a user is likely to interact with based on his/her interaction history on the platform. Most sequential recommenders however lack a higher-level understanding of user intents, which often drive user behaviors online. Intent modeling is thus critical for understanding users and optimizing long-term user experience.
arXiv Detail & Related papers (2022-11-17T19:00:24Z)
Simulating Bandit Learning from User Feedback for Extractive Question Answering [51.97943858898579]
We study learning from user feedback for extractive question answering by simulating feedback using supervised data. We show that systems initially trained on a small number of examples can dramatically improve given feedback from users on model-predicted answers.
arXiv Detail & Related papers (2022-03-18T17:47:58Z)
Intent Contrastive Learning for Sequential Recommendation [86.54439927038968]
We introduce a latent variable to represent users' intents and learn the distribution function of the latent variable via clustering. We propose to leverage the learned intents into SR models via contrastive SSL, which maximizes the agreement between a view of sequence and its corresponding intent. Experiments conducted on four real-world datasets demonstrate the superiority of the proposed learning paradigm.
arXiv Detail & Related papers (2022-02-05T09:24:13Z)
Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning [69.42679922160684]
We propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback. Our work opens the prospect to exploit interactions with real users and improve conversational systems after deployment.
arXiv Detail & Related papers (2020-11-01T19:50:34Z)
User Memory Reasoning for Conversational Recommendation [68.34475157544246]
We study a conversational recommendation model which dynamically manages users' past (offline) preferences and current (online) requests. MGConvRex captures human-level reasoning over user memory and has disjoint training/testing sets of users for zero-shot (cold-start) reasoning for recommendation.
arXiv Detail & Related papers (2020-05-30T05:29:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.