Lusifer: LLM-based User SImulated Feedback Environment for online Recommender systems
- URL: http://arxiv.org/abs/2405.13362v4
- Date: Sat, 29 Mar 2025 14:45:21 GMT
- Title: Lusifer: LLM-based User SImulated Feedback Environment for online Recommender systems
- Authors: Danial Ebrat, Eli Paradalis, Luis Rueda
- Abstract summary: Reinforcement learning (RL) recommender systems often rely on static datasets that fail to capture the fluid, ever-changing nature of user preferences in real-world scenarios. We introduce Lusifer, an LLM-based simulation environment designed to generate dynamic, realistic user feedback for RL-based recommender training.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) recommender systems often rely on static datasets that fail to capture the fluid, ever-changing nature of user preferences in real-world scenarios. Meanwhile, generative AI techniques have emerged as powerful tools for creating synthetic data, including user profiles and behaviors. Recognizing this potential, we introduce Lusifer, an LLM-based simulation environment designed to generate dynamic, realistic user feedback for RL-based recommender training. In Lusifer, user profiles are incrementally updated at each interaction step, with Large Language Models (LLMs) providing transparent explanations of how and why preferences evolve. We focus on the MovieLens dataset, extracting only the last 40 interactions for each user to emphasize recent behavior. By processing textual metadata (such as movie overviews and tags), Lusifer creates more context-aware user states and simulates feedback on new items, including those with limited or no prior ratings. This approach reduces reliance on extensive historical data, facilitates cold-start handling, and supports adaptation to out-of-distribution cases. Our experiments compare Lusifer with traditional collaborative filtering models, revealing that while Lusifer can be comparable in predictive accuracy, it excels at capturing dynamic user responses and yielding explainable results at every step. These qualities highlight its potential as a scalable, ethically sound alternative to live user experiments, supporting iterative and user-centric evaluations of RL-based recommender strategies. Looking ahead, we envision Lusifer serving as a foundational tool for exploring generative AI-driven user simulations, enabling more adaptive and personalized recommendation pipelines under real-world constraints.
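The loop described above (incremental profile updates plus an explained rating at each step) can be pictured with a minimal sketch. The class name, prompt wording, and the `call_llm` helper below are illustrative assumptions, not the paper's actual implementation:

```python
import json
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    """Stand-in for any chat-completion API; assumed to return plain text."""
    raise NotImplementedError("wire up an LLM client here")

@dataclass
class SimulatedUser:
    """Sketch of a Lusifer-style simulated user seeded from recent history."""
    user_id: int
    profile: str                     # natural-language summary of preferences
    history: list = field(default_factory=list)

    def rate(self, item: dict) -> dict:
        """Ask the LLM to rate an item and explain why."""
        prompt = (
            f"User profile:\n{self.profile}\n\n"
            f"Candidate movie (overview and tags):\n{json.dumps(item)}\n\n"
            'Reply as JSON: {"rating": 1-5, "explanation": "..."}'
        )
        feedback = json.loads(call_llm(prompt))
        self.history.append((item, feedback))
        self._update_profile(item, feedback)
        return feedback

    def _update_profile(self, item: dict, feedback: dict) -> None:
        """Incrementally revise the profile so preferences can drift."""
        prompt = (
            f"Current profile:\n{self.profile}\n\n"
            f"The user just rated {item.get('title')} {feedback['rating']}/5 "
            f"because: {feedback['explanation']}\n"
            "Rewrite the profile to reflect how the preferences evolved."
        )
        self.profile = call_llm(prompt)
```

An RL recommender would then call `rate` in place of a live user, with `profile` seeded from each user's last 40 MovieLens interactions as the abstract describes.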
Related papers
- Large Language Model Empowered Recommendation Meets All-domain Continual Pre-Training [60.38082979765664]
CPRec is an All-domain Continual Pre-Training framework for Recommendation.
It holistically aligns LLMs with universal user behaviors through the continual pre-training paradigm.
We conduct experiments on five real-world datasets from two distinct platforms.
arXiv Detail & Related papers (2025-04-11T20:01:25Z) - FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users [111.56469697145519]
We propose Few-Shot Preference Optimization, which reframes reward modeling as a meta-learning problem.
Under this framework, an LLM learns to quickly adapt to a user via a few labeled preferences from that user, constructing a personalized reward function for them.
We generate over 1M synthetic personalized preferences using publicly available LLMs.
We evaluate FSPO on personalized open-ended generation for up to 1,500 synthetic users across three domains: movie reviews, pedagogical adaptation based on educational background, and general question answering, along with a controlled human study.
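As summarized, FSPO adapts to a user from a handful of labeled preferences. One way to picture the resulting personalized reward function at inference time is an in-context judge; this sketch is an illustrative reading of the summary, not FSPO's training procedure:

```python
def personalized_preference(call_llm, few_shot_prefs, prompt, response_a, response_b):
    """Judge two candidate responses for one user, conditioning on that
    user's few labeled preference pairs (the few-shot adaptation idea)."""
    shots = "\n".join(
        f"Prompt: {p}\nPreferred: {chosen}\nRejected: {rejected}"
        for p, chosen, rejected in few_shot_prefs
    )
    query = (
        f"Past preferences of this user:\n{shots}\n\n"
        f"New prompt: {prompt}\nA: {response_a}\nB: {response_b}\n"
        "Which response would this user prefer? Answer A or B."
    )
    return call_llm(query).strip()  # "A" or "B"
```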
arXiv Detail & Related papers (2025-02-26T17:08:46Z) - Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications.
Ensuring their alignment with the diverse preferences of individual users has become a critical challenge.
We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z) - Adaptive Self-Supervised Learning Strategies for Dynamic On-Device LLM Personalization [3.1944843830667766]
Large language models (LLMs) have revolutionized how we interact with technology, but their personalization to individual user preferences remains a significant challenge.
We present Adaptive Self-Supervised Learning Strategies (ASLS), which utilize self-supervised learning techniques to personalize LLMs dynamically.
arXiv Detail & Related papers (2024-09-25T14:35:06Z) - WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback [28.317315761271804]
We introduce WildFeedback, a novel framework that leverages real-time, in-situ user interactions to create preference datasets that more accurately reflect authentic human values.
We apply this framework to a large corpus of user-LLM conversations, resulting in a rich preference dataset that reflects genuine user preferences.
Our experiments demonstrate that LLMs fine-tuned on WildFeedback exhibit significantly improved alignment with user preferences.
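The extraction step this summary implies, turning in-situ reactions into preference pairs, might look roughly like the following; the keyword signals are a hypothetical stand-in for whatever satisfaction classifier WildFeedback actually uses:

```python
NEGATIVE_SIGNALS = ("that's wrong", "not what i asked", "try again")

def mine_preferences(conversations):
    """Turn in-situ user reactions into (prompt, chosen, rejected) triples.
    Keyword matching here is a stand-in for a learned satisfaction signal."""
    triples = []
    for turns in conversations:            # turns alternate user / assistant
        for i in range(0, len(turns) - 3, 2):
            prompt, reply, reaction, retry = turns[i:i + 4]
            if any(s in reaction.lower() for s in NEGATIVE_SIGNALS):
                # the follow-up answer after a complaint is treated as preferred
                triples.append((prompt, retry, reply))
    return triples
```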
arXiv Detail & Related papers (2024-08-28T05:53:46Z) - Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment [104.18002641195442]
We introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data.
Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation.
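A schematic of the data flow this summary describes, with `policy.generate` and `dpo_update` as assumed stand-ins for the model's sampler and a preference-optimization step:

```python
import random
from collections import deque

replay_buffer = deque(maxlen=10_000)  # off-policy: past self-generated pairs are reused

def sapo_step(policy, old_policy, prompts, dpo_update):
    """One training step in the spirit of the summary: an older copy of the
    model supplies the negative (rejected) responses, so no paired human
    preference data is required."""
    for prompt in prompts:
        chosen = policy.generate(prompt)        # current model's answer
        rejected = old_policy.generate(prompt)  # self-play negative
        replay_buffer.append((prompt, chosen, rejected))
    batch = random.sample(list(replay_buffer), min(32, len(replay_buffer)))
    dpo_update(policy, batch)                   # preference-style loss on the batch
```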
arXiv Detail & Related papers (2024-05-31T14:21:04Z) - Look into the Future: Deep Contextualized Sequential Recommendation [28.726897673576865]
We propose a novel sequential recommendation framework called Look into the Future (LIFT).
LIFT builds and leverages the contexts of sequential recommendation.
In our experiments, LIFT achieves significant performance improvement on click-through rate prediction and rating prediction tasks.
arXiv Detail & Related papers (2024-05-23T09:34:28Z) - A LLM-based Controllable, Scalable, Human-Involved User Simulator Framework for Conversational Recommender Systems [14.646529557978512]
A Conversational Recommender System (CRS) leverages real-time feedback from users to dynamically model their preferences.
The advent of Large Language Models (LLMs) has marked the onset of a new epoch in computational capabilities.
We introduce a Controllable, scalable, and human-Involved (CSHI) simulator framework that manages the behavior of user simulators.
arXiv Detail & Related papers (2024-05-13T03:02:56Z) - Learning Social Graph for Inactive User Recommendation [50.090904659803854]
LSIR learns an optimal social graph structure for social recommendation, especially for inactive users.
Experiments on real-world datasets demonstrate that LSIR achieves significant improvements of up to 129.58% on NDCG in inactive user recommendation.
arXiv Detail & Related papers (2024-05-08T03:40:36Z) - How Reliable is Your Simulator? Analysis on the Limitations of Current LLM-based User Simulators for Conversational Recommendation [14.646529557978512]
We analyze the limitations of using Large Language Models to construct user simulators for Conversational Recommender Systems.
Data leakage, which occurs in conversational history and the user simulator's replies, results in inflated evaluation results.
We propose SimpleUserSim, employing a straightforward strategy to guide the topic toward the target items.
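The "straightforward strategy" could plausibly be rule-based; a toy version that steers toward the target item while withholding its title (addressing the leakage issue above) might be:

```python
def simple_user_sim(target_item: str, attributes: list, system_utterance: str) -> str:
    """Reply as a simulated user: steer toward the target item one attribute
    at a time, never mentioning the title itself (avoiding data leakage)."""
    utterance = system_utterance.lower()
    if target_item.lower() in utterance:
        return "Yes, that's exactly what I was looking for!"
    remaining = [a for a in attributes if a.lower() not in utterance]
    if remaining:
        return f"Not quite. I'd prefer something with {remaining[0]}."
    return "None of these feel right. Any other ideas?"
```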
arXiv Detail & Related papers (2024-03-25T04:21:06Z) - Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models [115.7508325840751]
The recent success of large language models (LLMs) has shown great potential to develop more powerful conversational recommender systems (CRSs).
In this paper, we embark on an investigation into the utilization of ChatGPT for conversational recommendation, revealing the inadequacy of the existing evaluation protocol.
We propose an interactive Evaluation approach based on LLMs named iEvaLM that harnesses LLM-based user simulators.
arXiv Detail & Related papers (2023-05-22T15:12:43Z) - Sim2Rec: A Simulator-based Decision-making Approach to Optimize Real-World Long-term User Engagement in Sequential Recommender Systems [43.31078296862647]
Long-term user engagement (LTE) optimization in sequential recommender systems (SRS) is well suited to reinforcement learning (RL).
RL has its shortcomings, however, particularly its need for a large number of online samples for exploration.
We present a simulator-based recommender policy training approach, Simulation-to-Recommendation (Sim2Rec).
arXiv Detail & Related papers (2023-05-03T19:21:25Z) - PUNR: Pre-training with User Behavior Modeling for News Recommendation [26.349183393252115]
News recommendation aims to predict future click behavior from a user's past behaviors.
Effectively modeling user representations is the key to recommending preferred news.
We propose an unsupervised pre-training paradigm with two tasks: user behavior masking and user behavior generation.
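Of the two pretext tasks, behavior masking is the easier to sketch; the following is a generic illustration of the idea rather than PUNR's exact procedure:

```python
import random

MASK = "[MASK]"

def mask_user_behaviors(clicked_news: list, mask_ratio: float = 0.15):
    """Behavior-masking pretext task: hide some clicked items so an encoder
    must reconstruct them from the rest of the user's history."""
    inputs, targets = [], []
    for item in clicked_news:
        if random.random() < mask_ratio:
            inputs.append(MASK)
            targets.append(item)    # supervision comes from the data itself
        else:
            inputs.append(item)
            targets.append(None)    # nothing to predict at this position
    return inputs, targets
```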
arXiv Detail & Related papers (2023-04-25T08:03:52Z) - Latent User Intent Modeling for Sequential Recommenders [92.66888409973495]
Sequential recommender models learn to predict the next items a user is likely to interact with based on his/her interaction history on the platform.
Most sequential recommenders however lack a higher-level understanding of user intents, which often drive user behaviors online.
Intent modeling is thus critical for understanding users and optimizing long-term user experience.
arXiv Detail & Related papers (2022-11-17T19:00:24Z) - Simulating Bandit Learning from User Feedback for Extractive Question Answering [51.97943858898579]
We study learning from user feedback for extractive question answering by simulating feedback using supervised data.
We show that systems initially trained on a small number of examples can dramatically improve given feedback from users on model-predicted answers.
arXiv Detail & Related papers (2022-03-18T17:47:58Z) - Intent Contrastive Learning for Sequential Recommendation [86.54439927038968]
We introduce a latent variable to represent users' intents and learn the distribution function of the latent variable via clustering.
We propose to incorporate the learned intents into SR models via contrastive SSL, which maximizes the agreement between a view of a sequence and its corresponding intent.
Experiments conducted on four real-world datasets demonstrate the superiority of the proposed learning paradigm.
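A rough rendering of that recipe (cluster sequence embeddings into latent intents, then apply an InfoNCE-style loss against the assigned intent prototype), with array shapes and hyperparameters as assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def intent_contrastive_loss(seq_embeddings: np.ndarray, n_intents: int = 8,
                            temperature: float = 0.1) -> float:
    """Cluster sequence representations into latent intents, then pull each
    sequence toward its assigned intent prototype and away from the rest."""
    kmeans = KMeans(n_clusters=n_intents, n_init=10).fit(seq_embeddings)
    prototypes = kmeans.cluster_centers_            # one vector per intent
    assigned = kmeans.labels_                       # each sequence's intent id
    # cosine similarity between every sequence and every intent prototype
    z = seq_embeddings / np.linalg.norm(seq_embeddings, axis=1, keepdims=True)
    c = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = z @ c.T / temperature
    # InfoNCE-style objective: the assigned intent is the positive class
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(assigned)), assigned].mean())
```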
arXiv Detail & Related papers (2022-02-05T09:24:13Z) - Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning [69.42679922160684]
We propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback.
Our work opens the prospect to exploit interactions with real users and improve conversational systems after deployment.
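A heavily simplified reading of that idea: each logged interaction is reweighted by its binary feedback over an importance weight before a standard supervised update. `model.nll` and `optimizer.step` are hypothetical stand-ins, not the paper's API:

```python
def feedback_weighted_update(model, optimizer, interactions, eps=1e-6):
    """Sketch: reweight each logged example by binary user feedback over the
    propensity the deployed system assigned to its own answer."""
    for question, answer, reward, logged_prob in interactions:
        # reward: 1 if the user accepted the answer, 0 otherwise
        # logged_prob: probability the deployed model gave to `answer`
        weight = reward / max(logged_prob, eps)      # importance weight
        loss = weight * model.nll(question, answer)  # weighted log-loss
        optimizer.step(loss)                         # hypothetical API
```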
arXiv Detail & Related papers (2020-11-01T19:50:34Z) - User Memory Reasoning for Conversational Recommendation [68.34475157544246]
We study a conversational recommendation model which dynamically manages users' past (offline) preferences and current (online) requests.
MGConvRex captures human-level reasoning over user memory and uses disjoint training/testing sets of users, enabling zero-shot (cold-start) reasoning for recommendation.
arXiv Detail & Related papers (2020-05-30T05:29:23Z)