Lusifer: LLM-based User SImulated Feedback Environment for online Recommender systems
- URL: http://arxiv.org/abs/2405.13362v3
- Date: Fri, 27 Dec 2024 14:44:30 GMT
- Title: Lusifer: LLM-based User SImulated Feedback Environment for online Recommender systems
- Authors: Danial Ebrat, Eli Paradalis, Luis Rueda,
- Abstract summary: We introduce Lusifer, a novel environment leveraging Large Language Models (LLMs) to generate simulated user feedback.
Lusifer synthesizes user profiles and interaction histories to simulate responses and behaviors toward recommended items.
Lusifer accurately emulates user behavior and preferences, even with reduced training data having an RMSE of 1.3.
- Score: 0.0
- License:
- Abstract: Training reinforcement learning-based recommender systems is often hindered by the lack of dynamic and realistic user interactions. To address this limitation, we introduce Lusifer, a novel environment leveraging Large Language Models (LLMs) to generate simulated user feedback. Lusifer synthesizes user profiles and interaction histories to simulate responses and behaviors toward recommended items, with profiles updated after each rating to reflect evolving user characteristics. Utilizing the MovieLens dataset as a proof of concept, we limited our implementation to the last 40 interactions for each user, representing approximately 39% and 22% of the training sets, to focus on recent user behavior. For consistency and to gain insights into the performance of traditional methods with limited data, we implemented baseline approaches using the same data subset. Our results demonstrate that Lusifer accurately emulates user behavior and preferences, even with reduced training data having an RMSE of 1.3 across various test sets. This paper presents Lusifer's operational pipeline, including prompt generation and iterative user profile updates, and compares its performance against baseline methods. The findings validate Lusifer's ability to produce realistic dynamic feedback and suggest that it offers a scalable and adjustable framework for user simulation in online reinforcement learning recommender systems for future studies, particularly when training data is limited.
Related papers
- Interactive Visualization Recommendation with Hier-SUCB [52.11209329270573]
We propose an interactive personalized visualization recommendation (PVisRec) system that learns on user feedback from previous interactions.
For more interactive and accurate recommendations, we propose Hier-SUCB, a contextual semi-bandit in the PVisRec setting.
arXiv Detail & Related papers (2025-02-05T17:14:45Z) - LLM-Powered User Simulator for Recommender System [29.328839982869923]
We introduce an LLM-powered user simulator to simulate user engagement with items in an explicit manner.
Specifically, we identify the explicit logic of user preferences, leverage LLMs to analyze item characteristics and distill user sentiments.
We propose an ensemble model that synergizes logical and statistical insights for user interaction simulations.
arXiv Detail & Related papers (2024-12-22T12:00:04Z) - LIBER: Lifelong User Behavior Modeling Based on Large Language Models [42.045535303737694]
We propose Lifelong User Behavior Modeling (LIBER) based on large language models.
LIBER has been deployed on Huawei's music recommendation service and achieved substantial improvements in users' play count and play time by 3.01% and 7.69%.
arXiv Detail & Related papers (2024-11-22T03:43:41Z) - Reformulating Conversational Recommender Systems as Tri-Phase Offline Policy Learning [5.453444582931813]
Tri-Phase Offline Policy Learning-based Conversational Recommender System (TCRS)
This paper introduces the Tri-Phase Offline Policy Learning-based Conversational Recommender System (TCRS)
arXiv Detail & Related papers (2024-08-13T10:58:29Z) - Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment [104.18002641195442]
We introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data.
Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation.
arXiv Detail & Related papers (2024-05-31T14:21:04Z) - A LLM-based Controllable, Scalable, Human-Involved User Simulator Framework for Conversational Recommender Systems [14.646529557978512]
Conversational Recommender System (CRS) leverages real-time feedback from users to dynamically model their preferences.
Large Language Models (LLMs) has marked the onset of a new epoch in computational capabilities.
We introduce a Controllable, scalable, and human-Involved (CSHI) simulator framework that manages the behavior of user simulators.
arXiv Detail & Related papers (2024-05-13T03:02:56Z) - Latent User Intent Modeling for Sequential Recommenders [92.66888409973495]
Sequential recommender models learn to predict the next items a user is likely to interact with based on his/her interaction history on the platform.
Most sequential recommenders however lack a higher-level understanding of user intents, which often drive user behaviors online.
Intent modeling is thus critical for understanding users and optimizing long-term user experience.
arXiv Detail & Related papers (2022-11-17T19:00:24Z) - Personalizing Intervened Network for Long-tailed Sequential User
Behavior Modeling [66.02953670238647]
Tail users suffer from significantly lower-quality recommendation than the head users after joint training.
A model trained on tail users separately still achieve inferior results due to limited data.
We propose a novel approach that significantly improves the recommendation performance of the tail users.
arXiv Detail & Related papers (2022-08-19T02:50:19Z) - WSLRec: Weakly Supervised Learning for Neural Sequential Recommendation
Models [24.455665093145818]
We propose a novel model-agnostic training approach called WSLRec, which adopts a three-stage framework: pre-training, top-$k$ mining, intrinsic and fine-tuning.
WSLRec resolves the incompleteness problem by pre-training models on extra weak supervisions from model-free methods like BR and ItemCF, while resolving the inaccuracy problem by leveraging the top-$k$ mining to screen out reliable user-item relevance from weak supervisions for fine-tuning.
arXiv Detail & Related papers (2022-02-28T08:55:12Z) - Learning to Learn a Cold-start Sequential Recommender [70.5692886883067]
The cold-start recommendation is an urgent problem in contemporary online applications.
We propose a meta-learning based cold-start sequential recommendation framework called metaCSR.
metaCSR holds the ability to learn the common patterns from regular users' behaviors.
arXiv Detail & Related papers (2021-10-18T08:11:24Z) - Learning Transferrable Parameters for Long-tailed Sequential User
Behavior Modeling [70.64257515361972]
We argue that focusing on tail users could bring more benefits and address the long tails issue.
Specifically, we propose a gradient alignment and adopt an adversarial training scheme to facilitate knowledge transfer from the head to the tail.
arXiv Detail & Related papers (2020-10-22T03:12:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.