Sim2Rec: A Simulator-based Decision-making Approach to Optimize
Real-World Long-term User Engagement in Sequential Recommender Systems
- URL: http://arxiv.org/abs/2305.04832v1
- Date: Wed, 3 May 2023 19:21:25 GMT
- Title: Sim2Rec: A Simulator-based Decision-making Approach to Optimize
Real-World Long-term User Engagement in Sequential Recommender Systems
- Authors: Xiong-Hui Chen, Bowei He, Yang Yu, Qingyang Li, Zhiwei Qin, Wenjie
Shang, Jieping Ye, Chen Ma
- Abstract summary: Long-term user engagement (LTE) optimization in sequential recommender systems (SRS) is well suited to reinforcement learning (RL).
However, RL has its shortcomings, particularly its need for a large number of online samples for exploration.
We present a simulator-based recommender policy training approach, Simulation-to-Recommendation (Sim2Rec).
- Score: 43.31078296862647
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Long-term user engagement (LTE) optimization in sequential recommender
systems (SRS) has been shown to be well suited to reinforcement learning (RL), which
finds a policy that maximizes long-term rewards. However, RL has its shortcomings,
particularly its need for a large number of online samples for exploration, which
is risky in real-world applications. One appealing way to avoid this
risk is to build a simulator and learn the optimal recommendation policy in the
simulator. In LTE optimization, the simulator simulates multiple users'
daily feedback for given recommendations. However, building a user simulator
with no reality gap, i.e., one that can predict users' feedback exactly, is unrealistic
because users' reaction patterns are complex and historical logs for each
user are limited; the resulting errors might mislead the simulator-based
recommendation policy. In this paper, we present a practical simulator-based recommender
policy training approach, Simulation-to-Recommendation (Sim2Rec), to handle the
reality-gap problem for LTE optimization. Specifically, Sim2Rec introduces a
simulator set to generate various possible user behavior patterns, then
trains an environment-parameter extractor to recognize users' behavior patterns
in the simulators. Finally, a context-aware policy is trained to make the
optimal decisions for all of the user variants based on the inferred
environment parameters. The policy is directly transferable to unseen environments
(e.g., the real world), as it has learned to recognize the various user
behavior patterns and to make the correct decisions based on the inferred
environment parameters. Experiments are conducted in synthetic environments and
on a real-world large-scale ride-hailing platform, DidiChuxing. The results show
that Sim2Rec achieves significant performance improvement and produces robust
recommendations in unseen environments.
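The three stages named in the abstract (a simulator set, an environment-parameter extractor, and a context-aware policy) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the one-dimensional hidden parameter `theta`, the quadratic feedback model, and the names `make_simulator`, `extract_theta`, `rollout`, and `train_policy` are all hypothetical. The extractor is a grid-search least-squares fit rather than a trained model, and the policy is chosen by enumerating a few linear candidates rather than by RL.

```python
import random

def make_simulator(theta, noise=0.05, rng=None):
    """Toy user simulator (assumed model): engagement peaks when the
    recommended action matches the user's hidden parameter theta."""
    rng = rng or random.Random(0)
    def step(action):
        return 1.0 - (action - theta) ** 2 + rng.gauss(0.0, noise)
    return step

def extract_theta(history, grid_size=101):
    """Stand-in extractor: grid-search the theta in [0, 1] that best
    explains the observed (action, feedback) pairs in a least-squares sense."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    def sq_error(theta):
        return sum((fb - (1.0 - (a - theta) ** 2)) ** 2 for a, fb in history)
    return min(grid, key=sq_error)

def rollout(step, weights, probes=(0.0, 0.5, 1.0), horizon=20):
    """Probe the user, then repeatedly infer theta and act with a linear
    policy conditioned on the estimate. Returns the mean feedback."""
    history = [(a, step(a)) for a in probes]
    w, b = weights
    total = 0.0
    for _ in range(horizon):
        theta_hat = extract_theta(history)
        action = min(1.0, max(0.0, w * theta_hat + b))  # clip to [0, 1]
        feedback = step(action)
        history.append((action, feedback))
        total += feedback
    return total / horizon

def train_policy(thetas, candidates, seed=0):
    """Pick the (w, b) with the best average return over the simulator set.
    Each candidate sees the same noise sequence (common random numbers)."""
    def avg_return(weights):
        rng = random.Random(seed)
        returns = [rollout(make_simulator(t, rng=rng), weights) for t in thetas]
        return sum(returns) / len(returns)
    return max(candidates, key=avg_return)

# Simulator set covering a range of user behavior patterns.
train_thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
candidates = [(w, b) for w in (0.0, 0.5, 1.0) for b in (0.0, 0.25, 0.5)]
best = train_policy(train_thetas, candidates)

# Apply the selected policy to a user whose theta was not in the training set.
unseen = rollout(make_simulator(0.62, rng=random.Random(1)), best)
print("selected policy weights:", best)
print("mean feedback on unseen user:", round(unseen, 3))
```

The point of the sketch is the division of labor: the policy never sees `theta` directly, only the extractor's estimate, so a policy that performs well across the whole simulator set can be applied to a user whose parameter lies between the training values.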
Related papers
- LoopSR: Looping Sim-and-Real for Lifelong Policy Adaptation of Legged Robots [20.715834172041763]
We propose a lifelong policy adaptation framework named LoopSR.
It reconstructs the real-world environments back in simulation for further improvement.
By leveraging continual training, LoopSR achieves superior data efficiency compared with strong baselines.
arXiv Detail & Related papers (2024-09-26T16:02:25Z)
- Algorithmic Drift: A Simulation Framework to Study the Effects of Recommender Systems on User Preferences [7.552217586057245]
We propose a simulation framework that mimics user-recommender system interactions in a long-term scenario.
We introduce two novel metrics for quantifying the algorithm's impact on user preferences, specifically in terms of drift over time.
arXiv Detail & Related papers (2024-09-24T21:54:22Z)
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- A LLM-based Controllable, Scalable, Human-Involved User Simulator Framework for Conversational Recommender Systems [14.646529557978512]
A Conversational Recommender System (CRS) leverages real-time feedback from users to dynamically model their preferences.
Large Language Models (LLMs) have marked the onset of a new epoch in computational capabilities.
We introduce a Controllable, scalable, and human-Involved (CSHI) simulator framework that manages the behavior of user simulators.
arXiv Detail & Related papers (2024-05-13T03:02:56Z)
- How Reliable is Your Simulator? Analysis on the Limitations of Current LLM-based User Simulators for Conversational Recommendation [14.646529557978512]
We analyze the limitations of using Large Language Models in constructing user simulators for Conversational Recommender Systems.
Data leakage, which occurs in conversational history and the user simulator's replies, results in inflated evaluation results.
We propose SimpleUserSim, employing a straightforward strategy to guide the topic toward the target items.
arXiv Detail & Related papers (2024-03-25T04:21:06Z)
- Metaphorical User Simulators for Evaluating Task-oriented Dialogue Systems [80.77917437785773]
Task-oriented dialogue systems (TDSs) are assessed mainly in an offline setting or through human evaluation.
We propose a metaphorical user simulator for end-to-end TDS evaluation, where we define a simulator to be metaphorical if it simulates a user's analogical thinking in interactions with systems.
We also propose a tester-based evaluation framework to generate variants, i.e., dialogue systems with different capabilities.
arXiv Detail & Related papers (2022-04-02T05:11:03Z)
- Auto-Tuned Sim-to-Real Transfer [143.44593793640814]
Policies trained in simulation often fail when transferred to the real world.
Current approaches to tackle this problem, such as domain randomization, require prior knowledge and engineering.
We propose a method for automatically tuning simulator system parameters to match the real world.
arXiv Detail & Related papers (2021-04-15T17:59:55Z)
- TrafficSim: Learning to Simulate Realistic Multi-Agent Behaviors [74.67698916175614]
We propose TrafficSim, a multi-agent behavior model for realistic traffic simulation.
In particular, we leverage an implicit latent variable model to parameterize a joint actor policy.
We show TrafficSim generates significantly more realistic and diverse traffic scenarios as compared to a diverse set of baselines.
arXiv Detail & Related papers (2021-01-17T00:29:30Z)
- A User's Guide to Calibrating Robotics Simulators [54.85241102329546]
This paper proposes a set of benchmarks and a framework for the study of various algorithms that aim to transfer models and policies learnt in simulation to the real world.
We conduct experiments on a wide range of well-known simulated environments to characterize and offer insights into the performance of different algorithms.
Our analysis can be useful for practitioners working in this area and can help them make informed choices about the behavior and main properties of sim-to-real algorithms.
arXiv Detail & Related papers (2020-11-17T22:24:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.