CreAgent: Towards Long-Term Evaluation of Recommender System under Platform-Creator Information Asymmetry
- URL: http://arxiv.org/abs/2502.07307v1
- Date: Tue, 11 Feb 2025 07:09:49 GMT
- Title: CreAgent: Towards Long-Term Evaluation of Recommender System under Platform-Creator Information Asymmetry
- Authors: Xiaopeng Ye, Chen Xu, Zhongxiang Sun, Jun Xu, Gang Wang, Zhenhua Dong, Ji-Rong Wen,
- Abstract summary: We propose CreAgent, a large language model-empowered creator simulation agent.<n>By incorporating game theory's belief mechanism and the fast-and-slow thinking framework, CreAgent effectively simulates creator behavior.<n>Our credibility validation experiments show that CreAgent aligns well with the behaviors between real-world platform and creator.
- Score: 55.64992650205645
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Ensuring the long-term sustainability of recommender systems (RS) emerges as a crucial issue. Traditional offline evaluation methods for RS typically focus on immediate user feedback, such as clicks, but they often neglect the long-term impact of content creators. On real-world content platforms, creators can strategically produce and upload new items based on user feedback and preference trends. While previous studies have attempted to model creator behavior, they often overlook the role of information asymmetry. This asymmetry arises because creators primarily have access to feedback on the items they produce, while platforms possess data on the entire spectrum of user feedback. Current RS simulators, however, fail to account for this asymmetry, leading to inaccurate long-term evaluations. To address this gap, we propose CreAgent, a Large Language Model (LLM)-empowered creator simulation agent. By incorporating game theory's belief mechanism and the fast-and-slow thinking framework, CreAgent effectively simulates creator behavior under conditions of information asymmetry. Additionally, we enhance CreAgent's simulation ability by fine-tuning it using Proximal Policy Optimization (PPO). Our credibility validation experiments show that CreAgent aligns well with the behaviors between real-world platform and creator, thus improving the reliability of long-term RS evaluations. Moreover, through the simulation of RS involving CreAgents, we can explore how fairness- and diversity-aware RS algorithms contribute to better long-term performance for various stakeholders. CreAgent and the simulation platform are publicly available at https://github.com/shawnye2000/CreAgent.
Related papers
- PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time [87.99027488664282]
PersonaAgent is a framework designed to address versatile personalization tasks.<n>It integrates a personalized memory module and a personalized action module.<n>Test-time user-preference alignment strategy ensures real-time user preference alignment.
arXiv Detail & Related papers (2025-06-06T17:29:49Z) - PUB: An LLM-Enhanced Personality-Driven User Behaviour Simulator for Recommender System Evaluation [9.841963696576546]
Personality-driven User Behaviour Simulator (PUB) integrates the Big Five personality traits to model personalised user behaviour.<n>PUB dynamically infers user personality from behavioural logs (e.g., ratings, reviews) and item metadata, then generates synthetic interactions that preserve statistical fidelity to real-world data.<n> Experiments on the Amazon review datasets show that logs generated by PUB closely align with real user behaviour and reveal meaningful associations between personality traits and recommendation outcomes.
arXiv Detail & Related papers (2025-06-05T01:57:36Z) - Beyond Static Testbeds: An Interaction-Centric Agent Simulation Platform for Dynamic Recommender Systems [40.09105175322562]
RecInter is a novel agent-based simulation platform for recommender systems.<n>In RecInter, simulated user actions (e.g., likes, reviews, purchases) dynamically update item attributes in real-time.<n> Merchant Agents can reply, fostering a more realistic and evolving ecosystem.
arXiv Detail & Related papers (2025-05-22T09:14:23Z) - SimUSER: Simulating User Behavior with Large Language Models for Recommender System Evaluation [1.2430809884830318]
We introduce Sim, an agent framework that serves as believable and cost-effective human proxies.
Sim identifies self-consistent personas from historical data, enriching user profiles with unique backgrounds and personalities.
We conduct experiments to explore the effects of thumbnails on click rates, the exposure effect, and the impact of reviews on user engagement.
arXiv Detail & Related papers (2025-04-17T07:57:23Z) - Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection [71.92083784393418]
Inference-time methods such as Best-of-N (BON) sampling offer a simple yet effective alternative to improve performance.
We propose Iterative Agent Decoding (IAD) which combines iterative refinement with dynamic candidate evaluation and selection guided by a verifier.
arXiv Detail & Related papers (2025-04-02T17:40:47Z) - Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z) - Dissecting Adversarial Robustness of Multimodal LM Agents [70.2077308846307]
We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena.
We find that we can successfully break latest agents that use black-box frontier LMs, including those that perform reflection and tree search.
We also use ARE to rigorously evaluate how the robustness changes as new components are added.
arXiv Detail & Related papers (2024-06-18T17:32:48Z) - Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study [61.64685376882383]
Counterfactual learning to rank (CLTR) has attracted extensive attention in the IR community for its ability to leverage massive logged user interaction data to train ranking models.
This paper investigates the robustness of existing CLTR models in complex and diverse situations.
We find that the DLA models and IPS-DCM show better robustness under various simulation settings than IPS-PBM and PRS with offline propensity estimation.
arXiv Detail & Related papers (2024-04-04T10:54:38Z) - On Generative Agents in Recommendation [58.42840923200071]
Agent4Rec is a user simulator in recommendation based on Large Language Models.
Each agent interacts with personalized recommender models in a page-by-page manner.
arXiv Detail & Related papers (2023-10-16T06:41:16Z) - User Behavior Simulation with Large Language Model based Agents [116.74368915420065]
We propose an LLM-based agent framework and design a sandbox environment to simulate real user behaviors.
Based on extensive experiments, we find that the simulated behaviors of our method are very close to the ones of real humans.
arXiv Detail & Related papers (2023-06-05T02:58:35Z) - Sim2Rec: A Simulator-based Decision-making Approach to Optimize
Real-World Long-term User Engagement in Sequential Recommender Systems [43.31078296862647]
Long-term user engagement (LTE) optimization in sequential recommender systems (SRS) is suited by reinforcement learning (RL)
RL has its shortcomings, particularly requiring a large number of online samples for exploration.
We present a simulator-based recommender policy training approach, Simulation-to-Recommendation (Sim2Rec)
arXiv Detail & Related papers (2023-05-03T19:21:25Z) - Sim-Anchored Learning for On-the-Fly Adaptation [45.123633153460034]
Fine-tuning simulation-trained RL agents with real-world data often degrades crucial behaviors due to limited or skewed data distributions.
We propose framing live-adaptation as a multi-objective optimization problem, where policy objectives must be satisfied both in simulation and reality.
arXiv Detail & Related papers (2023-01-17T16:16:53Z) - Towards Data-Driven Offline Simulations for Online Reinforcement
Learning [30.654163861164864]
We formalize offline learner simulation (OLS) for reinforcement learning (RL)
We propose a novel evaluation protocol that measures both fidelity and efficiency of the simulation.
arXiv Detail & Related papers (2022-11-14T18:36:13Z) - Metaphorical User Simulators for Evaluating Task-oriented Dialogue
Systems [80.77917437785773]
Task-oriented dialogue systems ( TDSs) are assessed mainly in an offline setting or through human evaluation.
We propose a metaphorical user simulator for end-to-end TDS evaluation, where we define a simulator to be metaphorical if it simulates user's analogical thinking in interactions with systems.
We also propose a tester-based evaluation framework to generate variants, i.e., dialogue systems with different capabilities.
arXiv Detail & Related papers (2022-04-02T05:11:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.