CreAgent: Towards Long-Term Evaluation of Recommender System under Platform-Creator Information Asymmetry
- URL: http://arxiv.org/abs/2502.07307v1
- Date: Tue, 11 Feb 2025 07:09:49 GMT
- Title: CreAgent: Towards Long-Term Evaluation of Recommender System under Platform-Creator Information Asymmetry
- Authors: Xiaopeng Ye, Chen Xu, Zhongxiang Sun, Jun Xu, Gang Wang, Zhenhua Dong, Ji-Rong Wen
- Abstract summary: We propose CreAgent, a large language model-empowered creator simulation agent. By incorporating game theory's belief mechanism and the fast-and-slow thinking framework, CreAgent effectively simulates creator behavior. Our credibility validation experiments show that CreAgent aligns well with the behaviors between real-world platform and creator.
- Score: 55.64992650205645
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Ensuring the long-term sustainability of recommender systems (RS) emerges as a crucial issue. Traditional offline evaluation methods for RS typically focus on immediate user feedback, such as clicks, but they often neglect the long-term impact of content creators. On real-world content platforms, creators can strategically produce and upload new items based on user feedback and preference trends. While previous studies have attempted to model creator behavior, they often overlook the role of information asymmetry. This asymmetry arises because creators primarily have access to feedback on the items they produce, while platforms possess data on the entire spectrum of user feedback. Current RS simulators, however, fail to account for this asymmetry, leading to inaccurate long-term evaluations. To address this gap, we propose CreAgent, a Large Language Model (LLM)-empowered creator simulation agent. By incorporating game theory's belief mechanism and the fast-and-slow thinking framework, CreAgent effectively simulates creator behavior under conditions of information asymmetry. Additionally, we enhance CreAgent's simulation ability by fine-tuning it using Proximal Policy Optimization (PPO). Our credibility validation experiments show that CreAgent aligns well with the behaviors between real-world platform and creator, thus improving the reliability of long-term RS evaluations. Moreover, through the simulation of RS involving CreAgents, we can explore how fairness- and diversity-aware RS algorithms contribute to better long-term performance for various stakeholders. CreAgent and the simulation platform are publicly available at https://github.com/shawnye2000/CreAgent.
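The information asymmetry described in the abstract can be illustrated with a small sketch: the platform aggregates feedback over every item, while a creator only sees feedback on the items they produced. This is not CreAgent's implementation; the `Feedback` record, the `platform_view` and `creator_view` functions, and the toy log are all hypothetical names and data chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    """One hypothetical user-feedback record (illustrative schema only)."""
    user_id: str
    item_id: str
    creator_id: str
    genre: str
    clicked: bool


def platform_view(log):
    """Platform side: click-through rate per genre over ALL feedback."""
    shown, clicks = defaultdict(int), defaultdict(int)
    for fb in log:
        shown[fb.genre] += 1
        clicks[fb.genre] += int(fb.clicked)
    return {genre: clicks[genre] / shown[genre] for genre in shown}


def creator_view(log, creator_id):
    """Creator side: the same statistic, restricted to the creator's own items."""
    own = [fb for fb in log if fb.creator_id == creator_id]
    return platform_view(own)


# Toy log (made-up data): creator c1 only ever produced cooking items.
log = [
    Feedback("u1", "i1", "c1", "cooking", True),
    Feedback("u2", "i1", "c1", "cooking", False),
    Feedback("u1", "i2", "c2", "travel", True),
    Feedback("u3", "i2", "c2", "travel", True),
    Feedback("u2", "i3", "c2", "cooking", True),
]

print("platform:", platform_view(log))
print("creator c1:", creator_view(log, "c1"))
```

In this toy log, creator `c1` has no signal at all about the travel genre and a different estimate for cooking than the platform has, which is the kind of gap a creator simulator would have to reason under.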
Related papers
- ConvApparel: A Benchmark Dataset and Validation Framework for User Simulators in Conversational Recommenders [48.83868690303791]
We introduce ConvApparel, a new dataset of human-AI conversations designed to address this gap.
Its unique dual-agent data collection protocol -- using both "good" and "bad" recommenders -- enables counterfactual validation.
We propose a comprehensive validation framework that combines statistical alignment, a human-likeness score, and counterfactual validation.
arXiv Detail & Related papers (2026-02-18T23:00:21Z)
- Exploring Recommender System Evaluation: A Multi-Modal User Agent Framework for A/B Testing [54.456400601801704]
We introduce a multi-modal user agent for A/B testing (A/B Agent).
Specifically, we construct a recommendation sandbox environment for A/B testing, enabling multimodal and multi-page interactions.
We validated the potential of the agent as an alternative to traditional A/B testing from three perspectives: model, data, and features.
arXiv Detail & Related papers (2026-01-08T03:33:43Z)
- RecoWorld: Building Simulated Environments for Agentic Recommender Systems [55.979427290369216]
We present RecoWorld, a blueprint for building simulated environments tailored to agentic recommender systems.
A user simulator reviews recommended items, updates its mindset, and when sensing potential user disengagement, generates reflective instructions.
The agentic recommender adapts its recommendations by incorporating these user instructions and reasoning traces, creating a dynamic feedback loop.
arXiv Detail & Related papers (2025-09-12T16:44:34Z)
- STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning [54.28691219536054]
We introduce STARec, a slow-thinking augmented agent framework that endows recommender systems with autonomous deliberative reasoning capabilities.
We develop anchored reinforcement training - a two-stage paradigm combining structured knowledge distillation from advanced reasoning models with preference-aligned reward shaping.
Experiments on MovieLens 1M and Amazon CDs benchmarks demonstrate that STARec achieves substantial performance gains compared with state-of-the-art baselines.
arXiv Detail & Related papers (2025-08-26T08:47:58Z)
- Mirroring Users: Towards Building Preference-aligned User Simulator with User Feedback in Recommendation [18.40619735445983]
User simulation is increasingly vital to develop and evaluate recommender systems (RSs).
A vast yet underutilized resource for enhancing this alignment is the extensive user feedback inherent in RSs.
We introduce a novel data construction framework that leverages user feedback in RSs with advanced LLM capabilities to generate high-quality simulation data.
arXiv Detail & Related papers (2025-08-25T15:51:24Z)
- PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time [87.99027488664282]
PersonaAgent is a framework designed to address versatile personalization tasks.
It integrates a personalized memory module and a personalized action module.
Test-time user-preference alignment strategy ensures real-time user preference alignment.
arXiv Detail & Related papers (2025-06-06T17:29:49Z)
- PUB: An LLM-Enhanced Personality-Driven User Behaviour Simulator for Recommender System Evaluation [9.841963696576546]
Personality-driven User Behaviour Simulator (PUB) integrates the Big Five personality traits to model personalised user behaviour.
PUB dynamically infers user personality from behavioural logs (e.g., ratings, reviews) and item metadata, then generates synthetic interactions that preserve statistical fidelity to real-world data.
Experiments on the Amazon review datasets show that logs generated by PUB closely align with real user behaviour and reveal meaningful associations between personality traits and recommendation outcomes.
arXiv Detail & Related papers (2025-06-05T01:57:36Z)
- Beyond Static Testbeds: An Interaction-Centric Agent Simulation Platform for Dynamic Recommender Systems [40.09105175322562]
RecInter is a novel agent-based simulation platform for recommender systems.
In RecInter, simulated user actions (e.g., likes, reviews, purchases) dynamically update item attributes in real-time.
Merchant Agents can reply, fostering a more realistic and evolving ecosystem.
arXiv Detail & Related papers (2025-05-22T09:14:23Z)
- SimUSER: Simulating User Behavior with Large Language Models for Recommender System Evaluation [1.2430809884830318]
We introduce SimUSER, an agent framework that serves as believable and cost-effective human proxies.
SimUSER identifies self-consistent personas from historical data, enriching user profiles with unique backgrounds and personalities.
We conduct experiments to explore the effects of thumbnails on click rates, the exposure effect, and the impact of reviews on user engagement.
arXiv Detail & Related papers (2025-04-17T07:57:23Z)
- Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection [71.92083784393418]
Inference-time methods such as Best-of-N (BON) sampling offer a simple yet effective alternative to improve performance.
We propose Iterative Agent Decoding (IAD) which combines iterative refinement with dynamic candidate evaluation and selection guided by a verifier.
arXiv Detail & Related papers (2025-04-02T17:40:47Z)
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- Dissecting Adversarial Robustness of Multimodal LM Agents [70.2077308846307]
We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena.
We find that we can successfully break the latest agents that use black-box frontier LMs, including those that perform reflection and tree search.
We also use ARE to rigorously evaluate how the robustness changes as new components are added.
arXiv Detail & Related papers (2024-06-18T17:32:48Z)
- Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study [61.64685376882383]
Counterfactual learning to rank (CLTR) has attracted extensive attention in the IR community for its ability to leverage massive logged user interaction data to train ranking models.
This paper investigates the robustness of existing CLTR models in complex and diverse situations.
We find that the DLA models and IPS-DCM show better robustness under various simulation settings than IPS-PBM and PRS with offline propensity estimation.
arXiv Detail & Related papers (2024-04-04T10:54:38Z)
- On Generative Agents in Recommendation [58.42840923200071]
Agent4Rec is a user simulator in recommendation based on Large Language Models.
Each agent interacts with personalized recommender models in a page-by-page manner.
arXiv Detail & Related papers (2023-10-16T06:41:16Z)
- User Behavior Simulation with Large Language Model based Agents [116.74368915420065]
We propose an LLM-based agent framework and design a sandbox environment to simulate real user behaviors.
Based on extensive experiments, we find that the simulated behaviors of our method are very close to those of real humans.
arXiv Detail & Related papers (2023-06-05T02:58:35Z)
- Sim2Rec: A Simulator-based Decision-making Approach to Optimize Real-World Long-term User Engagement in Sequential Recommender Systems [43.31078296862647]
Long-term user engagement (LTE) optimization in sequential recommender systems (SRS) is well suited to reinforcement learning (RL).
RL has its shortcomings, particularly requiring a large number of online samples for exploration.
We present a simulator-based recommender policy training approach, Simulation-to-Recommendation (Sim2Rec).
arXiv Detail & Related papers (2023-05-03T19:21:25Z)
- Sim-Anchored Learning for On-the-Fly Adaptation [45.123633153460034]
Fine-tuning simulation-trained RL agents with real-world data often degrades crucial behaviors due to limited or skewed data distributions.
We propose framing live-adaptation as a multi-objective optimization problem, where policy objectives must be satisfied both in simulation and reality.
arXiv Detail & Related papers (2023-01-17T16:16:53Z)
- Towards Data-Driven Offline Simulations for Online Reinforcement Learning [30.654163861164864]
We formalize offline learner simulation (OLS) for reinforcement learning (RL).
We propose a novel evaluation protocol that measures both fidelity and efficiency of the simulation.
arXiv Detail & Related papers (2022-11-14T18:36:13Z)
- Metaphorical User Simulators for Evaluating Task-oriented Dialogue Systems [80.77917437785773]
Task-oriented dialogue systems (TDSs) are assessed mainly in an offline setting or through human evaluation.
We propose a metaphorical user simulator for end-to-end TDS evaluation, where we define a simulator to be metaphorical if it simulates the user's analogical thinking in interactions with systems.
We also propose a tester-based evaluation framework to generate variants, i.e., dialogue systems with different capabilities.
arXiv Detail & Related papers (2022-04-02T05:11:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.