Mirroring Users: Towards Building Preference-aligned User Simulator with User Feedback in Recommendation
- URL: http://arxiv.org/abs/2508.18142v1
- Date: Mon, 25 Aug 2025 15:51:24 GMT
- Title: Mirroring Users: Towards Building Preference-aligned User Simulator with User Feedback in Recommendation
- Authors: Tianjun Wei, Huizhong Guo, Yingpeng Du, Zhu Sun, Chen Huang, Dongxia Wang, Jie Zhang,
- Abstract summary: User simulation is increasingly vital to develop and evaluate recommender systems (RSs)<n>A vast yet underutilized resource for enhancing this alignment is the extensive user feedback inherent in RSs.<n>We introduce a novel data construction framework that leverages user feedback in RSs with advanced LLM capabilities to generate high-quality simulation data.
- Score: 18.40619735445983
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: User simulation is increasingly vital to develop and evaluate recommender systems (RSs). While Large Language Models (LLMs) offer promising avenues to simulate user behavior, they often struggle with the absence of specific domain alignment required for RSs and the efficiency demands of large-scale simulation. A vast yet underutilized resource for enhancing this alignment is the extensive user feedback inherent in RSs. However, directly leveraging such feedback presents two significant challenges. First, user feedback in RSs is often ambiguous and noisy, which negatively impacts effective preference alignment. Second, the massive volume of feedback largely hinders the efficiency of preference alignment, necessitating an efficient filtering mechanism to identify more informative samples. To overcome these hurdles, we introduce a novel data construction framework that leverages user feedback in RSs with advanced LLM capabilities to generate high-quality simulation data. Our framework unfolds in two key phases: (1) employing LLMs to generate cognitive decision-making processes on constructed simulation samples, reducing ambiguity in raw user feedback; (2) data distillation based on uncertainty estimation and behavior sampling to filter challenging yet denoised simulation samples. Accordingly, we fine-tune lightweight LLMs, as user simulators, using such high-quality dataset with corresponding decision-making processes. Extensive experiments verify that our framework significantly boosts the alignment with human preferences and in-domain reasoning capabilities of fine-tuned LLMs, and provides more insightful and interpretable signals when interacting with RSs. We believe our work will advance the RS community and offer valuable insights for broader human-centric AI research.
Related papers
- ConvApparel: A Benchmark Dataset and Validation Framework for User Simulators in Conversational Recommenders [48.83868690303791]
We introduce ConvApparel, a new dataset of human-AI conversations designed to address this gap.<n>Its unique dual-agent data collection protocol -- using both "good" and "bad" recommenders -- enables counterfactual validation.<n>We propose a comprehensive validation framework that combines statistical alignment, a human-likeness score, and counterfactual validation.
arXiv Detail & Related papers (2026-02-18T23:00:21Z) - Diagnostic-Guided Dynamic Profile Optimization for LLM-based User Simulators in Sequential Recommendation [15.61963892566877]
DGDPO is a novel framework that constructs user profile through a dynamic and iterative optimization process.<n>Unlike existing LLM-based user simulators that are limited to single-round interactions, we are the first to integrate DGDPO with sequential recommenders.
arXiv Detail & Related papers (2025-08-18T06:17:59Z) - Multi-agents based User Values Mining for Recommendation [52.26100802380767]
We propose a zero-shot multi-LLM collaborative framework for effective and accurate user value extraction.<n>We apply text summarization techniques to condense item content while preserving essential meaning.<n>To mitigate hallucinations, we introduce two specialized agent roles: evaluators and supervisors.
arXiv Detail & Related papers (2025-05-02T04:01:31Z) - Search-Based Interaction For Conversation Recommendation via Generative Reward Model Based Simulated User [117.82681846559909]
Conversational recommendation systems (CRSs) use multi-turn interaction to capture user preferences and provide personalized recommendations.<n>We propose a generative reward model based simulated user, named GRSU, for automatic interaction with CRSs.
arXiv Detail & Related papers (2025-04-29T06:37:30Z) - LLM-Powered User Simulator for Recommender System [29.328839982869923]
We introduce an LLM-powered user simulator to simulate user engagement with items in an explicit manner.<n>Specifically, we identify the explicit logic of user preferences, leverage LLMs to analyze item characteristics and distill user sentiments.<n>We propose an ensemble model that synergizes logical and statistical insights for user interaction simulations.
arXiv Detail & Related papers (2024-12-22T12:00:04Z) - A Systematic Examination of Preference Learning through the Lens of Instruction-Following [83.71180850955679]
We use a novel synthetic data generation pipeline to generate 48,000 instruction unique-following prompts.<n>With our synthetic prompts, we use two preference dataset curation methods - rejection sampling (RS) and Monte Carlo Tree Search (MCTS)<n>Experiments reveal that shared prefixes in preference pairs, as generated by MCTS, provide marginal but consistent improvements.<n>High-contrast preference pairs generally outperform low-contrast pairs; however, combining both often yields the best performance.
arXiv Detail & Related papers (2024-12-18T15:38:39Z) - LLM-based Bi-level Multi-interest Learning Framework for Sequential Recommendation [54.396000434574454]
We propose a novel multi-interest SR framework combining implicit behavioral and explicit semantic perspectives.<n>It includes two modules: the Implicit Behavioral Interest Module and the Explicit Semantic Interest Module.<n>Experiments on four real-world datasets validate the framework's effectiveness and practicality.
arXiv Detail & Related papers (2024-11-14T13:00:23Z) - LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation [57.49045064294086]
Large Language Model (LLM) has the ability to capture semantic relationships between items, independent of their popularity.<n>We introduce LLMEmb, a novel method leveraging LLM to generate item embeddings that enhance Sequential Recommender Systems (SRS) performance.
arXiv Detail & Related papers (2024-09-30T03:59:06Z) - Efficient and Responsible Adaptation of Large Language Models for Robust Top-k Recommendations [11.004673022505566]
Long user queries from millions of users can degrade the performance of large language models for recommendation.
We propose a hybrid task allocation framework that utilizes the capabilities of both large language models and traditional recommendation systems.
Our results on three real-world datasets show a significant reduction in weak users and improved robustness of RSs to sub-populations.
arXiv Detail & Related papers (2024-05-01T19:11:47Z) - How Reliable is Your Simulator? Analysis on the Limitations of Current LLM-based User Simulators for Conversational Recommendation [14.646529557978512]
We analyze the limitations of using Large Language Models in constructing user simulators for Conversational Recommender System.
Data leakage, which occurs in conversational history and the user simulator's replies, results in inflated evaluation results.
We propose SimpleUserSim, employing a straightforward strategy to guide the topic toward the target items.
arXiv Detail & Related papers (2024-03-25T04:21:06Z) - Unlocking the Potential of User Feedback: Leveraging Large Language
Model as User Simulator to Enhance Dialogue System [65.93577256431125]
We propose an alternative approach called User-Guided Response Optimization (UGRO) to combine it with a smaller task-oriented dialogue model.
This approach uses LLM as annotation-free user simulator to assess dialogue responses, combining them with smaller fine-tuned end-to-end TOD models.
Our approach outperforms previous state-of-the-art (SOTA) results.
arXiv Detail & Related papers (2023-06-16T13:04:56Z) - User Behavior Simulation with Large Language Model based Agents [116.74368915420065]
We propose an LLM-based agent framework and design a sandbox environment to simulate real user behaviors.
Based on extensive experiments, we find that the simulated behaviors of our method are very close to the ones of real humans.
arXiv Detail & Related papers (2023-06-05T02:58:35Z) - Sim2Rec: A Simulator-based Decision-making Approach to Optimize
Real-World Long-term User Engagement in Sequential Recommender Systems [43.31078296862647]
Long-term user engagement (LTE) optimization in sequential recommender systems (SRS) is suited by reinforcement learning (RL)
RL has its shortcomings, particularly requiring a large number of online samples for exploration.
We present a simulator-based recommender policy training approach, Simulation-to-Recommendation (Sim2Rec)
arXiv Detail & Related papers (2023-05-03T19:21:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.