G-UBS: Towards Robust Understanding of Implicit Feedback via Group-Aware User Behavior Simulation
- URL: http://arxiv.org/abs/2508.05709v1
- Date: Thu, 07 Aug 2025 07:26:08 GMT
- Title: G-UBS: Towards Robust Understanding of Implicit Feedback via Group-Aware User Behavior Simulation
- Authors: Boyu Chen, Siran Chen, Zhengrong Yue, Kainan Yan, Chenyun Yu, Beibei Kong, Cheng Lei, Chengxiang Zhuo, Zang Li, Yali Wang
- Abstract summary: Inferring user preferences from massive implicit feedback has shown great potential.
We propose a novel Group-aware User Behavior Simulation (G-UBS) paradigm.
G-UBS operates via two key agents. First, the User Group Manager (UGM) effectively clusters users to generate group profiles.
Second, the User Feedback Modeler (UFM) employs an innovative group-aware reinforcement learning approach.
- Score: 15.424496368749738
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: User feedback is critical for refining recommendation systems, yet explicit feedback (e.g., likes or dislikes) remains scarce in practice. As a more feasible alternative, inferring user preferences from massive implicit feedback has shown great potential (e.g., a user quickly skipping a recommended video usually indicates disinterest). Unfortunately, implicit feedback is often noisy: a user might skip a video due to accidental clicks or other reasons, rather than disliking it. Such noise can easily lead to misjudging user interests, thereby undermining recommendation performance. To address this issue, we propose a novel Group-aware User Behavior Simulation (G-UBS) paradigm, which leverages contextual guidance from relevant user groups, enabling robust and in-depth interpretation of implicit feedback for individual users. Specifically, G-UBS operates via two key agents. First, the User Group Manager (UGM) clusters users and generates group profiles via a "summarize-cluster-reflect" workflow based on LLMs. Second, the User Feedback Modeler (UFM) employs a group-aware reinforcement learning approach, in which each user is guided by the associated group profiles during the reinforcement learning process, allowing UFM to robustly and deeply examine the reasons behind implicit feedback. To assess our G-UBS paradigm, we have constructed a Video Recommendation benchmark with Implicit Feedback (IF-VR). To the best of our knowledge, this is the first multi-modal benchmark for evaluating implicit feedback in video recommendation, encompassing 15k users, 25k videos, and 933k interaction records with implicit feedback. Extensive experiments on IF-VR demonstrate that G-UBS significantly outperforms mainstream LLMs and MLLMs, achieving a 4.0% higher proportion of videos with a play rate > 30% and 14.9% higher reasoning accuracy.
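The abstract describes the two agents only at a high level, so the following is a minimal Python sketch of how they could fit together: the UGM's "summarize-cluster-reflect" workflow followed by a group-aware reward signal for the UFM. The helper callables (`llm`, `embed`, `judge`), the k-means clustering choice, and the prompt wording are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the G-UBS pipeline described in the abstract.
# Helper callables (llm, embed, judge) and the k-means choice are
# assumptions for illustration, not the paper's actual implementation.
from dataclasses import dataclass

import numpy as np
from sklearn.cluster import KMeans


@dataclass
class User:
    user_id: str
    history: list[str]  # raw interaction records (e.g., watch/skip logs)


def summarize(llm, user: User) -> str:
    """Step 1 (summarize): condense a user's raw behavior into a text profile."""
    prompt = f"Summarize this user's viewing behavior:\n{user.history}"
    return llm(prompt)


def cluster(embed, summaries: list[str], k: int = 8) -> np.ndarray:
    """Step 2 (cluster): group users by embeddings of their summaries."""
    vectors = np.stack([embed(s) for s in summaries])
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(vectors)


def reflect(llm, member_summaries: list[str]) -> str:
    """Step 3 (reflect): let the LLM distill a shared group profile."""
    prompt = ("Write one profile capturing what these users have in common:\n"
              + "\n".join(member_summaries))
    return llm(prompt)


def group_aware_reward(judge, pred_reason: str, feedback: str,
                       group_profile: str) -> float:
    """UFM reward sketch: score a proposed explanation of implicit feedback
    (e.g., a quick skip), conditioned on the user's group profile."""
    prompt = (f"Group profile: {group_profile}\n"
              f"Observed feedback: {feedback}\n"
              f"Proposed reason: {pred_reason}\n"
              "Rate the plausibility of the reason from 0 to 1:")
    return float(judge(prompt))
```

Under these assumptions, the group profile acts as a prior in the reward: explanations of a skip that contradict what similar users do score poorly, which is one way the paradigm could discount noise such as accidental clicks.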
Related papers
- MTRec: Learning to Align with User Preferences via Mental Reward Models [60.321038000806176]
We propose MTRec, a sequential recommendation framework designed to align with real user preferences.
We introduce a mental reward model to quantify user satisfaction and propose a distributional inverse reinforcement learning approach to learn it.
Experiments show that MTRec brings significant improvements to a variety of recommendation models.
arXiv Detail & Related papers (2025-09-26T18:10:48Z)
- User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy as a Learning Signal [58.43749783815486]
We study implicit user feedback in two user-LM interaction datasets.
We find that the contents of user feedback can improve model performance on short human-designed questions.
We also find that the usefulness of user feedback is largely tied to the quality of the user's initial prompt.
arXiv Detail & Related papers (2025-07-30T23:33:29Z)
- Reinforcement Learning from User Feedback [28.335218244885706]
We introduce Reinforcement Learning from User Feedback (RLUF), a framework for aligning large language models with user preferences.
We train a reward model, P[Love], to predict the likelihood that an LLM response will receive a Love Reaction.
We show that P[Love] is predictive of increased positive feedback and serves as a reliable offline evaluator of future user behavior.
arXiv Detail & Related papers (2025-05-20T22:14:44Z)
- Multi-agents based User Values Mining for Recommendation [52.26100802380767]
We propose a zero-shot multi-LLM collaborative framework for effective and accurate user value extraction.
We apply text summarization techniques to condense item content while preserving essential meaning.
To mitigate hallucinations, we introduce two specialized agent roles: evaluators and supervisors.
arXiv Detail & Related papers (2025-05-02T04:01:31Z)
- Search-Based Interaction For Conversation Recommendation via Generative Reward Model Based Simulated User [117.82681846559909]
Conversational recommendation systems (CRSs) use multi-turn interaction to capture user preferences and provide personalized recommendations.
We propose a generative reward model based simulated user, named GRSU, for automatic interaction with CRSs.
arXiv Detail & Related papers (2025-04-29T06:37:30Z)
- WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback [36.06000681394939]
We introduce WildFeedback, a novel framework that leverages in-situ user feedback during conversations with large language models (LLMs) to automatically create preference datasets.
Our experiments demonstrate that LLMs fine-tuned on the WildFeedback dataset exhibit significantly improved alignment with user preferences.
arXiv Detail & Related papers (2024-08-28T05:53:46Z)
- Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs [57.16442740983528]
In ad-hoc retrieval, evaluation relies heavily on user actions, including implicit feedback.
The role of user feedback in annotators' assessment of turns in a conversation has been little studied.
We focus on how the evaluation of task-oriented dialogue systems (TDSs) is affected by considering user feedback, explicit or implicit, as provided through the follow-up utterance of a turn being evaluated.
arXiv Detail & Related papers (2024-04-19T16:45:50Z)
- Enhancing Sequential Recommender with Large Language Models for Joint Video and Comment Recommendation [77.42486522565295]
We propose a novel recommendation approach called LSVCR to jointly perform personalized video and comment recommendation.
Our approach comprises two key components: a sequential recommendation (SR) model and a supplemental large language model (LLM) recommender.
In particular, we attain a cumulative gain of 4.13% in comment watch time.
arXiv Detail & Related papers (2024-03-20T13:14:29Z) - Learning from Negative User Feedback and Measuring Responsiveness for
Sequential Recommenders [13.762960304406016]
We introduce explicit and implicit negative user feedback into the training objective of sequential recommenders.
We demonstrate the effectiveness of this approach using live experiments on a large-scale industrial recommender system.
arXiv Detail & Related papers (2023-08-23T17:16:07Z) - ELIXIR: Learning from User Feedback on Explanations to Improve
Recommender Models [26.11434743591804]
We devise a human-in-the-loop framework, called ELIXIR, where user feedback on explanations is leveraged for pairwise learning of user preferences.
ELIXIR leverages feedback on pairs of recommendations and explanations to learn user-specific latent preference vectors.
Our framework is instantiated using generalized graph recommendation via Random Walk with Restart.
arXiv Detail & Related papers (2021-02-15T13:43:49Z)