Related papers: MTRec: Learning to Align with User Preferences via Mental Reward Models

URL: http://arxiv.org/abs/2509.22807v2
Date: Fri, 03 Oct 2025 12:22:04 GMT
Title: MTRec: Learning to Align with User Preferences via Mental Reward Models
Authors: Mengchen Zhao, Yifan Gao, Yaqing Hou, Xiangyang Li, Pengjie Gu, Zhenhua Dong, Ruiming Tang, Yi Cai
Abstract summary: We propose MTRec, a sequential recommendation framework designed to align with real user preferences. We introduce a mental reward model to quantify user satisfaction and propose a distributional inverse reinforcement learning approach to learn it. Experiments show that MTRec brings significant improvements to a variety of recommendation models.
Score: 60.321038000806176
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recommendation models are predominantly trained using implicit user feedback, since explicit feedback is often costly to obtain. However, implicit feedback, such as clicks, does not always reflect users' real preferences. For example, a user might click on a news article because of its attractive headline, but end up feeling uncomfortable after reading the content. In the absence of explicit feedback, such erroneous implicit signals may severely mislead recommender systems. In this paper, we propose MTRec, a novel sequential recommendation framework designed to align with real user preferences by uncovering their internal satisfaction on recommended items. Specifically, we introduce a mental reward model to quantify user satisfaction and propose a distributional inverse reinforcement learning approach to learn it. The learned mental reward model is then used to guide recommendation models to better align with users' real preferences. Our experiments show that MTRec brings significant improvements to a variety of recommendation models. We also deploy MTRec on an industrial short video platform and observe a 7 percent increase in average user viewing time.

Related papers

Enhancing Sequential Recommender with Large Language Models for Joint Video and Comment Recommendation [77.42486522565295]
We propose a novel recommendation approach called LSVCR to jointly perform personalized video and comment recommendation. Our approach comprises two key components: sequential recommendation (SR) model and supplemental large language model (LLM) recommender. In particular, we attain a cumulative gain of 4.13% in comment watch time.
arXiv Detail & Related papers (2024-03-20T13:14:29Z)
Interactive Garment Recommendation with User in the Loop [77.35411131350833]
We propose to build a user profile on the fly by integrating user reactions as we recommend complementary items to compose an outfit. We present a reinforcement learning agent capable of suggesting appropriate garments and ingesting user feedback to improve its recommendations.
arXiv Detail & Related papers (2024-02-18T16:01:28Z)
Learning from Negative User Feedback and Measuring Responsiveness for Sequential Recommenders [13.762960304406016]
We introduce explicit and implicit negative user feedback into the training objective of sequential recommenders. We demonstrate the effectiveness of this approach using live experiments on a large-scale industrial recommender system.
arXiv Detail & Related papers (2023-08-23T17:16:07Z)
Latent User Intent Modeling for Sequential Recommenders [92.66888409973495]
Sequential recommender models learn to predict the next items a user is likely to interact with based on his/her interaction history on the platform. Most sequential recommenders however lack a higher-level understanding of user intents, which often drive user behaviors online. Intent modeling is thus critical for understanding users and optimizing long-term user experience.
arXiv Detail & Related papers (2022-11-17T19:00:24Z)
Recommendation with User Active Disclosing Willingness [20.306413327597603]
We study a novel recommendation paradigm, where the users are allowed to indicate their "willingness" on disclosing different behaviors. We conduct extensive experiments to demonstrate the effectiveness of our model on balancing the recommendation quality and user disclosing willingness.
arXiv Detail & Related papers (2022-10-25T04:43:40Z)
Correcting the User Feedback-Loop Bias for Recommendation Systems [34.44834423714441]
We propose a systematic and dynamic way to correct user feedback-loop bias in recommendation systems. Our method includes a deep-learning component to learn each user's dynamic rating history embedding. We empirically validated the existence of such user feedback-loop bias in real world recommendation systems.
arXiv Detail & Related papers (2021-09-13T15:02:55Z)
PURS: Personalized Unexpected Recommender System for Improving User Satisfaction [76.98616102965023]
We describe a novel Personalized Unexpected Recommender System (PURS) model that incorporates unexpectedness into the recommendation process. Extensive offline experiments on three real-world datasets illustrate that the proposed PURS model significantly outperforms the state-of-the-art baseline approaches.
arXiv Detail & Related papers (2021-06-05T01:33:21Z)
ELIXIR: Learning from User Feedback on Explanations to Improve Recommender Models [26.11434743591804]
We devise a human-in-the-loop framework, called ELIXIR, where user feedback on explanations is leveraged for pairwise learning of user preferences. ELIXIR leverages feedback on pairs of recommendations and explanations to learn user-specific latent preference vectors. Our framework is instantiated using generalized graph recommendation via Random Walk with Restart.
arXiv Detail & Related papers (2021-02-15T13:43:49Z)
Reward Constrained Interactive Recommendation with Natural Language Feedback [158.8095688415973]
We propose a novel constraint-augmented reinforcement learning (RL) framework to efficiently incorporate user preferences over time. Specifically, we leverage a discriminator to detect recommendations violating user historical preference. Our proposed framework is general and is further extended to the task of constrained text generation.
arXiv Detail & Related papers (2020-05-04T16:23:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.