Towards End-to-End Alignment of User Satisfaction via Questionnaire in Video Recommendation
- URL: http://arxiv.org/abs/2601.20215v1
- Date: Wed, 28 Jan 2026 03:32:21 GMT
- Title: Towards End-to-End Alignment of User Satisfaction via Questionnaire in Video Recommendation
- Authors: Na Li, Jiaqi Yu, Minzhi Xie, Tiantian He, Xiaoxiao Xu, Zixiu Wang, Lantao Hu, Yongqi Liu, Han Li, Kaiqiao Zhan, Kun Gai
- Abstract summary: Short-video recommender systems typically optimize ranking models using dense user behavioral signals, such as clicks and watch time. Recently, explicit satisfaction feedback collected through questionnaires has emerged as high-quality, direct alignment supervision. We propose a novel framework towards End-to-End Alignment of user Satisfaction via Questionnaire, named EASQ, to enable real-time alignment of ranking models with true user satisfaction.
- Score: 24.788289121071575
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Short-video recommender systems typically optimize ranking models using dense user behavioral signals, such as clicks and watch time. However, these signals are only indirect proxies of user satisfaction and often suffer from noise and bias. Recently, explicit satisfaction feedback collected through questionnaires has emerged as a source of high-quality, direct alignment supervision, but it is extremely sparse and easily overwhelmed by abundant behavioral data, making it difficult to incorporate into online recommendation models. To address these challenges, we propose EASQ, a novel framework for End-to-End Alignment of user Satisfaction via Questionnaire, which enables real-time alignment of ranking models with true user satisfaction. Specifically, we first construct an independent parameter pathway for sparse questionnaire signals by combining a multi-task architecture with a lightweight LoRA module. The multi-task design separates sparse satisfaction supervision from dense behavioral signals, preventing the former from being overwhelmed. The LoRA module pre-injects these preferences in a parameter-isolated manner, preserving backbone stability while optimizing for user satisfaction. Furthermore, we employ a DPO-based optimization objective tailored for online learning, which aligns the main model's outputs with sparse satisfaction signals in real time. This design enables end-to-end online learning, allowing the model to continuously adapt to new questionnaire feedback while maintaining the stability and effectiveness of the backbone. Extensive offline experiments and large-scale online A/B tests demonstrate that EASQ consistently improves user satisfaction metrics across multiple scenarios. EASQ has been successfully deployed in a production short-video recommendation system, delivering significant and stable business gains.
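The abstract's DPO-based objective can be illustrated with a minimal sketch. This is not the authors' implementation; the function name, score inputs, and the `beta` hyperparameter are assumptions. It shows a pairwise DPO-style loss over a preferred/dispreferred item pair from questionnaire feedback, with the frozen backbone's scores acting as the reference policy so that updates to the adapted model stay anchored to the backbone:

```python
import math

def dpo_pairwise_loss(score_pos, score_neg, ref_pos, ref_neg, beta=0.5):
    """DPO-style loss on one preferred/dispreferred item pair.

    score_*: log-scores from the current (e.g. LoRA-adapted) ranking model
    ref_*:   log-scores from the frozen backbone (reference policy)
    The loss shrinks as the adapted model widens the preference margin
    relative to the reference, keeping optimization anchored to the backbone.
    """
    margin = beta * ((score_pos - ref_pos) - (score_neg - ref_neg))
    # -log(sigmoid(margin)): equals log(2) at margin 0, decreases as margin grows
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With no margin over the reference the loss sits at log 2; scoring the satisfaction-preferred item higher than the backbone does (relative to the dispreferred item) drives it below that, which is the direction an online update would push.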
Related papers
- Synthetic Interaction Data for Scalable Personalization in Large Language Models [67.31884245564086]
We introduce a high-fidelity synthetic data generation framework called PersonaGym. Unlike prior work that treats personalization as static persona-preference pairs, PersonaGym models a dynamic preference process. We release PersonaAtlas, a large-scale, high-quality, and diverse synthetic dataset of high-fidelity multi-turn personalized interaction trajectories.
arXiv Detail & Related papers (2026-02-12T20:41:22Z) - Retentive Relevance: Capturing Long-Term User Value in Recommendation Systems [29.596401271139797]
We introduce Retentive Relevance, a novel content-level survey-based feedback measure. Retentive Relevance directly assesses users' intent to return to the platform for similar content. We show that Retentive Relevance significantly outperforms both engagement signals and other survey measures in predicting next-day retention.
arXiv Detail & Related papers (2025-10-08T23:38:57Z) - STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning [54.28691219536054]
We introduce STARec, a slow-thinking augmented agent framework that endows recommender systems with autonomous deliberative reasoning capabilities. We develop anchored reinforcement training, a two-stage paradigm combining structured knowledge distillation from advanced reasoning models with preference-aligned reward shaping. Experiments on MovieLens 1M and Amazon CDs benchmarks demonstrate that STARec achieves substantial performance gains compared with state-of-the-art baselines.
arXiv Detail & Related papers (2025-08-26T08:47:58Z) - From Clicks to Preference: A Multi-stage Alignment Framework for Generative Query Suggestion in Conversational System [11.373145953200137]
We introduce a multi-stage framework designed for progressive alignment between the generation policy and user intent. Our framework significantly outperforms baselines on both automatic and human evaluations.
arXiv Detail & Related papers (2025-08-15T10:17:01Z) - Modeling User Behavior from Adaptive Surveys with Supplemental Context [1.433758865948252]
We present LANTERN, a modular architecture for modeling user behavior by fusing adaptive survey responses with contextual signals. We demonstrate the architectural value of maintaining survey primacy through selective gating, residual connections, and late fusion. We further investigate threshold sensitivity and the benefits of selective modality reliance through ablation and rare/frequent attribute analysis.
arXiv Detail & Related papers (2025-07-28T15:19:54Z) - Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models? [65.18157595903124]
This work investigates iterative approximate evaluation for arbitrary prompts. It introduces Model Predictive Prompt Selection (MoPPS), a Bayesian risk-predictive framework. MoPPS reliably predicts prompt difficulty and accelerates training with significantly reduced rollouts.
arXiv Detail & Related papers (2025-07-07T03:20:52Z) - Churn-Aware Recommendation Planning under Aggregated Preference Feedback [6.261444979025644]
We study a sequential decision-making problem motivated by recent regulatory and technological shifts. We introduce the Rec-APC model, in which an anonymous user is drawn from a known prior over latent user types. We prove that optimal policies converge to pure exploitation in finite time and propose a branch-and-bound algorithm to efficiently compute them.
arXiv Detail & Related papers (2025-07-06T19:22:47Z) - Multi-agents based User Values Mining for Recommendation [52.26100802380767]
We propose a zero-shot multi-LLM collaborative framework for effective and accurate user value extraction. We apply text summarization techniques to condense item content while preserving essential meaning. To mitigate hallucinations, we introduce two specialized agent roles: evaluators and supervisors.
arXiv Detail & Related papers (2025-05-02T04:01:31Z) - Modeling User Retention through Generative Flow Networks [34.74982897470852]
Flow-based modeling technique can back-propagate the retention reward towards each recommended item in the user session.
We show that the flow combined with traditional learning-to-rank objectives eventually optimized a non-discounted cumulative reward for both immediate user feedback and user retention.
arXiv Detail & Related papers (2024-06-10T06:22:18Z) - Latent User Intent Modeling for Sequential Recommenders [92.66888409973495]
Sequential recommender models learn to predict the next items a user is likely to interact with based on his/her interaction history on the platform.
Most sequential recommenders however lack a higher-level understanding of user intents, which often drive user behaviors online.
Intent modeling is thus critical for understanding users and optimizing long-term user experience.
arXiv Detail & Related papers (2022-11-17T19:00:24Z) - PURS: Personalized Unexpected Recommender System for Improving User Satisfaction [76.98616102965023]
We describe a novel Personalized Unexpected Recommender System (PURS) model that incorporates unexpectedness into the recommendation process.
Extensive offline experiments on three real-world datasets illustrate that the proposed PURS model significantly outperforms the state-of-the-art baseline approaches.
arXiv Detail & Related papers (2021-06-05T01:33:21Z) - Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks.
Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL.
Based on such an approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC).
arXiv Detail & Related papers (2020-06-10T11:18:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.