Relative Advantage Debiasing for Watch-Time Prediction in Short-Video Recommendation
- URL: http://arxiv.org/abs/2508.11086v2
- Date: Thu, 02 Oct 2025 21:50:43 GMT
- Title: Relative Advantage Debiasing for Watch-Time Prediction in Short-Video Recommendation
- Authors: Emily Liu, Kuan Han, Minfeng Zhan, Bocheng Zhao, Guanyu Mu, Yang Song,
- Abstract summary: We propose a novel relative advantage debiasing framework that corrects watch time by comparing it to empirically derived reference distributions conditioned on user and item groups. This approach yields a quantile-based preference signal and introduces a two-stage architecture that explicitly separates distribution estimation from preference learning.
- Score: 5.5448753341848525
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Watch time is widely used as a proxy for user satisfaction in video recommendation platforms. However, raw watch times are influenced by confounding factors such as video duration, popularity, and individual user behaviors, potentially distorting preference signals and resulting in biased recommendation models. We propose a novel relative advantage debiasing framework that corrects watch time by comparing it to empirically derived reference distributions conditioned on user and item groups. This approach yields a quantile-based preference signal and introduces a two-stage architecture that explicitly separates distribution estimation from preference learning. Additionally, we present distributional embeddings to efficiently parameterize watch-time quantiles without requiring online sampling or storage of historical data. Both offline and online experiments demonstrate significant improvements in recommendation accuracy and robustness compared to existing baseline methods.
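The quantile-based preference signal described in the abstract can be illustrated with a minimal sketch. It assumes the reference distributions are available as plain arrays of historical watch times for a user group and an item group; the paper instead parameterizes these quantiles with distributional embeddings, which this sketch does not attempt to reproduce. The function names `empirical_quantile` and `relative_advantage`, the toy data, and the averaging of the two quantiles are all hypothetical choices made for illustration, not the authors' method.

```python
import numpy as np


def empirical_quantile(watch_time, reference):
    """Return the fraction of reference watch times <= watch_time (empirical CDF)."""
    ref = np.sort(np.asarray(reference, dtype=float))
    # side="right" counts the reference values less than or equal to watch_time.
    return np.searchsorted(ref, watch_time, side="right") / ref.size


def relative_advantage(watch_time, user_group_ref, item_group_ref):
    """Combine the quantiles under the user-group and item-group references.

    Averaging the two quantiles is an assumption made for this sketch only.
    """
    q_user = empirical_quantile(watch_time, user_group_ref)
    q_item = empirical_quantile(watch_time, item_group_ref)
    return 0.5 * (q_user + q_item)


# Toy reference distributions (seconds watched), purely illustrative.
short_video_ref = [2.0, 4.0, 5.0, 8.0, 10.0]
long_video_ref = [10.0, 30.0, 60.0, 90.0, 120.0]
user_group_ref = [1.0, 3.0, 6.0, 9.0, 12.0]

# The same 8 seconds of watch time ranks high among short videos
# and low among long ones, which is the duration bias being corrected.
q_short = empirical_quantile(8.0, short_video_ref)
q_long = empirical_quantile(8.0, long_video_ref)
score = relative_advantage(8.0, user_group_ref, short_video_ref)
```

In this toy setup the raw watch time is identical in both cases, but its quantile differs sharply depending on the reference group, so a model trained on the quantile rather than the raw value is less sensitive to video duration.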
Related papers
- How Sampling Shapes LLM Alignment: From One-Shot Optima to Iterative Dynamics [65.67654005892469]
We show that proper instance-dependent sampling can yield stronger ranking guarantees, while skewed on-policy sampling can induce excessive concentration under structured preferences.
We then analyze iterative alignment dynamics in which the learned policy feeds back into future sampling and reference policies.
Our theoretical insights extend to Direct Preference Optimization, indicating the phenomena we captured are common to a broader class of preference-alignment methods.
arXiv Detail & Related papers (2026-02-12T17:11:08Z)
- Mind the Generative Details: Direct Localized Detail Preference Optimization for Video Diffusion Models [65.16788152626499]
LocalDPO builds a novel framework for aligning video diffusion models with human preferences.
We show that LocalDPO consistently improves video fidelity, temporal coherence and human preference scores over other post-training approaches.
arXiv Detail & Related papers (2026-01-07T16:32:17Z)
- Preference Trajectory Modeling via Flow Matching for Sequential Recommendation [50.077447974294586]
Sequential recommendation predicts each user's next item based on their historical interaction sequence.
FlowRec is a simple yet effective sequential recommendation framework.
We construct a personalized behavior-based prior distribution to replace Gaussian noise and learn a vector field to model user preference trajectories.
arXiv Detail & Related papers (2025-08-25T02:55:42Z)
- Explicit Uncertainty Modeling for Video Watch Time Prediction [18.999640886056262]
In video recommendation, a critical component that determines the system's recommendation accuracy is the watch-time prediction module.
One of the key challenges of this problem is the user's watch-time behavior.
We propose an adversarial optimization framework that can better exploit the user watch-time behavior.
arXiv Detail & Related papers (2025-04-10T09:19:19Z)
- AlignPxtr: Aligning Predicted Behavior Distributions for Bias-Free Video Recommendations [1.6187265914188775]
In video recommendation systems, user behaviors such as watch time, likes, and follows are commonly used to infer user interest.
We propose a novel method that aligns predicted behavior distributions across different bias conditions using quantile mapping.
Our approach consistently achieves significant improvements in long-term user retention and substantial gains in average app usage time.
arXiv Detail & Related papers (2025-03-10T04:59:56Z)
- Modeling the Heterogeneous Duration of User Interest in Time-Dependent Recommendation: A Hidden Semi-Markov Approach [11.392605386729699]
We propose a hidden semi-Markov model to track the change of users' interests.
This model allows for capturing the different durations of user stays in a (latent) interest state.
We derive an algorithm to estimate the parameters and predict users' actions.
arXiv Detail & Related papers (2024-12-15T09:17:45Z)
- Conditional Quantile Estimation for Uncertain Watch Time in Short-Video Recommendation [2.3166433227657186]
We propose Conditional Quantile Estimation (CQE) to model the entire conditional distribution of watch time.
CQE characterizes the complex watch-time distribution for each user-video pair, providing a flexible and comprehensive approach to understanding user behavior.
arXiv Detail & Related papers (2024-07-17T00:25:35Z)
- Counteracting Duration Bias in Video Recommendation via Counterfactual Watch Time [63.844468159126826]
Watch time prediction suffers from duration bias, hindering its ability to reflect users' interests accurately.
A Counterfactual Watch Model (CWM) is proposed, revealing that the counterfactual watch time (CWT) equals the time users get the maximum benefit from video recommender systems.
arXiv Detail & Related papers (2024-06-12T06:55:35Z)
- MomentDiff: Generative Video Moment Retrieval from Random to Real [71.40038773943638]
We provide a generative diffusion-based framework called MomentDiff.
MomentDiff simulates a typical human retrieval process from random browsing to gradual localization.
We show that MomentDiff consistently outperforms state-of-the-art methods on three public benchmarks.
arXiv Detail & Related papers (2023-07-06T09:12:13Z)
- Cross Pairwise Ranking for Unbiased Item Recommendation [57.71258289870123]
We develop a new learning paradigm named Cross Pairwise Ranking (CPR).
CPR achieves unbiased recommendation without knowing the exposure mechanism.
We prove in theory that this way offsets the influence of user/item propensity on the learning.
arXiv Detail & Related papers (2022-04-26T09:20:27Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Learning Sample Importance for Cross-Scenario Video Temporal Grounding [30.82619216537177]
The paper investigates some superficial biases specific to the temporal grounding task.
We propose a novel method called Debiased Temporal Language Localizer (DebiasTLL) to prevent the model from naively memorizing the biases.
We evaluate the proposed model in cross-scenario temporal grounding, where the train / test data are heterogeneously sourced.
arXiv Detail & Related papers (2022-01-08T15:41:38Z)
- Probabilistic and Variational Recommendation Denoising [56.879165033014026]
Learning from implicit feedback is one of the most common cases in the application of recommender systems.
We propose probabilistic and variational recommendation denoising for implicit feedback.
We employ the proposed DPI and DVAE on four state-of-the-art recommendation models and conduct experiments on three datasets.
arXiv Detail & Related papers (2021-05-20T08:59:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.