Conditional Quantile Estimation for Uncertain Watch Time in Short-Video Recommendation
- URL: http://arxiv.org/abs/2407.12223v5
- Date: Sun, 18 May 2025 04:39:39 GMT
- Title: Conditional Quantile Estimation for Uncertain Watch Time in Short-Video Recommendation
- Authors: Chengzhi Lin, Shuchang Liu, Chuyuan Wang, Yongqi Liu,
- Abstract summary: We propose Conditional Quantile Estimation (CQE) to model the entire conditional distribution of watch time. CQE characterizes the complex watch-time distribution for each user-video pair, providing a flexible and comprehensive approach to understanding user behavior.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Accurately predicting watch time is crucial for optimizing recommendations and user experience in short video platforms. However, existing methods that estimate a single average watch time often fail to capture the inherent uncertainty in user engagement patterns. In this paper, we propose Conditional Quantile Estimation (CQE) to model the entire conditional distribution of watch time. Using quantile regression, CQE characterizes the complex watch-time distribution for each user-video pair, providing a flexible and comprehensive approach to understanding user behavior. We further design multiple strategies to combine the quantile estimates, adapting to different recommendation scenarios and user preferences. Extensive offline experiments and online A/B tests demonstrate the superiority of CQE in watch-time prediction and user engagement modeling. Specifically, deploying CQE online on a large-scale platform with hundreds of millions of daily active users has led to substantial gains in key evaluation metrics, including active days, engagement time, and video views. These results highlight the practical impact of our proposed approach in enhancing the user experience and overall performance of the short video recommendation system. The code will be released at https://github.com/justopit/CQE.
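The quantile-regression idea at the core of CQE can be sketched with the pinball (quantile) loss, which a model minimizes to make its output track a chosen conditional quantile. This is a minimal illustrative sketch, not the paper's implementation; the function names and the quantile grid are assumptions.

```python
def pinball_loss(y_true, y_pred, tau):
    """Pinball (quantile) loss for a target quantile tau in (0, 1).

    Under-prediction is penalized by tau and over-prediction by (1 - tau),
    so minimizing this loss drives y_pred toward the tau-th conditional
    quantile of y_true rather than its mean.
    """
    u = y_true - y_pred
    return tau * u if u >= 0 else (tau - 1.0) * u


def multi_quantile_loss(y_true, preds, taus):
    """Average pinball loss over a grid of quantile levels.

    A hypothetical CQE-style head would emit one prediction per level in
    `taus` (e.g. deciles of watch time) and be trained on this average.
    """
    assert len(preds) == len(taus)
    return sum(pinball_loss(y_true, p, t) for p, t in zip(preds, taus)) / len(taus)
```

Training one output per quantile level yields a discretized conditional distribution of watch time per user-video pair, which can then be combined (e.g. averaged or selectively weighted) for ranking, matching the combination strategies the abstract mentions.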
Related papers
- Relative Advantage Debiasing for Watch-Time Prediction in Short-Video Recommendation [5.5448753341848525]
We propose a novel relative advantage debiasing framework that corrects watch time by comparing it to empirically derived reference distributions conditioned on user and item groups. This approach yields a quantile-based preference signal and introduces a two-stage architecture that explicitly separates distribution estimation from preference learning.
arXiv Detail & Related papers (2025-08-14T21:52:00Z) - ProactiveVideoQA: A Comprehensive Benchmark Evaluating Proactive Interactions in Video Large Language Models [41.35497807436858]
We introduce ProactiveVideoQA, the first comprehensive benchmark to evaluate a system's ability to engage in proactive interaction. We also propose PAUC, the first metric that accounts for the temporal dynamics of model responses. These findings demonstrate that PAUC provides a more faithful assessment of user experience in proactive interaction scenarios.
arXiv Detail & Related papers (2025-07-12T15:11:50Z) - HV-MMBench: Benchmarking MLLMs for Human-Centric Video Understanding [79.06209664703258]
Multimodal Large Language Models (MLLMs) have demonstrated significant advances in visual understanding tasks involving both images and videos. Existing human-centric benchmarks predominantly emphasize video generation quality and action recognition, while overlooking essential perceptual and cognitive abilities required in human-centered scenarios. We propose a rigorously curated benchmark designed to provide a more holistic evaluation of MLLMs in human-centric video understanding.
arXiv Detail & Related papers (2025-07-07T11:52:24Z) - Explicit Uncertainty Modeling for Video Watch Time Prediction [18.999640886056262]
In video recommendation, a critical component that determines the system's recommendation accuracy is the watch-time prediction module.
One of the key challenges of this problem is the inherent uncertainty in users' watch-time behavior.
We propose an adversarial optimization framework that can better exploit the user watch-time behavior.
arXiv Detail & Related papers (2025-04-10T09:19:19Z) - Generate the browsing process for short-video recommendation [6.246989522091273]
This paper proposes a generative method to dynamically simulate users' short video watching journey for watch time prediction in short video recommendation. Our method simulates users' sustained interest in watching short videos by learning collaborative information. Experiments on industrial-scale and public datasets demonstrate that our method achieves state-of-the-art performance on watch time prediction tasks.
arXiv Detail & Related papers (2025-04-02T20:54:52Z) - SWaT: Statistical Modeling of Video Watch Time through User Behavior Analysis [15.246875830547056]
We propose a white-box statistical framework that translates various user behavior assumptions in watching (short) videos into statistical watch time models.
We test our models extensively on two public datasets, a large-scale offline industrial dataset, and an online A/B test on a short video platform with hundreds of millions of daily-active users.
arXiv Detail & Related papers (2024-08-14T18:19:35Z) - Strike the Balance: On-the-Fly Uncertainty based User Interactions for Long-Term Video Object Segmentation [23.417370317522106]
We introduce a variant of video object segmentation (VOS) that bridges interactive and semi-automatic approaches.
We aim to maximize the tracking duration of an object of interest, while requiring minimal user corrections to maintain tracking over an extended period.
We evaluate our approach using the recently introduced LVOS dataset, which offers numerous long-term videos.
arXiv Detail & Related papers (2024-07-31T21:42:42Z) - Counteracting Duration Bias in Video Recommendation via Counterfactual Watch Time [63.844468159126826]
Watch time prediction suffers from duration bias, hindering its ability to reflect users' interests accurately.
A Counterfactual Watch Model (CWM) is proposed, revealing that counterfactual watch time (CWT) equals the time at which users get the maximum benefit from video recommender systems.
arXiv Detail & Related papers (2024-06-12T06:55:35Z) - Enhancing Sequential Recommender with Large Language Models for Joint Video and Comment Recommendation [77.42486522565295]
We propose a novel recommendation approach called LSVCR to jointly perform personalized video and comment recommendation. Our approach comprises two key components: a sequential recommendation (SR) model and a supplemental large language model (LLM) recommender. In particular, we attain a cumulative gain of 4.13% in comment watch time.
arXiv Detail & Related papers (2024-03-20T13:14:29Z) - Conformal Prediction in Multi-User Settings: An Evaluation [0.10231119246773925]
Machine learning models are typically trained and evaluated without making any distinction between users. This produces inaccurate estimates of performance metrics in multi-user settings.
In this work we evaluated the conformal prediction framework in several multi-user settings.
arXiv Detail & Related papers (2023-12-08T17:33:23Z) - EvalCrafter: Benchmarking and Evaluating Large Video Generation Models [70.19437817951673]
We argue that it is hard to judge large conditional generative models using simple metrics, since these models are often trained on very large datasets with multi-aspect abilities.
Our approach involves generating a diverse and comprehensive list of 700 prompts for text-to-video generation.
Then, we evaluate the state-of-the-art video generative models on our carefully designed benchmark, in terms of visual qualities, content qualities, motion qualities, and text-video alignment with 17 well-selected objective metrics.
arXiv Detail & Related papers (2023-10-17T17:50:46Z) - Multiscale Video Pretraining for Long-Term Activity Forecasting [67.06864386274736]
Multiscale Video Pretraining (MVP) learns robust representations for forecasting by learning to predict contextualized representations of future video clips over multiple timescales.
MVP is based on our observation that actions in videos have a multiscale nature, where atomic actions typically occur at a short timescale and more complex actions may span longer timescales.
Our comprehensive experiments across the Ego4D and Epic-Kitchens-55/100 datasets demonstrate that MVP out-performs state-of-the-art methods by significant margins.
arXiv Detail & Related papers (2023-07-24T14:55:15Z) - Improving Text Matching in E-Commerce Search with A Rationalizable, Intervenable and Fast Entity-Based Relevance Model [78.80174696043021]
We propose a novel model called the Entity-Based Relevance Model (EBRM).
The decomposition allows us to use a cross-encoder query-entity (QE) relevance module for high accuracy.
We also show that pretraining the QE module with auto-generated QE data from user logs can further improve the overall performance.
arXiv Detail & Related papers (2023-07-01T15:44:53Z) - Perception Test: A Diagnostic Benchmark for Multimodal Video Models [78.64546291816117]
We propose a novel multimodal video benchmark to evaluate the perception and reasoning skills of pre-trained multimodal models.
The Perception Test focuses on skills (Memory, Abstraction, Physics, Semantics) and types of reasoning (descriptive, explanatory, predictive, counterfactual) across video, audio, and text modalities.
The benchmark probes pre-trained models for their transfer capabilities, in a zero-shot / few-shot or limited finetuning regime.
arXiv Detail & Related papers (2023-05-23T07:54:37Z) - Cluster-and-Conquer: A Framework For Time-Series Forecasting [94.63501563413725]
We propose a three-stage framework for forecasting high-dimensional time-series data.
Our framework is highly general, allowing for any time-series forecasting and clustering method to be used in each step.
When instantiated with simple linear autoregressive models, we are able to achieve state-of-the-art results on several benchmark datasets.
arXiv Detail & Related papers (2021-10-26T20:41:19Z) - Click-through Rate Prediction with Auto-Quantized Contrastive Learning [46.585376453464114]
We consider whether the user behaviors are rich enough to capture user interests for prediction, and propose an Auto-Quantized Contrastive Learning (AQCL) loss to regularize the model.
The proposed framework is agnostic to different model architectures and can be trained in an end-to-end fashion.
arXiv Detail & Related papers (2021-09-27T04:39:43Z) - Sequential Recommender via Time-aware Attentive Memory Network [67.26862011527986]
We propose a temporal gating methodology to improve the attention mechanism and recurrent units.
We also propose a Multi-hop Time-aware Attentive Memory network to integrate long-term and short-term preferences.
Our approach is scalable for candidate retrieval tasks and can be viewed as a non-linear generalization of latent factorization for dot-product based Top-K recommendation.
arXiv Detail & Related papers (2020-05-18T11:29:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.