Conditional Quantile Estimation for Uncertain Watch Time in Short-Video Recommendation
- URL: http://arxiv.org/abs/2407.12223v1
- Date: Wed, 17 Jul 2024 00:25:35 GMT
- Title: Conditional Quantile Estimation for Uncertain Watch Time in Short-Video Recommendation
- Authors: Chengzhi Lin, Shuchang Liu, Chuyuan Wang, Yongqi Liu,
- Abstract summary: We introduce a novel estimation technique -- Conditional Quantile Estimation (CQE)
CQE utilizes quantile regression to capture the nuanced distribution of watch time.
We also design several strategies to enhance the quantile prediction including conditional expectation, conservative estimation, and dynamic quantile combination.
- Score: 2.3166433227657186
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Within the domain of short-video recommendation, predicting users' watch time is a critical but challenging task. Prevailing deterministic solutions obtain accurate debiased statistical models, yet they neglect the intrinsic uncertainty inherent in user environments. In our observations, this uncertainty can limit the watch-time prediction accuracy of these methods on our online platform, even though we employ numerous features and complex network architectures. Consequently, we believe a better solution is to model the conditional distribution of this uncertain watch time. In this paper, we introduce a novel estimation technique, Conditional Quantile Estimation (CQE), which uses quantile regression to capture the nuanced distribution of watch time. The learned distribution accounts for the stochastic nature of users and thereby provides a more accurate and robust estimation. In addition, we design several strategies to enhance the quantile prediction, including conditional expectation, conservative estimation, and dynamic quantile combination. We verify the effectiveness of our method through extensive offline evaluations on public datasets as well as deployment in a real-world video application with over 300 million daily active users.
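The abstract describes CQE only at a high level. As a rough illustration, the sketch below shows a minimal quantile-regression head trained with the pinball loss, together with simplified stand-ins for the three inference strategies named above (conditional expectation, conservative estimation, dynamic quantile combination). It is a sketch under assumed choices: the names, the quantile grid, and the strategy details are hypothetical and are not taken from the paper's implementation.

```python
# Minimal sketch of quantile regression for watch-time prediction.
# All names and architectural details are assumptions, not the paper's code.
import torch
import torch.nn as nn

TAUS = torch.linspace(0.05, 0.95, steps=19)  # assumed quantile grid

class QuantileHead(nn.Module):
    """Predicts one watch-time value per quantile level from user/item features."""
    def __init__(self, feat_dim: int, n_quantiles: int = len(TAUS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, n_quantiles),
        )

    def forward(self, x):            # x: [batch, feat_dim]
        return self.net(x)           # [batch, n_quantiles]

def pinball_loss(pred, target, taus=TAUS):
    """Standard quantile (pinball) loss, averaged over quantile levels."""
    diff = target.unsqueeze(-1) - pred               # [batch, n_quantiles]
    return torch.maximum(taus * diff, (taus - 1) * diff).mean()

# --- assumed inference-time strategies ---
def conditional_expectation(pred):
    # Approximate the expected watch time by averaging the predicted quantiles.
    return pred.mean(dim=-1)

def conservative_estimate(pred, tau=0.25):
    # Use a low quantile as a cautious watch-time estimate.
    idx = torch.argmin(torch.abs(TAUS - tau))
    return pred[:, idx]

def dynamic_combination(pred, weights):
    # Context-dependent weighted mix of quantiles (weights sum to 1 per row).
    return (pred * weights).sum(dim=-1)
```

In this reading, conditional expectation collapses the quantile grid into a mean-like score, while the conservative and dynamic variants trade optimism for robustness depending on context; how the paper actually weights or selects quantiles is not specified in the abstract.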
Related papers
- ProactiveVideoQA: A Comprehensive Benchmark Evaluating Proactive Interactions in Video Large Language Models [41.35497807436858]
We introduce ProactiveVideoQA, the first comprehensive benchmark to evaluate a system's ability to engage in proactive interaction. We also propose PAUC, the first metric that accounts for the temporal dynamics of model responses. These findings demonstrate that PAUC provides a more faithful assessment of user experience in proactive interaction scenarios.
arXiv Detail & Related papers (2025-07-12T15:11:50Z) - HV-MMBench: Benchmarking MLLMs for Human-Centric Video Understanding [79.06209664703258]
Multimodal Large Language Models (MLLMs) have demonstrated significant advances in visual understanding tasks involving both images and videos. Existing human-centric benchmarks predominantly emphasize video generation quality and action recognition, while overlooking essential perceptual and cognitive abilities required in human-centered scenarios. We propose a rigorously curated benchmark designed to provide a more holistic evaluation of MLLMs in human-centric video understanding.
arXiv Detail & Related papers (2025-07-07T11:52:24Z) - Explicit Uncertainty Modeling for Video Watch Time Prediction [18.999640886056262]
In video recommendation, a critical component that determines the system's recommendation accuracy is the watch-time prediction module.
One of the key challenges of this problem is the inherent uncertainty in users' watch-time behavior.
We propose an adversarial optimization framework that can better exploit the user watch-time behavior.
arXiv Detail & Related papers (2025-04-10T09:19:19Z) - SWaT: Statistical Modeling of Video Watch Time through User Behavior Analysis [15.246875830547056]
We propose a white-box statistical framework that translates various user behavior assumptions in watching (short) videos into statistical watch time models.
We test our models extensively on two public datasets, a large-scale offline industrial dataset, and an online A/B test on a short video platform with hundreds of millions of daily-active users.
arXiv Detail & Related papers (2024-08-14T18:19:35Z) - Strike the Balance: On-the-Fly Uncertainty based User Interactions for Long-Term Video Object Segmentation [23.417370317522106]
We introduce a variant of video object segmentation (VOS) that bridges interactive and semi-automatic approaches.
We aim to maximize the tracking duration of an object of interest, while requiring minimal user corrections to maintain tracking over an extended period.
We evaluate our approach using the recently introduced LVOS dataset, which offers numerous long-term videos.
arXiv Detail & Related papers (2024-07-31T21:42:42Z) - Counteracting Duration Bias in Video Recommendation via Counterfactual Watch Time [63.844468159126826]
Watch time prediction suffers from duration bias, hindering its ability to reflect users' interests accurately.
A Counterfactual Watch Model (CWM) is proposed, revealing that the counterfactual watch time (CWT) equals the time at which users gain the maximum benefit from video recommender systems.
arXiv Detail & Related papers (2024-06-12T06:55:35Z) - Enhancing Sequential Recommender with Large Language Models for Joint Video and Comment Recommendation [77.42486522565295]
We propose a novel recommendation approach called LSVCR to jointly perform personalized video and comment recommendation. Our approach comprises two key components: a sequential recommendation (SR) model and a supplemental large language model (LLM) recommender. In particular, we attain a cumulative gain of 4.13% in comment watch time.
arXiv Detail & Related papers (2024-03-20T13:14:29Z) - Conformal Prediction in Multi-User Settings: An Evaluation [0.10231119246773925]
Machine learning models are often trained and evaluated without distinguishing between users.
This produces inaccurate estimates of performance metrics in multi-user settings.
In this work we evaluated the conformal prediction framework in several multi-user settings.
arXiv Detail & Related papers (2023-12-08T17:33:23Z) - EvalCrafter: Benchmarking and Evaluating Large Video Generation Models [70.19437817951673]
We argue that it is hard to judge large conditional generative models using simple metrics, since these models are often trained on very large datasets and have multi-aspect abilities.
Our approach involves generating a diverse and comprehensive list of 700 prompts for text-to-video generation.
Then, we evaluate the state-of-the-art video generative models on our carefully designed benchmark, in terms of visual qualities, content qualities, motion qualities, and text-video alignment with 17 well-selected objective metrics.
arXiv Detail & Related papers (2023-10-17T17:50:46Z) - Multiscale Video Pretraining for Long-Term Activity Forecasting [67.06864386274736]
Multiscale Video Pretraining learns robust representations for forecasting by learning to predict contextualized representations of future video clips over multiple timescales.
MVP is based on our observation that actions in videos have a multiscale nature, where atomic actions typically occur at a short timescale and more complex actions may span longer timescales.
Our comprehensive experiments across the Ego4D and Epic-Kitchens-55/100 datasets demonstrate that MVP outperforms state-of-the-art methods by significant margins.
arXiv Detail & Related papers (2023-07-24T14:55:15Z) - Improving Text Matching in E-Commerce Search with A Rationalizable, Intervenable and Fast Entity-Based Relevance Model [78.80174696043021]
We propose a novel model called the Entity-Based Relevance Model (EBRM).
The decomposition allows us to use a Cross-encoder QE relevance module for high accuracy.
We also show that pretraining the QE module with auto-generated QE data from user logs can further improve the overall performance.
arXiv Detail & Related papers (2023-07-01T15:44:53Z) - Perception Test: A Diagnostic Benchmark for Multimodal Video Models [78.64546291816117]
We propose a novel multimodal video benchmark to evaluate the perception and reasoning skills of pre-trained multimodal models.
The Perception Test focuses on skills (Memory, Abstraction, Physics, Semantics) and types of reasoning (descriptive, explanatory, predictive, counterfactual) across video, audio, and text modalities.
The benchmark probes pre-trained models for their transfer capabilities, in a zero-shot / few-shot or limited finetuning regime.
arXiv Detail & Related papers (2023-05-23T07:54:37Z) - Cluster-and-Conquer: A Framework For Time-Series Forecasting [94.63501563413725]
We propose a three-stage framework for forecasting high-dimensional time-series data.
Our framework is highly general, allowing for any time-series forecasting and clustering method to be used in each step.
When instantiated with simple linear autoregressive models, we are able to achieve state-of-the-art results on several benchmark datasets.
arXiv Detail & Related papers (2021-10-26T20:41:19Z) - Click-through Rate Prediction with Auto-Quantized Contrastive Learning [46.585376453464114]
We consider whether user behaviors are rich enough to capture user interests for prediction, and propose an Auto-Quantized Contrastive Learning (AQCL) loss to regularize the model.
The proposed framework is agnostic to different model architectures and can be trained in an end-to-end fashion.
arXiv Detail & Related papers (2021-09-27T04:39:43Z) - Sequential Recommender via Time-aware Attentive Memory Network [67.26862011527986]
We propose a temporal gating methodology to improve the attention mechanism and recurrent units.
We also propose a Multi-hop Time-aware Attentive Memory network to integrate long-term and short-term preferences.
Our approach is scalable for candidate retrieval tasks and can be viewed as a non-linear generalization of latent factorization for dot-product based Top-K recommendation.
arXiv Detail & Related papers (2020-05-18T11:29:38Z)