SWaT: Statistical Modeling of Video Watch Time through User Behavior Analysis
- URL: http://arxiv.org/abs/2408.07759v1
- Date: Wed, 14 Aug 2024 18:19:35 GMT
- Title: SWaT: Statistical Modeling of Video Watch Time through User Behavior Analysis
- Authors: Shentao Yang, Haichuan Yang, Linna Du, Adithya Ganesh, Bo Peng, Boying Liu, Serena Li, Ji Liu
- Abstract summary: We propose a white-box statistical framework that translates various user behavior assumptions in watching (short) videos into statistical watch time models.
We test our models extensively on two public datasets, a large-scale offline industrial dataset, and an online A/B test on a short video platform with hundreds of millions of daily-active users.
- Score: 15.246875830547056
- License:
- Abstract: The significance of estimating video watch time has been highlighted by the rising importance of (short) video recommendation, which has become a core product of mainstream social media platforms. Modeling video watch time, however, has been challenged by the complexity of user-video interaction, such as different user behavior modes in watching the recommended videos and varying watching probabilities over the video horizon. Despite the importance and challenges, the existing literature on modeling video watch time mostly focuses on relatively black-box, mechanical enhancements of classical regression/classification losses, without factoring in user behavior in a principled manner. In this paper, we take, for the first time, a user-centric perspective on modeling video watch time, from which we propose a white-box statistical framework that directly translates various user behavior assumptions in watching (short) videos into statistical watch time models. These behavior assumptions are informed by our domain knowledge of users' behavior modes in video watching. We further employ bucketization to cope with users' non-stationary watching probabilities over the video horizon, which additionally helps to respect the constraint of video length and to facilitate practical compatibility between the continuous regression event of watch time and other binary classification events. We test our models extensively on two public datasets, a large-scale offline industrial dataset, and an online A/B test on a short video platform with hundreds of millions of daily-active users. In all experiments, our models perform competitively against strong, relevant baselines, demonstrating the efficacy of our user-centric perspective and proposed framework.
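The abstract describes bucketization only at a high level. As a reading aid, the following is a minimal sketch, in plain NumPy, of how a bucketized, survival-style view can turn per-bucket "keep watching" probabilities (binary classification events) into an expected watch time that respects the video length. The function name, bucket layout, and probabilities are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def expected_watch_time(continue_probs: np.ndarray, bucket_edges: np.ndarray) -> float:
    """Toy expected watch time under a bucketized, survival-style view.

    continue_probs[i] is a hypothetical model output: the probability that the
    user keeps watching through bucket i, given that bucket i was reached.
    bucket_edges are bucket boundaries in seconds, ending at the video length,
    so the estimate respects the video-length constraint by construction.
    """
    widths = np.diff(bucket_edges)          # seconds covered by each bucket
    survival = np.cumprod(continue_probs)   # P(still watching at the end of bucket i)
    # Coarse approximation: a bucket contributes its full width only if the
    # user watches through it; partial watching within a bucket is ignored.
    return float(np.sum(survival * widths))

# Toy example: a 60-second video split into six 10-second buckets.
edges = np.linspace(0.0, 60.0, 7)
probs = np.array([0.95, 0.9, 0.8, 0.7, 0.5, 0.4])  # per-bucket continuation probabilities
print(round(expected_watch_time(probs, edges), 1))  # expected watch time in seconds
```

Because the last bucket edge is the video length, the estimate is bounded by the video duration, mirroring the length constraint mentioned in the abstract.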
Related papers
- Conditional Quantile Estimation for Uncertain Watch Time in Short-Video Recommendation [2.3166433227657186]
We propose the Conditional Quantile Estimation (CQE) framework to model the entire conditional distribution of watch time.
CQE characterizes the complex watch-time distribution for each user-video pair, providing a flexible and comprehensive approach to understanding user behavior.
In particular, the online deployment of CQE on Kuaishou has led to significant improvements in key evaluation metrics (see the sketch after this entry).
arXiv Detail & Related papers (2024-07-17T00:25:35Z)
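The CQE entry above names the approach without detail, so here is a minimal, generic sketch (not taken from that paper) of the quantile, or pinball, loss that conditional quantile estimation typically minimizes; fitting it at several quantile levels traces out the conditional watch-time distribution. The variable names and toy numbers are illustrative assumptions.

```python
import numpy as np

def pinball_loss(y_true: np.ndarray, y_pred: np.ndarray, tau: float) -> float:
    """Standard quantile (pinball) loss for quantile level tau in (0, 1).

    Minimizing this loss pushes y_pred toward the tau-quantile of the
    conditional watch-time distribution; fitting several tau values gives a
    piecewise picture of the whole distribution.
    """
    diff = y_true - y_pred
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))

# Toy check: observed watch times (seconds) vs. one set of predictions,
# scored at three quantile levels.
watch_times = np.array([3.0, 12.0, 25.0, 40.0, 58.0])
predictions = np.array([5.0, 10.0, 22.0, 38.0, 55.0])
for tau in (0.25, 0.5, 0.75):
    print(tau, round(pinball_loss(watch_times, predictions, tau), 2))
```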
- VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs [64.60035916955837]
VANE-Bench is a benchmark designed to assess the proficiency of Video-LMMs in detecting anomalies and inconsistencies in videos.
Our dataset comprises an array of videos synthetically generated using existing state-of-the-art text-to-video generation models.
We evaluate nine existing Video-LMMs, both open- and closed-source, on this benchmark and find that most models have difficulty identifying the subtle anomalies effectively.
arXiv Detail & Related papers (2024-06-14T17:59:01Z)
- Counteracting Duration Bias in Video Recommendation via Counterfactual Watch Time [63.844468159126826]
Watch time prediction suffers from duration bias, hindering its ability to reflect users' interests accurately.
A Counterfactual Watch Model (CWM) is proposed, showing that the counterfactual watch time (CWT) equals the time at which users obtain the maximum benefit from the video recommender system.
arXiv Detail & Related papers (2024-06-12T06:55:35Z)
- Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models, which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z)
- Dynamic Erasing Network Based on Multi-Scale Temporal Features for Weakly Supervised Video Anomaly Detection [103.92970668001277]
We propose a Dynamic Erasing Network (DE-Net) for weakly supervised video anomaly detection.
We first propose a multi-scale temporal modeling module, capable of extracting features from segments of varying lengths.
Then, we design a dynamic erasing strategy, which dynamically assesses the completeness of the detected anomalies.
arXiv Detail & Related papers (2023-12-04T09:40:11Z)
- A Lightweight Video Anomaly Detection Model with Weak Supervision and Adaptive Instance Selection [14.089888316857426]
This paper focuses on weakly supervised video anomaly detection.
We develop a lightweight video anomaly detection model.
We show that our model achieves comparable or even superior AUC scores compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-10-09T01:23:08Z)
- Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal Intervention [72.12974259966592]
We present a unique and systematic study of a temporal bias due to frame length discrepancy between training and test sets of trimmed video clips.
We propose a causal debiasing approach and perform extensive experiments and ablation studies on the Epic-Kitchens-100, YouCook2, and MSR-VTT datasets.
arXiv Detail & Related papers (2023-09-17T15:58:27Z)
- Self-Supervised Video Representation Learning via Latent Time Navigation [12.721647696921865]
Self-supervised video representation learning aims at maximizing similarity between different temporal segments of one video.
We propose Latent Time Navigation (LTN) to capture fine-grained motions.
Our experimental analysis suggests that learning video representations with LTN consistently improves action classification performance.
arXiv Detail & Related papers (2023-05-10T20:06:17Z)
- Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects.
We tackle this problem from two different angles: algorithm and dataset.
We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z)
- A Deep Graph Reinforcement Learning Model for Improving User Experience in Live Video Streaming [7.852895577861326]
We present a deep graph reinforcement learning model to predict and improve the user experience during a live video streaming event.
Our model can significantly increase the number of viewers with a high-quality experience by at least 75% during the first minutes of streaming.
arXiv Detail & Related papers (2021-07-28T19:53:05Z)