Can we predict the Most Replayed data of video streaming platforms?
- URL: http://arxiv.org/abs/2309.06102v1
- Date: Tue, 12 Sep 2023 10:08:33 GMT
- Title: Can we predict the Most Replayed data of video streaming platforms?
- Authors: Alessandro Duico, Ombretta Strafforello, Jan van Gemert
- Abstract summary: We explore whether it is possible to predict the Most Replayed (MR) data from YouTube videos.
To this end, we curate a large video benchmark, the YTMR500 dataset, which comprises 500 YouTube videos with MR data annotations.
We evaluate Deep Learning (DL) models of varying complexity on our dataset and perform an extensive ablation study.
- Score: 57.55927378696826
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Predicting which specific parts of a video users will replay is important for
several applications, including targeted advertisement placement on video
platforms and assisting video creators. In this work, we explore whether it is
possible to predict the Most Replayed (MR) data from YouTube videos. To this
end, we curate a large video benchmark, the YTMR500 dataset, which comprises
500 YouTube videos with MR data annotations. We evaluate Deep Learning (DL)
models of varying complexity on our dataset and perform an extensive ablation
study. In addition, we conduct a user study to estimate the human performance
on MR data prediction. Our results show that, although by a narrow margin, all
the evaluated DL models outperform random predictions. Additionally, they
exceed human-level accuracy. This suggests that predicting the MR data is a
difficult task that can be enhanced through the assistance of DL. Finally, we
believe that DL performance on MR data prediction can be further improved, for
example, by using multi-modal learning. We encourage the research community to
use our benchmark dataset to further investigate automatic MR data prediction.
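The abstract's evaluation (comparing predicted Most Replayed curves against annotated ones, and against random and human baselines) can be sketched with a rank-based score. The paper's actual metric is not stated in this summary, so Spearman rank correlation is an assumption here, and the tie handling is simplified:

```python
import numpy as np

def spearman_corr(pred, truth):
    """Spearman rank correlation between two 1-D score curves.

    pred/truth: per-segment replay scores for one video
    (ties are ignored for simplicity).
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    # Double argsort converts scores to ranks (0..n-1).
    rp = pred.argsort().argsort().astype(float)
    rt = truth.argsort().argsort().astype(float)
    # Pearson correlation of the ranks.
    rp -= rp.mean()
    rt -= rt.mean()
    return float((rp @ rt) / np.sqrt((rp @ rp) * (rt @ rt)))

# Hypothetical usage: a model's curve vs. a random baseline.
rng = np.random.default_rng(0)
truth = rng.random(100)            # stand-in for annotated MR data
random_pred = rng.random(100)      # random-prediction baseline
```

A value near 1 means the model ranks segments like the annotations, near 0 matches the random baseline, and per-video scores would be averaged over the 500 videos of a benchmark like YTMR500.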
Related papers
- Prompt Public Large Language Models to Synthesize Data for Private On-device Applications [5.713077600587505]
This paper investigates how large language models (LLMs) trained on public data can improve the quality of pre-training data for the on-device language models trained with DP and FL.
The model pre-trained on our synthetic dataset achieves relative improvement of 19.0% and 22.8% in next word prediction accuracy.
Our experiments demonstrate the strengths of LLMs in synthesizing data close to the private distribution even without accessing the private data.
arXiv Detail & Related papers (2024-04-05T19:14:14Z)
- Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z)
- Scaling Relationship on Learning Mathematical Reasoning with Large Language Models [75.29595679428105]
We investigate how the pre-training loss, supervised data amount, and augmented data amount influence the reasoning performances of a supervised LLM.
We find that rejection samples from multiple models push LLaMA-7B to an accuracy of 49.3% on GSM8K which outperforms the supervised fine-tuning (SFT) accuracy of 35.9% significantly.
arXiv Detail & Related papers (2023-08-03T15:34:01Z)
- Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects.
We tackle this problem from two different angles: algorithm and dataset.
We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z)
- MRCLens: an MRC Dataset Bias Detection Toolkit [82.44296974850639]
We introduce MRCLens, a toolkit that detects whether biases exist before users train the full model.
For the convenience of introducing the toolkit, we also provide a categorization of common biases in MRC.
arXiv Detail & Related papers (2022-07-18T21:05:39Z)
- NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels [33.659146748289444]
We create a benchmark dataset consisting of around 2 million videos with associated user-generated annotations and other meta information.
We show how a network pretrained on the proposed dataset can help against video corruption and label noise in downstream datasets.
arXiv Detail & Related papers (2021-10-13T16:12:18Z)
- Self-Supervised Representation Learning for Detection of ACL Tear Injury in Knee MR Videos [18.54362818156725]
We propose a self-supervised learning approach to learn transferable features from MR video clips by enforcing the model to learn anatomical features.
To the best of our knowledge, none of the supervised learning models performing injury classification from MR videos provide any explanation for their decisions.
arXiv Detail & Related papers (2020-07-15T15:35:47Z)
- Constructing a Highlight Classifier with an Attention Based LSTM Neural Network [0.0]
Market researchers manually review the vast majority of consumer research video in order to identify the relevant portions, i.e., highlights.
In this study we present a novel approach for NLP-based highlight identification and extraction based on a supervised learning model.
arXiv Detail & Related papers (2020-02-12T15:18:31Z)
- Stance Detection Benchmark: How Robust Is Your Stance Detection? [65.91772010586605]
Stance Detection (StD) aims to detect an author's stance towards a certain topic or claim.
We introduce a StD benchmark that learns from ten StD datasets of various domains in a multi-dataset learning setting.
Within this benchmark setup, we are able to present new state-of-the-art results on five of the datasets.
arXiv Detail & Related papers (2020-01-06T13:37:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.