Multi-Task Learning for User Engagement and Adoption in Live Video
Streaming Events
- URL: http://arxiv.org/abs/2106.10305v1
- Date: Fri, 18 Jun 2021 18:30:22 GMT
- Title: Multi-Task Learning for User Engagement and Adoption in Live Video
Streaming Events
- Authors: Stefanos Antaris and Dimitrios Rafailidis and Romina Arriaza
- Abstract summary: We present a multi-task deep reinforcement learning model to select the time of a live video streaming event.
We consider the engagement and adoption of the viewers as independent tasks and formulate a unified loss function to learn a common policy.
Our experiments demonstrate the effectiveness of the proposed model when compared with several state-of-the-art strategies.
- Score: 7.5413579967970605
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Live video streaming events have become a mainstay of internal
communication in large international enterprises. Given that viewers are
distributed worldwide, the main challenge lies in scheduling the optimal event
time so as to improve both viewer engagement and adoption. In this paper we
present a multi-task deep reinforcement learning model that selects the time of
a live video streaming event, aiming to optimize viewer engagement and adoption
simultaneously. We consider the engagement and adoption of the viewers as
independent tasks and formulate a unified loss function to learn a common
policy. In addition, we account for the fact that each task might make a
different contribution to the agent's training strategy. Therefore, to
determine the contribution of each task to the agent's training, we design a
Transformer architecture over the state-action transitions of each task. We
evaluate the proposed model on four real-world datasets generated by the live
video streaming events of four large enterprises from January 2019 to March
2021. Our experiments demonstrate the effectiveness of the proposed model
compared with several state-of-the-art strategies. For reproducibility, our
evaluation datasets and implementation are publicly available at
https://github.com/stefanosantaris/merlin.
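
To make the abstract's core idea concrete, the sketch below shows a shared policy with one head per task (engagement and adoption), a small Transformer that scores each task's state-action transitions, and a unified loss that weights each task's term by its normalized score. This is a minimal illustrative sketch, not the authors' implementation: the layer sizes, the REINFORCE-style per-task surrogate loss, and the softmax normalization of the task scores are assumptions; the repository linked above is the authoritative reference.

```python
# Minimal sketch (assumptions noted above): two tasks share a policy trunk, and a
# Transformer over each task's state-action transitions determines how much that
# task contributes to a unified training loss.
import torch
import torch.nn as nn


class TaskContributionEncoder(nn.Module):
    """Transformer over one task's state-action transitions -> scalar contribution score."""

    def __init__(self, transition_dim: int, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(transition_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.score = nn.Linear(d_model, 1)

    def forward(self, transitions: torch.Tensor) -> torch.Tensor:
        # transitions: (batch, seq_len, transition_dim), each step a concatenated (state, action)
        h = self.encoder(self.proj(transitions))
        return self.score(h.mean(dim=1))  # (batch, 1) unnormalized score


class MultiTaskPolicy(nn.Module):
    """Shared trunk with one action head per task (engagement, adoption)."""

    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
        self.heads = nn.ModuleDict({
            "engagement": nn.Linear(128, n_actions),
            "adoption": nn.Linear(128, n_actions),
        })

    def forward(self, state: torch.Tensor, task: str) -> torch.Tensor:
        return self.heads[task](self.trunk(state))  # action logits for the given task


def unified_loss(policy, encoders, batches):
    """Combine per-task policy losses, weighted by each task's Transformer-derived score.

    batches maps task name -> (states, actions [long], returns, transitions).
    """
    scores, losses = [], []
    for task, (states, actions, returns, transitions) in batches.items():
        logits = policy(states, task)
        logp = torch.log_softmax(logits, dim=-1).gather(1, actions.unsqueeze(1)).squeeze(1)
        losses.append(-(logp * returns).mean())            # REINFORCE-style surrogate (assumed)
        scores.append(encoders[task](transitions).mean())  # task contribution score
    weights = torch.softmax(torch.stack(scores), dim=0)    # normalize contributions across tasks
    return (weights * torch.stack(losses)).sum()
```

Weighting the per-task losses by scores computed from the state-action transitions is one plausible way to realize the paper's idea of letting each task contribute differently to the agent's training; the exact loss terms and weighting scheme in the released code may differ.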
Related papers
- The Solution for Temporal Action Localisation Task of Perception Test Challenge 2024 [27.30100635072298]
TAL focuses on identifying and classifying actions within specific time intervals throughout a video sequence.
We employ a data augmentation technique by expanding the training dataset using overlapping labels from the Something-SomethingV2 dataset.
For feature extraction, we utilize state-of-the-art models, including UMT, VideoMAEv2 for video features, and BEATs and CAV-MAE for audio features.
arXiv Detail & Related papers (2024-10-08T01:07:21Z)
- Grounding Partially-Defined Events in Multimodal Data [61.0063273919745]
We introduce a multimodal formulation for partially-defined events and cast the extraction of these events as a three-stage span retrieval task.
We propose a benchmark for this task, MultiVENT-G, that consists of 14.5 hours of densely annotated current event videos and 1,168 text documents, containing 22.8K labeled event-centric entities.
Results illustrate the challenges that abstract event understanding poses and demonstrate promise in event-centric video-language systems.
arXiv Detail & Related papers (2024-10-07T17:59:48Z)
- Towards Event-oriented Long Video Understanding [101.48089908037888]
Event-Bench is an event-oriented long video understanding benchmark built on existing datasets and human annotations.
VIM is a cost-effective method that enhances video MLLMs using merged, event-intensive video instructions.
arXiv Detail & Related papers (2024-06-20T09:14:19Z)
- General Object Foundation Model for Images and Videos at Scale [99.2806103051613]
We present GLEE, an object-level foundation model for locating and identifying objects in images and videos.
GLEE accomplishes detection, segmentation, tracking, grounding, and identification of arbitrary objects in the open world scenario.
We employ an image encoder, text encoder, and visual prompter to handle multi-modal inputs, enabling it to solve various object-centric downstream tasks simultaneously.
arXiv Detail & Related papers (2023-12-14T17:26:00Z)
- Multi-Task Learning of Object State Changes from Uncurated Videos [55.60442251060871]
We learn to temporally localize object state changes by observing people interacting with objects in long uncurated web videos.
We show that our multi-task model achieves a relative improvement of 40% over the prior single-task methods.
We also test our method on long egocentric videos of the EPIC-KITCHENS and the Ego4D datasets in a zero-shot setup.
arXiv Detail & Related papers (2022-11-24T09:42:46Z)
- Learning State-Aware Visual Representations from Audible Interactions [39.08554113807464]
We propose a self-supervised algorithm to learn representations from egocentric video data.
We use audio signals to identify moments of likely interactions which are conducive to better learning.
We validate these contributions extensively on two large-scale egocentric datasets.
arXiv Detail & Related papers (2022-09-27T17:57:13Z)
- Multi-dataset Training of Transformers for Robust Action Recognition [75.5695991766902]
We study the task of learning robust feature representations that generalize well across multiple action recognition datasets.
Here, we propose a novel multi-dataset training paradigm, MultiTrain, with the design of two new loss terms, namely informative loss and projection loss.
We verify the effectiveness of our method on five challenging datasets, Kinetics-400, Kinetics-700, Moments-in-Time, Activitynet and Something-something-v2.
arXiv Detail & Related papers (2022-09-26T01:30:43Z)
- Meta-Reinforcement Learning via Buffering Graph Signatures for Live Video Streaming Events [4.332367445046418]
We present a meta-learning model to adapt the predictions of the network's capacity between viewers who participate in a live video streaming event.
We evaluate the proposed model on the link weight prediction task on three real-world datasets of live video streaming events.
arXiv Detail & Related papers (2021-10-03T14:03:22Z)
- A Deep Graph Reinforcement Learning Model for Improving User Experience in Live Video Streaming [7.852895577861326]
We present a deep graph reinforcement learning model to predict and improve the user experience during a live video streaming event.
Our model can significantly increase the number of viewers with high quality experience by at least 75% over the first streaming minutes.
arXiv Detail & Related papers (2021-07-28T19:53:05Z)
- Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos [76.21297023629589]
We propose a novel method for learning pairwise modality interactions in order to better exploit complementary information for each pair of modalities in videos.
Our method achieves state-of-the-art performance on four standard benchmark datasets.
arXiv Detail & Related papers (2020-07-28T12:40:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.