Adaptive Video Highlight Detection by Learning from User History
- URL: http://arxiv.org/abs/2007.09598v1
- Date: Sun, 19 Jul 2020 05:52:20 GMT
- Title: Adaptive Video Highlight Detection by Learning from User History
- Authors: Mrigank Rochan, Mahesh Kumar Krishna Reddy, Linwei Ye, Yang Wang
- Abstract summary: We propose a framework that learns to adapt highlight detection to a user by exploiting the user's history in the form of highlights that the user has previously created.
Our framework consists of two sub-networks: a fully temporal convolutional highlight detection network $H$ that predicts highlights for an input video, and a history encoder network $M$ for the user history.
- Score: 18.18119240674014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, there has been increasing interest in highlight detection
research, where the goal is to create a short video from a longer one by
extracting its interesting moments. However, most existing methods ignore the
fact that the definition of a video highlight is highly subjective: different
users may have different highlight preferences for the same input video. In
this paper, we propose a simple yet effective framework that learns to adapt
highlight detection to a user by exploiting the user's history in the form of
highlights that the user has previously created. Our framework consists of two
sub-networks: a fully temporal convolutional highlight detection network $H$
that predicts highlights for an input video, and a history encoder network $M$
for the user history. We introduce a newly designed temporal-adaptive instance
normalization (T-AIN) layer in $H$ through which the two sub-networks interact.
T-AIN has affine parameters that are predicted by $M$ from the user history,
and it supplies the user-adaptive signal to $H$. Extensive experiments on a
large-scale dataset show that our framework makes more accurate and
user-specific highlight predictions.
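To make the T-AIN interaction concrete, here is a minimal PyTorch sketch (not the authors' released code): the layer normalizes $H$'s temporal features and modulates them with affine parameters predicted from a user-history embedding produced by $M$. The class name, tensor shapes, and the linear predictors are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TAIN(nn.Module):
    """Minimal sketch of a temporal-adaptive instance normalization layer.

    Normalizes each channel of H's temporal feature map, then re-scales and
    re-shifts it with affine parameters predicted from M's user-history
    embedding. Shapes, names, and the linear predictors are assumptions,
    not the paper's implementation.
    """

    def __init__(self, num_channels: int, history_dim: int):
        super().__init__()
        # Parameter-free normalization over the temporal axis, per channel.
        self.norm = nn.InstanceNorm1d(num_channels, affine=False)
        # Predict per-channel scale (gamma) and shift (beta) from the
        # user-history code produced by the history encoder M.
        self.to_gamma = nn.Linear(history_dim, num_channels)
        self.to_beta = nn.Linear(history_dim, num_channels)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) features inside the detection network H
        # h: (batch, history_dim) embedding of the user's past highlights
        gamma = self.to_gamma(h).unsqueeze(-1)  # (batch, channels, 1)
        beta = self.to_beta(h).unsqueeze(-1)
        return gamma * self.norm(x) + beta


# Toy usage: four videos of 120 time steps with 256-channel features.
layer = TAIN(num_channels=256, history_dim=128)
x = torch.randn(4, 256, 120)
h = torch.randn(4, 128)
out = layer(x, h)  # same shape as x: (4, 256, 120)
```

Conditioning the normalization's scale and shift on the history code is what lets a single detection network produce different highlight predictions for different users.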
Related papers
- Agent-based Video Trimming [17.519404251018308]
We introduce a novel task called Video Trimming (VT).
VT focuses on detecting wasted footage, selecting valuable segments, and composing them into a final video with a coherent story.
AVT received more favorable evaluations in user studies and demonstrated superior mAP and precision on the YouTube Highlights, TVSum, and our own dataset for the highlight detection task.
arXiv Detail & Related papers (2024-12-12T17:59:28Z) - Towards Video Anomaly Retrieval from Video Anomaly Detection: New
Benchmarks and Model [70.97446870672069]
Video anomaly detection (VAD) has received increasing attention due to its potential applications.
Video Anomaly Retrieval (VAR) aims to pragmatically retrieve relevant anomalous videos across modalities.
We present two benchmarks, UCFCrime-AR and XD-Violence, constructed on top of prevalent anomaly datasets.
arXiv Detail & Related papers (2023-07-24T06:22:37Z) - Show Me What I Like: Detecting User-Specific Video Highlights Using Content-Based Multi-Head Attention [52.84233165201391]
We propose a method to detect individualized highlights for users on given target videos based on their preferred highlight clips marked on previous videos they have watched.
Our method explicitly leverages the contents of both the preferred clips and the target videos using pre-trained features for the objects and the human activities.
arXiv Detail & Related papers (2022-07-18T02:32:48Z) - HighlightMe: Detecting Highlights from Human-Centric Videos [52.84233165201391]
We present a domain- and user-preference-agnostic approach to detect highlightable excerpts from human-centric videos.
We use an autoencoder network equipped with spatial-temporal graph convolutions to detect human activities and interactions.
We observe a 4-12% improvement in the mean average precision of matching the human-annotated highlights over state-of-the-art methods.
arXiv Detail & Related papers (2021-10-05T01:18:15Z) - IntentVizor: Towards Generic Query Guided Interactive Video
Summarization Using Slow-Fast Graph Convolutional Networks [2.5234156040689233]
IntentVizor is an interactive video summarization framework guided by generic multi-modality queries.
We use a set of intents to represent user inputs and design a new interactive visual analytic interface around them.
arXiv Detail & Related papers (2021-09-30T03:44:02Z) - Cross-category Video Highlight Detection via Set-based Learning [55.49267044910344]
We propose a Dual-Learner-based Video Highlight Detection (DL-VHD) framework.
It learns the distinctions among target-category videos along with the characteristics of highlight moments from the source video category.
It outperforms five typical Unsupervised Domain Adaptation (UDA) algorithms on various cross-category highlight detection tasks.
arXiv Detail & Related papers (2021-08-26T13:06:47Z) - From Implicit to Explicit feedback: A deep neural network for modeling
sequential behaviours and long-short term preferences of online users [3.464871689508835]
Implicit and explicit feedback play different roles in producing useful recommendations.
We start from the hypothesis that a user's preference at a given time is a combination of long-term and short-term interests.
arXiv Detail & Related papers (2021-07-26T16:59:20Z) - QVHighlights: Detecting Moments and Highlights in Videos via Natural
Language Queries [89.24431389933703]
We present the Query-based Video Highlights (QVHighlights) dataset.
It consists of over 10,000 YouTube videos, covering a wide range of topics.
Each video in the dataset is annotated with: (1) a human-written free-form NL query, (2) relevant moments in the video w.r.t. the query, and (3) five-point scale saliency scores for all query-relevant clips.
arXiv Detail & Related papers (2021-07-20T16:42:58Z) - Few-Shot Video Object Detection [70.43402912344327]
We introduce Few-Shot Video Object Detection (FSVOD) with three important contributions.
FSVOD-500 comprises 500 classes with class-balanced videos in each category for few-shot learning.
Our TPN and TMN+ are trained jointly and end-to-end.
arXiv Detail & Related papers (2021-04-30T07:38:04Z) - Attentive Item2Vec: Neural Attentive User Representations [29.53270166926552]
We present Attentive Item2vec (AI2V), a novel attentive version of Item2vec (I2V).
AI2V employs a context-target attention mechanism to learn and capture different characteristics of a user's historical behavior (context) with respect to a potential recommended item (target); a sketch of this mechanism is given after this list.
We demonstrate the effectiveness of AI2V on several datasets, where it is shown to outperform other baselines.
arXiv Detail & Related papers (2020-02-15T15:22:47Z)