Query-controllable Video Summarization
- URL: http://arxiv.org/abs/2004.03661v1
- Date: Tue, 7 Apr 2020 19:35:04 GMT
- Title: Query-controllable Video Summarization
- Authors: Jia-Hong Huang and Marcel Worring
- Abstract summary: We introduce a method which takes a text-based query as input and generates a video summary corresponding to it.
Our proposed method consists of a video summary controller, video summary generator, and video summary output module.
- Score: 16.54586273670312
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When video collections become huge, how to explore both within and across
videos efficiently is challenging. Video summarization is one of the ways to
tackle this issue. Traditional summarization approaches limit the effectiveness
of video exploration because they only generate one fixed video summary for a
given input video independent of the information need of the user. In this
work, we introduce a method which takes a text-based query as input and
generates a video summary corresponding to it. We do so by modeling video
summarization as a supervised learning problem and propose an end-to-end deep
learning based method for query-controllable video summarization to generate a
query-dependent video summary. Our proposed method consists of a video summary
controller, video summary generator, and video summary output module. To foster
the research of query-controllable video summarization and conduct our
experiments, we introduce a dataset that contains frame-based relevance score
labels. Based on our experimental result, it shows that the text-based query
helps control the video summary. It also shows the text-based query improves
our model performance. Our code and dataset:
https://github.com/Jhhuangkay/Query-controllable-Video-Summarization.
Related papers
- Personalized Video Summarization using Text-Based Queries and Conditional Modeling [3.4447129363520337]
This thesis explores enhancing video summarization by integrating text-based queries and conditional modeling.
Evaluation metrics such as accuracy and F1-score assess the quality of the generated summaries.
arXiv Detail & Related papers (2024-08-27T02:43:40Z) - GQE: Generalized Query Expansion for Enhanced Text-Video Retrieval [56.610806615527885]
This paper introduces a novel data-centric approach, Generalized Query Expansion (GQE), to address the inherent information imbalance between text and video.
By adaptively segmenting videos into short clips and employing zero-shot captioning, GQE enriches the training dataset with comprehensive scene descriptions.
GQE achieves state-of-the-art performance on several benchmarks, including MSR-VTT, MSVD, LSMDC, and VATEX.
arXiv Detail & Related papers (2024-08-14T01:24:09Z) - Scaling Up Video Summarization Pretraining with Large Language Models [73.74662411006426]
We introduce an automated and scalable pipeline for generating a large-scale video summarization dataset.
We analyze the limitations of existing approaches and propose a new video summarization model that effectively addresses them.
Our work also presents a new benchmark dataset that contains 1200 long videos each with high-quality summaries annotated by professionals.
arXiv Detail & Related papers (2024-04-04T11:59:06Z) - Causal Video Summarizer for Video Exploration [74.27487067877047]
Causal Video Summarizer (CVS) is proposed to capture the interactive information between the video and query.
Based on the evaluation of the existing multi-modal video summarization dataset, experimental results show that the proposed approach is effective.
arXiv Detail & Related papers (2023-07-04T22:52:16Z) - VideoXum: Cross-modal Visual and Textural Summarization of Videos [54.0985975755278]
We propose a new joint video and text summarization task.
The goal is to generate both a shortened video clip along with the corresponding textual summary from a long video.
The generated shortened video clip and text narratives should be semantically well aligned.
arXiv Detail & Related papers (2023-03-21T17:51:23Z) - Video Summarization Overview [25.465707307283434]
Video summarization facilitates quickly grasping video content by creating a compact summary of videos.
This survey covers early studies as well as recent approaches which take advantage of deep learning techniques.
arXiv Detail & Related papers (2022-10-21T03:29:31Z) - TL;DW? Summarizing Instructional Videos with Task Relevance &
Cross-Modal Saliency [133.75876535332003]
We focus on summarizing instructional videos, an under-explored area of video summarization.
Existing video summarization datasets rely on manual frame-level annotations.
We propose an instructional video summarization network that combines a context-aware temporal video encoder and a segment scoring transformer.
arXiv Detail & Related papers (2022-08-14T04:07:40Z) - GPT2MVS: Generative Pre-trained Transformer-2 for Multi-modal Video
Summarization [18.543372365239673]
The proposed model consists of a contextualized video summary controller, multi-modal attention mechanisms, an interactive attention network, and a video summary generator.
Results show that the proposed model is effective with the increase of +5.88% in accuracy and +4.06% increase of F1-score, compared with the state-of-the-art method.
arXiv Detail & Related papers (2021-04-26T10:50:37Z) - Convolutional Hierarchical Attention Network for Query-Focused Video
Summarization [74.48782934264094]
This paper addresses the task of query-focused video summarization, which takes user's query and a long video as inputs.
We propose a method, named Convolutional Hierarchical Attention Network (CHAN), which consists of two parts: feature encoding network and query-relevance computing module.
In the encoding network, we employ a convolutional network with local self-attention mechanism and query-aware global attention mechanism to learns visual information of each shot.
arXiv Detail & Related papers (2020-01-31T04:30:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.