Video Summarization Overview
- URL: http://arxiv.org/abs/2210.11707v1
- Date: Fri, 21 Oct 2022 03:29:31 GMT
- Title: Video Summarization Overview
- Authors: Mayu Otani and Yale Song and Yang Wang
- Abstract summary: Video summarization facilitates quickly grasping video content by creating a compact summary of videos.
This survey covers early studies as well as recent approaches which take advantage of deep learning techniques.
- Score: 25.465707307283434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the broad growth of video capturing devices and applications on the web, there is a growing demand to provide desired video content to users efficiently.
Video summarization facilitates quickly grasping video content by creating a
compact summary of videos. Much effort has been devoted to automatic video
summarization, and various problem settings and approaches have been proposed.
Our goal is to provide an overview of this field. This survey covers early
studies as well as recent approaches which take advantage of deep learning
techniques. We describe video summarization approaches and their underlying
concepts. We also discuss benchmarks and evaluations, review how prior work
has addressed evaluation, and detail the pros and cons of the evaluation
protocols. Last but not least, we discuss open challenges in this field.
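To make the underlying concepts concrete, the following is a minimal sketch (in Python, with a placeholder scoring function and hypothetical data) of the formulation shared by many of the surveyed approaches: assign an importance score to each frame, select a small subset under a length budget, and compare the selection against a human reference with a keyframe-overlap F1 score of the kind used by several benchmarks. It illustrates the problem setting only, not the method of any specific paper.

# Minimal illustration of "score frames, then select a compact subset".
# The scoring function is a stand-in (frame-to-frame feature difference),
# not a method from the survey.
import numpy as np

def importance_scores(frames: np.ndarray) -> np.ndarray:
    """Assign an importance score to each frame.

    `frames` has shape (T, D): T frames, each a D-dimensional feature vector.
    As a placeholder, a frame is scored by how much it differs from its
    predecessor; real systems learn this scoring with a neural network.
    """
    diffs = np.linalg.norm(np.diff(frames, axis=0), axis=1)
    return np.concatenate([[0.0], diffs])

def select_keyframes(scores: np.ndarray, budget: int) -> list[int]:
    """Pick the `budget` highest-scoring frames, returned in time order."""
    top = np.argsort(scores)[-budget:]
    return sorted(top.tolist())

def f1_against_reference(pred: set[int], ref: set[int]) -> float:
    """Keyframe-overlap F1, a common (if debated) evaluation protocol."""
    if not pred or not ref:
        return 0.0
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.normal(size=(120, 64))      # 120 frames of 64-d features (synthetic)
    summary = select_keyframes(importance_scores(video), budget=12)
    reference = set(range(0, 120, 10))      # hypothetical human-annotated keyframes
    print(summary, f1_against_reference(set(summary), reference))

Learned methods replace the hand-crafted difference score with a neural scoring network, and many replace the simple top-k selection with a knapsack over shots so that the summary respects a length budget.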
Related papers
- Conditional Modeling Based Automatic Video Summarization [70.96973928590958]
The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story.
Video summarization methods rely on visual factors, such as visual consecutiveness and diversity, which may not be sufficient to fully understand the content of the video.
A new approach to video summarization is proposed based on insights gained from how humans create ground truth video summaries.
arXiv Detail & Related papers (2023-11-20T20:24:45Z)
- Causal Video Summarizer for Video Exploration [74.27487067877047]
Causal Video Summarizer (CVS) is proposed to capture the interactive information between the video and query.
In an evaluation on an existing multi-modal video summarization dataset, experimental results show that the proposed approach is effective.
arXiv Detail & Related papers (2023-07-04T22:52:16Z)
- Learning to Summarize Videos by Contrasting Clips [1.3999481573773074]
Video summarization aims at choosing parts of a video that narrate a story as close as possible to the original one.
Most existing video summarization approaches rely on hand-crafted labels.
We propose contrastive learning to address these challenges.
arXiv Detail & Related papers (2023-01-12T18:55:30Z)
- TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency [133.75876535332003]
We focus on summarizing instructional videos, an under-explored area of video summarization.
Existing video summarization datasets rely on manual frame-level annotations.
We propose an instructional video summarization network that combines a context-aware temporal video encoder and a segment scoring transformer.
arXiv Detail & Related papers (2022-08-14T04:07:40Z)
- A Survey on Deep Learning Technique for Video Segmentation [147.0767454918527]
Video segmentation plays a critical role in a broad range of practical applications.
Deep learning based approaches have been devoted to video segmentation and have delivered compelling performance.
arXiv Detail & Related papers (2021-07-02T15:51:07Z)
- Video Summarization Using Deep Neural Networks: A Survey [72.98424352264904]
Video summarization technologies aim to create a concise and complete synopsis by selecting the most informative parts of the video content.
This work focuses on the recent advances in the area and provides a comprehensive survey of the existing deep-learning-based methods for generic video summarization.
arXiv Detail & Related papers (2021-01-15T11:41:29Z)
- A Comprehensive Review on Recent Methods and Challenges of Video Description [11.69687792533269]
Video description involves the generation of the natural language description of actions, events, and objects in the video.
Video description has various applications, such as bridging the gap between language and vision for visually impaired people.
In the past decade, several works have been done in this field on approaches/methods for video description, evaluation metrics, and datasets.
arXiv Detail & Related papers (2020-11-30T13:08:45Z)
- Query-controllable Video Summarization [16.54586273670312]
We introduce a method which takes a text-based query as input and generates a video summary corresponding to it.
Our proposed method consists of a video summary controller, video summary generator, and video summary output module.
arXiv Detail & Related papers (2020-04-07T19:35:04Z)
- Convolutional Hierarchical Attention Network for Query-Focused Video Summarization [74.48782934264094]
This paper addresses the task of query-focused video summarization, which takes a user's query and a long video as inputs.
We propose a method, named Convolutional Hierarchical Attention Network (CHAN), which consists of two parts: feature encoding network and query-relevance computing module.
In the encoding network, we employ a convolutional network with a local self-attention mechanism and a query-aware global attention mechanism to learn the visual information of each shot; the sketch after this list illustrates the basic query-to-shot relevance idea.
arXiv Detail & Related papers (2020-01-31T04:30:14Z)
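The query-controllable and query-focused entries above describe architectures that condition the summary on a text query. The sketch below is a simplified assumption rather than any of the published models: embed the query, score each shot by its similarity to that embedding, and keep the most relevant shots.

# Minimal sketch of query-focused shot selection. The random embeddings and
# cosine scoring are illustrative stand-ins, not the published architectures.
import numpy as np

def cosine(query: np.ndarray, shots: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each row of `shots`."""
    q = query / (np.linalg.norm(query) + 1e-8)
    s = shots / (np.linalg.norm(shots, axis=1, keepdims=True) + 1e-8)
    return s @ q

def query_focused_summary(query_emb: np.ndarray,
                          shot_feats: np.ndarray,
                          budget: int) -> list[int]:
    """Return indices of the `budget` shots most relevant to the query, in time order."""
    relevance = cosine(query_emb, shot_feats)
    return sorted(np.argsort(relevance)[-budget:].tolist())

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    shots = rng.normal(size=(40, 128))   # 40 shots with 128-d visual features (synthetic)
    query = rng.normal(size=128)         # stand-in for a text-query embedding
    print(query_focused_summary(query, shots, budget=5))

Real systems replace the random stand-in vectors with learned visual and textual encoders and learn the query-to-shot relevance function end to end.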