Video Summarization Based on Video-text Modelling
- URL: http://arxiv.org/abs/2201.02494v2
- Date: Mon, 10 Jan 2022 02:23:06 GMT
- Title: Video Summarization Based on Video-text Modelling
- Authors: Li Haopeng, Ke Qiuhong, Gong Mingming, Zhang Rui
- Abstract summary: We propose a multimodal self-supervised learning framework to obtain semantic representations of videos.
We also introduce a progressive video summarization method, where the important content in a video is pinpointed progressively to generate better summaries.
An objective evaluation framework is proposed to measure the quality of video summaries based on video classification.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern video summarization methods are based on deep neural networks which
require a large amount of annotated data for training. However, existing
datasets for video summarization are small-scale, easily leading to
over-fitting of the deep models. Considering that the annotation of large-scale
datasets is time-consuming, we propose a multimodal self-supervised learning
framework to obtain semantic representations of videos, which benefits the
video summarization task. Specifically, we explore the semantic consistency
between the visual information and text information of videos, for the
self-supervised pretraining of a multimodal encoder on a newly-collected
dataset of video-text pairs. Additionally, we introduce a progressive video
summarization method, where the important content in a video is pinpointed
progressively to generate better summaries. Finally, an objective evaluation
framework is proposed to measure the quality of video summaries based on video
classification. Extensive experiments have proved the effectiveness and
superiority of our method in rank correlation coefficients, F-score, and the
proposed objective evaluation compared to the state of the art.
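The abstract reports results in rank correlation coefficients and F-score. As a rough illustration only, the sketch below assumes the rank correlation is Kendall's tau (the abstract does not name the coefficients) and that the F-score is taken over binary frame-selection masks. The function names `kendall_tau` and `f_score` and the sample scores are hypothetical and not taken from the paper.

```python
from itertools import combinations


def kendall_tau(pred, ref):
    """Kendall's tau-a between two equal-length lists of importance scores.

    Tied pairs count as neither concordant nor discordant (an assumption
    made for brevity; tau-b would correct for ties).
    """
    if len(pred) != len(ref):
        raise ValueError("score lists must have the same length")
    n = len(pred)
    if n < 2:
        raise ValueError("need at least two scores")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # The sign of the product says whether both lists order the pair the same way.
        s = (pred[i] - pred[j]) * (ref[i] - ref[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def f_score(pred_mask, ref_mask):
    """Harmonic mean of precision and recall over selected frames (0/1 masks)."""
    if len(pred_mask) != len(ref_mask):
        raise ValueError("masks must have the same length")
    overlap = sum(1 for p, r in zip(pred_mask, ref_mask) if p and r)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(1 for p in pred_mask if p)
    recall = overlap / sum(1 for r in ref_mask if r)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical per-frame importance scores for a five-frame video.
    predicted = [0.1, 0.4, 0.35, 0.8, 0.7]
    human = [0.2, 0.3, 0.5, 0.9, 0.6]
    print(kendall_tau(predicted, human))

    # Hypothetical binary summaries: 1 marks a frame kept in the summary.
    print(f_score([0, 0, 0, 1, 1], [0, 0, 1, 1, 0]))
```

Rank correlation compares the ordering of predicted and annotated importance scores directly, while the F-score only compares the final selected frames, so the two measures can disagree on the same prediction.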
Related papers
- Enhancing Video Summarization with Context Awareness [9.861215740353247]
Video summarization automatically generates concise summaries by selecting techniques, shots, or segments that capture the video's essence.
Despite the importance of video summarization, there is a lack of diverse and representative datasets.
We propose an unsupervised approach that leverages video data structure and information for generating informative summaries.
arXiv Detail & Related papers (2024-04-06T09:08:34Z)
- Scaling Up Video Summarization Pretraining with Large Language Models [73.74662411006426]
We introduce an automated and scalable pipeline for generating a large-scale video summarization dataset.
We analyze the limitations of existing approaches and propose a new video summarization model that effectively addresses them.
Our work also presents a new benchmark dataset that contains 1200 long videos each with high-quality summaries annotated by professionals.
arXiv Detail & Related papers (2024-04-04T11:59:06Z)
- Conditional Modeling Based Automatic Video Summarization [70.96973928590958]
The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story.
Video summarization methods rely on visual factors, such as visual consecutiveness and diversity, which may not be sufficient to fully understand the content of the video.
A new approach to video summarization is proposed based on insights gained from how humans create ground truth video summaries.
arXiv Detail & Related papers (2023-11-20T20:24:45Z)
- DeVAn: Dense Video Annotation for Video-Language Models [68.70692422636313]
We present a novel human-annotated dataset for evaluating the ability of visual-language models to generate descriptions for real-world video clips.
The dataset contains 8.5K YouTube video clips of 20-60 seconds in duration and covers a wide range of topics and interests.
arXiv Detail & Related papers (2023-10-08T08:02:43Z)
- Causal Video Summarizer for Video Exploration [74.27487067877047]
Causal Video Summarizer (CVS) is proposed to capture the interactive information between the video and query.
Based on the evaluation of the existing multi-modal video summarization dataset, experimental results show that the proposed approach is effective.
arXiv Detail & Related papers (2023-07-04T22:52:16Z)
- SELF-VS: Self-supervised Encoding Learning For Video Summarization [6.21295508577576]
We propose a novel self-supervised video representation learning method using knowledge distillation to pre-train a transformer encoder.
Our method matches its semantic video representation, which is constructed with respect to frame importance scores, to a representation derived from a CNN trained on video classification.
arXiv Detail & Related papers (2023-03-28T14:08:05Z)
- VideoXum: Cross-modal Visual and Textural Summarization of Videos [54.0985975755278]
We propose a new joint video and text summarization task.
The goal is to generate both a shortened video clip and the corresponding textual summary from a long video.
The generated shortened video clip and text narratives should be semantically well aligned.
arXiv Detail & Related papers (2023-03-21T17:51:23Z)
- TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency [133.75876535332003]
We focus on summarizing instructional videos, an under-explored area of video summarization.
Existing video summarization datasets rely on manual frame-level annotations.
We propose an instructional video summarization network that combines a context-aware temporal video encoder and a segment scoring transformer.
arXiv Detail & Related papers (2022-08-14T04:07:40Z)
- SummaryNet: A Multi-Stage Deep Learning Model for Automatic Video Summarisation [0.0]
We introduce SummaryNet as a supervised learning framework for automated video summarisation.
It employs a two-stream convolutional network to learn spatial (appearance) and temporal (motion) representations.
arXiv Detail & Related papers (2020-02-19T18:24:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.