SalSum: Saliency-based Video Summarization using Generative Adversarial
Networks
- URL: http://arxiv.org/abs/2011.10432v1
- Date: Fri, 20 Nov 2020 14:53:08 GMT
- Title: SalSum: Saliency-based Video Summarization using Generative Adversarial
Networks
- Authors: George Pantazis, George Dimas and Dimitris K. Iakovidis
- Abstract summary: We propose a novel VS approach based on a Generative Adversarial Network (GAN) model trained with human eye fixations.
The proposed method is evaluated in comparison to state-of-the-art VS approaches on the VSUMM benchmark dataset.
- Score: 6.45481313278967
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The huge amount of video data produced daily by camera-based systems,
such as surveillance, medical, and telecommunication systems, creates the need
for effective video summarization (VS) methods. These methods should be capable
of creating an overview of the video content. In this paper, we propose a novel
VS method based on a Generative Adversarial Network (GAN) model pre-trained with
human eye fixations. The main contribution of the proposed method is that it can
provide perceptually compatible video summaries by combining both perceived
color and spatiotemporal visual attention cues in an unsupervised scheme.
Several fusion approaches are considered for robustness under uncertainty and
for personalization. The proposed method is evaluated in comparison to
state-of-the-art VS approaches on the VSUMM benchmark dataset. The experimental
results show that SalSum outperforms the state-of-the-art approaches, achieving
the highest F-measure score on the VSUMM benchmark.
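To make the fusion idea concrete, the sketch below (a minimal illustration, not
the authors' code) late-fuses a per-frame saliency cue, assumed to be given by a
fixation-trained saliency model, with a color-change cue from normalized HSV
histograms, and keeps the top-k frames as keyframes. All names, the fusion
weight w, and the top-k selection rule are illustrative assumptions.

```python
import numpy as np

def color_score(hist, prev_hist):
    # Color-change cue: L1 distance between successive normalized
    # HSV histograms, scaled to [0, 1].
    return float(np.abs(hist - prev_hist).sum()) / 2.0

def summarize(saliency, hists, k=5, w=0.5):
    """Return the indices of the k frames with the highest fused score.

    saliency: per-frame attention scores in [0, 1] (e.g. the mean of a
              fixation-trained saliency map for each frame).
    hists:    per-frame normalized HSV histograms.
    w:        fusion weight; tuning it per user is one simple route to
              the personalization the abstract mentions.
    """
    scores = np.empty(len(saliency))
    for i, s in enumerate(saliency):
        c = color_score(hists[i], hists[i - 1]) if i > 0 else 0.0
        scores[i] = w * s + (1.0 - w) * c   # weighted late fusion
    return sorted(np.argsort(scores)[-k:].tolist())

# Usage with random stand-in cues for a 100-frame video
rng = np.random.default_rng(0)
sal = rng.random(100)
hists = [h / h.sum() for h in rng.random((100, 64))]
print(summarize(sal, hists, k=5))
```

On VSUMM, summaries like this are typically scored with the F-measure,
F = 2PR / (P + R), where precision P and recall R come from matching the
selected keyframes against user-annotated summaries.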
Related papers
- Conditional Modeling Based Automatic Video Summarization [70.96973928590958]
The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story.
Video summarization methods rely on visual factors, such as visual consecutiveness and diversity, which may not be sufficient to fully understand the content of the video.
A new approach to video summarization is proposed based on insights gained from how humans create ground truth video summaries.
arXiv Detail & Related papers (2023-11-20T20:24:45Z)
- Self-Attention Based Generative Adversarial Networks For Unsupervised Video Summarization [78.2700757742992]
We build on a popular method where a Generative Adversarial Network (GAN) is trained to create representative summaries.
We propose the SUM-GAN-AED model that uses a self-attention mechanism for frame selection, combined with LSTMs for encoding and decoding (see the frame-scoring sketch after this list).
arXiv Detail & Related papers (2023-07-16T19:56:13Z)
- Causal Video Summarizer for Video Exploration [74.27487067877047]
Causal Video Summarizer (CVS) is proposed to capture the interactive information between the video and query.
In an evaluation on an existing multi-modal video summarization dataset, experimental results show that the proposed approach is effective.
arXiv Detail & Related papers (2023-07-04T22:52:16Z)
- VIDM: Video Implicit Diffusion Models [75.90225524502759]
Diffusion models have emerged as a powerful generative method for synthesizing high-quality and diverse images.
We propose a video generation method based on diffusion models, where the effects of motion are modeled in an implicit condition.
We improve the quality of the generated videos by proposing multiple strategies such as sampling space truncation, robustness penalty, and positional group normalization.
arXiv Detail & Related papers (2022-12-01T02:58:46Z)
- Visual Commonsense-aware Representation Network for Video Captioning [84.67432867555044]
We propose a simple yet effective method, called Visual Commonsense-aware Representation Network (VCRN) for video captioning.
Our method reaches state-of-the-art performance, indicating its effectiveness.
arXiv Detail & Related papers (2022-11-17T11:27:15Z)
- Unsupervised Video Summarization via Multi-source Features [4.387757291346397]
Video summarization aims at generating a compact yet representative visual summary that conveys the essence of the original video.
We propose the incorporation of multiple feature sources with chunk and stride fusion to provide more information about the visual content.
For a comprehensive evaluation on the two benchmarks TVSum and SumMe, we compare our method with four state-of-the-art approaches.
arXiv Detail & Related papers (2021-05-26T13:12:46Z)
- GPT2MVS: Generative Pre-trained Transformer-2 for Multi-modal Video Summarization [18.543372365239673]
The proposed model consists of a contextualized video summary controller, multi-modal attention mechanisms, an interactive attention network, and a video summary generator.
Results show that the proposed model is effective, with an increase of 5.88% in accuracy and 4.06% in F1-score compared with the state-of-the-art method.
arXiv Detail & Related papers (2021-04-26T10:50:37Z)
- Efficient Video Summarization Framework using EEG and Eye-tracking Signals [0.92246583941469]
This paper proposes an efficient video summarization framework that conveys the gist of the entire video in a few keyframes or video skims.
To understand human attention behavior, we have designed and performed experiments with human participants using electroencephalogram (EEG) and eye-tracking technology.
Using our approach, a video is summarized by 96.5% while maintaining high precision and recall.
arXiv Detail & Related papers (2021-01-27T08:13:19Z)
- Video Summarization Using Deep Neural Networks: A Survey [72.98424352264904]
Video summarization technologies aim to create a concise and complete synopsis by selecting the most informative parts of the video content.
This work focuses on the recent advances in the area and provides a comprehensive survey of the existing deep-learning-based methods for generic video summarization.
arXiv Detail & Related papers (2021-01-15T11:41:29Z)
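The SUM-GAN-AED entry above mentions self-attention for frame selection; the
sketch below shows what such a frame-scoring step can look like (a minimal
illustration under assumed names and sizes, not the authors' implementation).
Frame features attend over the whole sequence, and a linear head maps each
attended feature to an importance score in [0, 1]; in the actual model, such
scores would drive an LSTM encoder-decoder trained adversarially.

```python
import torch
import torch.nn as nn

class AttnFrameScorer(nn.Module):
    """Score each frame via self-attention over the frame sequence."""
    def __init__(self, feat_dim=1024, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())

    def forward(self, feats):                    # (batch, frames, feat_dim)
        ctx, _ = self.attn(feats, feats, feats)  # self-attention over frames
        return self.head(ctx).squeeze(-1)        # (batch, frames) in [0, 1]

# Usage: importance scores for 120 frames of stand-in CNN features
scorer = AttnFrameScorer()
scores = scorer(torch.randn(1, 120, 1024))       # -> shape (1, 120)
```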