Self-Attention Based Generative Adversarial Networks For Unsupervised Video Summarization
- URL: http://arxiv.org/abs/2307.08145v1
- Date: Sun, 16 Jul 2023 19:56:13 GMT
- Title: Self-Attention Based Generative Adversarial Networks For Unsupervised Video Summarization
- Authors: Maria Nektaria Minaidi, Charilaos Papaioannou, Alexandros Potamianos
- Abstract summary: We build on a popular method where a Generative Adversarial Network (GAN) is trained to create representative summaries.
We propose the SUM-GAN-AED model that uses a self-attention mechanism for frame selection, combined with LSTMs for encoding and decoding.
- Score: 78.2700757742992
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the problem of producing a comprehensive video
summary following an unsupervised approach that relies on adversarial learning.
We build on a popular method where a Generative Adversarial Network (GAN) is
trained to create representative summaries, indistinguishable from the
originals. Introducing attention mechanisms into the architecture for selecting, encoding, and decoding video frames demonstrates the efficacy of self-attention and transformers in modeling temporal relationships for video summarization. We propose the SUM-GAN-AED model, which uses a self-attention
mechanism for frame selection, combined with LSTMs for encoding and decoding.
We evaluate the performance of the SUM-GAN-AED model on the SumMe, TVSum and
COGNIMUSE datasets. Experimental results indicate that using self-attention as the frame selection mechanism outperforms the state of the art on SumMe and achieves performance comparable to the state of the art on TVSum and COGNIMUSE.
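As a rough illustration of this architecture, the sketch below wires a single-head self-attention frame scorer to an LSTM encoder-decoder and an LSTM discriminator. All layer sizes, class names, and the soft score-weighting step are assumptions made for brevity, not the paper's exact configuration.

```python
# Minimal sketch of a SUM-GAN-AED-style pipeline in PyTorch. Layer sizes,
# class names, and the single-head attention scorer are illustrative
# assumptions, not the authors' exact configuration.
import torch
import torch.nn as nn

class AttentionSelector(nn.Module):
    """Scores each frame with temporal self-attention (assumed single-head)."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.score = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, x):                    # x: (B, T, dim) frame features
        ctx, _ = self.attn(x, x, x)          # self-attention over time
        return self.score(ctx).squeeze(-1)   # (B, T) importance scores in [0, 1]

class LSTMAutoencoder(nn.Module):
    """Encodes score-weighted features with an LSTM and reconstructs the sequence."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.enc = nn.LSTM(dim, hidden, batch_first=True)
        self.dec = nn.LSTM(hidden, dim, batch_first=True)

    def forward(self, x, scores):
        weighted = x * scores.unsqueeze(-1)  # soft frame selection
        h, _ = self.enc(weighted)
        recon, _ = self.dec(h)               # (B, T, dim) reconstruction
        return recon

class Discriminator(nn.Module):
    """LSTM critic: original sequences vs. summary-based reconstructions."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.cls = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, x):
        _, (h, _) = self.lstm(x)
        return self.cls(h[-1])               # (B, 1) probability of "real"

# One adversarial step on random stand-in features (B=2, T=64, dim=1024).
x = torch.randn(2, 64, 1024)
selector = AttentionSelector(1024)
autoenc = LSTMAutoencoder(1024, 512)
disc = Discriminator(1024, 256)
scores = selector(x)
recon = autoenc(x, scores)
bce = nn.BCELoss()
d_loss = bce(disc(x), torch.ones(2, 1)) + bce(disc(recon.detach()), torch.zeros(2, 1))
g_loss = bce(disc(recon), torch.ones(2, 1))  # generator tries to fool the critic
print(d_loss.item(), g_loss.item())
```

In a SUM-GAN-style setup the generator (selector plus autoencoder) is trained so the discriminator accepts the summary-based reconstruction as an original sequence, which is what the g_loss term above encodes.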
Related papers
- Conditional Modeling Based Automatic Video Summarization [70.96973928590958]
The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story.
Video summarization methods rely on visual factors, such as visual consecutiveness and diversity, which may not be sufficient to fully understand the content of the video.
A new approach to video summarization is proposed based on insights gained from how humans create ground truth video summaries.
arXiv Detail & Related papers (2023-11-20T20:24:45Z)
- RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-Modal MLP.
We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z)
- MHSCNet: A Multimodal Hierarchical Shot-aware Convolutional Network for Video Summarization [61.69587867308656]
We propose a multimodal hierarchical shot-aware convolutional network, denoted as MHSCNet, to enhance the frame-wise representation.
Based on the learned shot-aware representations, MHSCNet can predict the frame-level importance score in the local and global view of the video.
arXiv Detail & Related papers (2022-04-18T14:53:33Z)
- Exploring Global Diversity and Local Context for Video Summarization [4.452227592307381]
Video summarization aims to automatically generate a diverse and concise summary which is useful in large-scale video processing.
Most methods adopt a self-attention mechanism across video frames, which fails to model the diversity of video frames.
We propose global diverse attention, which computes the affinities with the squared Euclidean distance instead of the dot product (see the sketch after this entry).
arXiv Detail & Related papers (2022-01-27T06:56:01Z)
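As a loose sketch of this distance-based affinity idea: swapping dot-product similarity for squared Euclidean distance makes dissimilar (diverse) frames attract larger attention weights. The softmax normalization and the absence of any scaling below are assumptions, not the paper's exact formulation.

```python
# Illustrative only: attention affinities from squared Euclidean distance.
import torch

def diverse_attention(feats):              # feats: (T, d) frame features
    d2 = torch.cdist(feats, feats).pow(2)  # pairwise squared distances, (T, T)
    w = torch.softmax(d2, dim=-1)          # larger distance -> larger weight
    return w @ feats                       # diversity-weighted context, (T, d)

ctx = diverse_attention(torch.randn(8, 16))
print(ctx.shape)  # torch.Size([8, 16])
```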
- SalSum: Saliency-based Video Summarization using Generative Adversarial Networks [6.45481313278967]
We propose a novel VS approach based on a Generative Adversarial Network (GAN) trained with human eye fixations.
The proposed method is evaluated against state-of-the-art VS approaches on the VSUMM benchmark dataset.
arXiv Detail & Related papers (2020-11-20T14:53:08Z)
- Exploring global diverse attention via pairwise temporal relation for video summarization [84.28263235895798]
We propose an efficient convolutional neural network architecture for video SUMmarization via Global Diverse Attention.
The proposed models can be run in parallel with significantly lower computational cost.
arXiv Detail & Related papers (2020-09-23T06:29:09Z)
- Transforming Multi-Concept Attention into Video Summarization [36.85535624026879]
We propose a novel attention-based framework for video summarization with complex video data.
Our model can be applied to both labeled and unlabeled data, making our method preferable for real-world applications.
arXiv Detail & Related papers (2020-06-02T06:23:50Z)
- Convolutional Hierarchical Attention Network for Query-Focused Video Summarization [74.48782934264094]
This paper addresses the task of query-focused video summarization, which takes a user's query and a long video as inputs.
We propose a method, named Convolutional Hierarchical Attention Network (CHAN), which consists of two parts: a feature encoding network and a query-relevance computing module.
In the encoding network, we employ a convolutional network with a local self-attention mechanism and a query-aware global attention mechanism to learn the visual information of each shot (a minimal sketch follows this entry).
arXiv Detail & Related papers (2020-01-31T04:30:14Z)
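As a loose illustration of the query-aware global attention named in the CHAN entry above, the sketch below scores each shot feature against a projected query embedding. The linear projections and scaled dot-product scoring are assumptions made for brevity, not the paper's actual module.

```python
# Hypothetical query-aware global attention: scores shots against a query.
import torch
import torch.nn as nn

class QueryAwareAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)  # projects the query embedding
        self.k_proj = nn.Linear(dim, dim)  # projects each shot feature

    def forward(self, query, shots):       # query: (B, dim); shots: (B, T, dim)
        q = self.q_proj(query).unsqueeze(1)            # (B, 1, dim)
        k = self.k_proj(shots)                         # (B, T, dim)
        logits = (q * k).sum(-1) / k.shape[-1] ** 0.5  # (B, T)
        return torch.softmax(logits, dim=-1)           # query relevance per shot

attn = QueryAwareAttention(128)(torch.randn(2, 128), torch.randn(2, 20, 128))
print(attn.shape)  # torch.Size([2, 20])
```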