Exploring Global Diversity and Local Context for Video Summarization
- URL: http://arxiv.org/abs/2201.11345v1
- Date: Thu, 27 Jan 2022 06:56:01 GMT
- Title: Exploring Global Diversity and Local Context for Video Summarization
- Authors: Yingchao Pan, Ouhan Huang, Qinghao Ye, Zhongjin Li, Wenjiang Wang,
Guodun Li, Yuxing Chen
- Abstract summary: Video summarization aims to automatically generate a diverse and concise summary, which is useful in large-scale video processing.
Most methods adopt a self-attention mechanism across video frames, which fails to model the diversity of video frames.
We propose global diverse attention, which instead uses the squared Euclidean distance to compute the affinities.
- Score: 4.452227592307381
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video summarization aims to automatically generate a diverse and concise
summary which is useful in large-scale video processing. Most methods tend
to adopt a self-attention mechanism across video frames, which fails to model the
diversity of video frames. To alleviate this problem, we revisit the pairwise
similarity measurement in the self-attention mechanism and find that the existing
inner-product affinity leads to discriminative features rather than diversified
features. In light of this phenomenon, we propose global diverse attention,
which uses the squared Euclidean distance instead of the inner product to compute
the affinities. Moreover, we model the local contextual information by proposing local
contextual attention to remove the redundancy in the video. By combining these
two attention mechanisms, a video SUMmarization model with a Diversified
Contextual Attention scheme is developed and named SUM-DCA. Extensive
experiments are conducted on benchmark datasets to verify the effectiveness
and the superiority of SUM-DCA in terms of F-score and rank-based evaluation
without any bells and whistles.
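To make the two attention variants concrete, below is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how distances are scaled, how the local neighborhood is defined, or how the two branches are fused, so the function names global_diverse_attention and local_contextual_attention, the larger-distance-means-larger-affinity convention, the window size, and the additive fusion in the usage lines are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def global_diverse_attention(frames):
    """Attention whose affinities are pairwise squared Euclidean
    distances between frame features instead of inner products.

    frames: (T, d) array of per-frame features.
    Assumption: larger distance -> larger affinity, so each frame
    attends most to the frames least similar to it, which is one
    plausible reading of 'diverse attention'.
    """
    sq = (frames ** 2).sum(axis=1)
    # Pairwise squared distances ||f_i - f_j||^2, shape (T, T).
    dist = sq[:, None] + sq[None, :] - 2.0 * frames @ frames.T
    weights = softmax(dist, axis=-1)
    return weights @ frames

def local_contextual_attention(frames, window=5):
    """Standard scaled inner-product attention restricted to a
    temporal window around each frame; window=5 is an arbitrary
    choice for illustration, not a value from the paper.
    """
    T, d = frames.shape
    out = np.empty_like(frames)
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        ctx = frames[lo:hi]                    # (w, d) local context
        scores = ctx @ frames[t] / np.sqrt(d)  # (w,) affinities
        out[t] = softmax(scores) @ ctx
    return out

# Toy usage: 120 frames with 256-d features. How SUM-DCA actually
# fuses the two branches is not stated in the abstract; the sum
# below is purely illustrative.
feats = np.random.randn(120, 256).astype(np.float32)
fused = global_diverse_attention(feats) + local_contextual_attention(feats)
```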
Related papers
- Conditional Modeling Based Automatic Video Summarization [70.96973928590958]
The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story.
Video summarization methods rely on visual factors, such as visual consecutiveness and diversity, which may not be sufficient to fully capture the content of the video.
A new approach to video summarization is proposed based on insights gained from how humans create ground truth video summaries.
arXiv Detail & Related papers (2023-11-20T20:24:45Z)
- Self-Attention Based Generative Adversarial Networks For Unsupervised Video Summarization [78.2700757742992]
We build on a popular method where a Generative Adversarial Network (GAN) is trained to create representative summaries.
We propose the SUM-GAN-AED model that uses a self-attention mechanism for frame selection, combined with LSTMs for encoding and decoding.
arXiv Detail & Related papers (2023-07-16T19:56:13Z)
- Attention in Attention: Modeling Context Correlation for Efficient Video Classification [47.938500236792244]
This paper proposes an efficient attention-in-attention (AIA) method for focus-wise feature refinement.
We instantiate video feature contexts as dynamics aggregated along a specific axis with global average and max pooling operations.
All the computational operations in attention units act on the pooled dimension, which adds very little computational cost.
arXiv Detail & Related papers (2022-04-20T08:37:52Z)
- Local-Global Associative Frame Assemble in Video Re-ID [57.7470971197962]
Noisy and unrepresentative frames in automatically generated object bounding boxes from video sequences make it challenging to learn discriminative representations for video re-identification (Re-ID).
Most existing methods tackle this problem by assessing the importance of video frames according to either their local part alignments or global appearance correlations separately.
In this work, we explore jointly both local alignments and global correlations with further consideration of their mutual promotion/reinforcement.
arXiv Detail & Related papers (2021-10-22T19:07:39Z)
- Watching You: Global-guided Reciprocal Learning for Video-based Person Re-identification [82.6971648465279]
We propose a novel Global-guided Reciprocal Learning framework for video-based person Re-ID.
Our approach achieves better performance than other state-of-the-art approaches.
arXiv Detail & Related papers (2021-03-07T12:27:42Z)
- Exploring global diverse attention via pairwise temporal relation for video summarization [84.28263235895798]
We propose an efficient convolutional neural network architecture for video SUMmarization via Global Diverse Attention.
The proposed models can be run in parallel with significantly lower computational cost.
arXiv Detail & Related papers (2020-09-23T06:29:09Z)
- Transforming Multi-Concept Attention into Video Summarization [36.85535624026879]
We propose a novel attention-based framework for video summarization with complex video data.
Our model can be applied to both labeled and unlabeled data, making our method preferable for real-world applications.
arXiv Detail & Related papers (2020-06-02T06:23:50Z)
- Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-based Person Re-identification [98.7585431239291]
Video-based person re-identification aims at matching the same person across video clips.
In this paper, we propose an attentive feature aggregation module, namely the Multi-Granularity Reference-aided Attentive Feature Aggregation module (MG-RAFA).
Our framework achieves state-of-the-art performance on three benchmark datasets.
arXiv Detail & Related papers (2020-03-27T03:49:21Z)