Exploring global diverse attention via pairwise temporal relation for
video summarization
- URL: http://arxiv.org/abs/2009.10942v1
- Date: Wed, 23 Sep 2020 06:29:09 GMT
- Title: Exploring global diverse attention via pairwise temporal relation for
video summarization
- Authors: Ping Li, Qinghao Ye, Luming Zhang, Li Yuan, Xianghua Xu, Ling Shao
- Abstract summary: We propose an efficient convolutional neural network architecture for video SUMmarization via Global Diverse Attention.
The proposed models can be run in parallel with significantly lower computational cost.
- Score: 84.28263235895798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video summarization is an effective way to facilitate video searching and
browsing. Most existing systems employ encoder-decoder based recurrent neural
networks, which fail to explicitly diversify the system-generated summary frames
while requiring intensive computation. In this paper, we propose an efficient
convolutional neural network architecture for video SUMmarization via Global
Diverse Attention, called SUM-GDA, which adapts the attention mechanism from a
global perspective to consider pairwise temporal relations of video frames. In
particular, the GDA module has two advantages: 1) it models the relations within
paired frames as well as the relations among all pairs, thus capturing the global
attention across all frames of one video; 2) it reflects the importance of each
frame to the whole video, leading to diverse attention on these frames. Thus,
SUM-GDA is beneficial for generating diverse frames that form a satisfactory
video summary. Extensive experiments on three datasets, i.e., SumMe, TVSum, and
VTW, demonstrate that SUM-GDA and its extension outperform competing
state-of-the-art methods by remarkable margins. In addition, the proposed models
can be run in parallel with significantly lower computational cost, which
facilitates deployment in highly demanding applications.
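To make the pairwise-relation idea above concrete, below is a minimal, hypothetical sketch of attention-based frame scoring over all frame pairs, assuming per-frame feature vectors (e.g., CNN embeddings). The function name, the dot-product affinity, and the column-sum scoring rule are illustrative assumptions, not the authors' released SUM-GDA code.

```python
import numpy as np

def global_diverse_attention_scores(frame_feats, temperature=1.0):
    """Sketch: score frames by aggregating pairwise temporal relations.

    frame_feats: (T, d) array of per-frame features.
    Returns a (T,) vector of importance scores normalized to [0, 1].
    """
    T, d = frame_feats.shape
    # Pairwise affinities across *all* frame pairs (global view).
    affinity = frame_feats @ frame_feats.T / np.sqrt(d)                 # (T, T)
    # Row-wise softmax: how much frame i attends to every other frame.
    attn = np.exp((affinity - affinity.max(axis=1, keepdims=True)) / temperature)
    attn /= attn.sum(axis=1, keepdims=True)
    # A frame that many other frames attend to is treated as more important;
    # spreading attention over distinct frames favors a diverse summary.
    raw = attn.sum(axis=0)                                              # column sums, (T,)
    scores = (raw - raw.min()) / (raw.max() - raw.min() + 1e-8)
    return scores

# Usage with random stand-in features (120 frames, 256-dim embeddings).
feats = np.random.randn(120, 256).astype(np.float32)
print(global_diverse_attention_scores(feats)[:10])
```

In this sketch the scores would feed a keyframe selector (e.g., pick the top-scoring frames under a length budget); the actual SUM-GDA architecture is convolutional and trained end to end, which this toy example does not attempt to reproduce.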
Related papers
- Bridging the Gap: A Unified Video Comprehension Framework for Moment
Retrieval and Highlight Detection [45.82453232979516]
Video Moment Retrieval (MR) and Highlight Detection (HD) have attracted significant attention due to the growing demand for video analysis.
Recent approaches treat MR and HD as similar video grounding problems and address them together with transformer-based architecture.
We propose a Unified Video COMprehension framework (UVCOM) to bridge the gap and jointly solve MR and HD effectively.
arXiv Detail & Related papers (2023-11-28T03:55:23Z)
- MHSCNet: A Multimodal Hierarchical Shot-aware Convolutional Network for Video Summarization [61.69587867308656]
We propose a multimodal hierarchical shot-aware convolutional network, denoted as MHSCNet, to enhance the frame-wise representation.
Based on the learned shot-aware representations, MHSCNet can predict the frame-level importance score in the local and global view of the video.
arXiv Detail & Related papers (2022-04-18T14:53:33Z)
- Exploring Global Diversity and Local Context for Video Summarization [4.452227592307381]
Video summarization aims to automatically generate a diverse and concise summary which is useful in large-scale video processing.
Most methods tend to adopt a self-attention mechanism across video frames, which fails to model the diversity of video frames.
We propose global diverse attention that instead uses the squared Euclidean distance to compute the affinities (see the distance-based affinity sketch after this list).
arXiv Detail & Related papers (2022-01-27T06:56:01Z)
- Condensing a Sequence to One Informative Frame for Video Recognition [113.3056598548736]
This paper studies a two-step alternative that first condenses the video sequence to an informative "frame".
A valid question is how to define "useful information" and then distill from a sequence down to one synthetic frame.
IFS consistently demonstrates evident improvements on image-based 2D networks and clip-based 3D networks.
arXiv Detail & Related papers (2022-01-11T16:13:43Z)
- DeepQAMVS: Query-Aware Hierarchical Pointer Networks for Multi-Video Summarization [127.16984421969529]
We introduce a novel Query-Aware Hierarchical Pointer Network for Multi-Video Summarization, termed DeepQAMVS.
DeepQAMVS is trained with reinforcement learning, incorporating rewards that capture representativeness, diversity, query-adaptability and temporal coherence.
We achieve state-of-the-art results on the MVS1K dataset, with inference time scaling linearly with the number of input video frames.
arXiv Detail & Related papers (2021-05-13T17:33:26Z)
- An Efficient Recurrent Adversarial Framework for Unsupervised Real-Time Video Enhancement [132.60976158877608]
We propose an efficient adversarial video enhancement framework that learns directly from unpaired video examples.
In particular, our framework introduces new recurrent cells that consist of interleaved local and global modules for implicit integration of spatial and temporal information.
The proposed design allows our recurrent cells to efficiently propagate temporal information across frames and reduces the need for high-complexity networks.
arXiv Detail & Related papers (2020-12-24T00:03:29Z)
- Transforming Multi-Concept Attention into Video Summarization [36.85535624026879]
We propose a novel attention-based framework for video summarization with complex video data.
Our model can be applied to both labeled and unlabeled data, making it preferable for real-world applications.
arXiv Detail & Related papers (2020-06-02T06:23:50Z)
- Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks [150.5425122989146]
This work proposes a novel attentive graph neural network (AGNN) for zero-shot video object segmentation (ZVOS).
AGNN builds a fully connected graph to efficiently represent frames as nodes, and relations between arbitrary frame pairs as edges.
Experimental results on three video segmentation datasets show that AGNN sets a new state-of-the-art in each case.
arXiv Detail & Related papers (2020-01-19T10:45:27Z)
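Following up on the squared-Euclidean-distance affinity mentioned in the "Exploring Global Diversity and Local Context for Video Summarization" entry above, here is a minimal sketch of distance-based affinities, assuming the same per-frame features as in the earlier example; the exponential mapping from distance to affinity is an illustrative assumption, not the paper's exact formulation.

```python
import numpy as np

def distance_based_affinity(frame_feats):
    """Sketch: pairwise affinities from squared Euclidean distances between frames."""
    sq_norms = (frame_feats ** 2).sum(axis=1)                         # (T,)
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 * x_i . x_j
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * frame_feats @ frame_feats.T
    sq_dist = np.maximum(sq_dist, 0.0)   # clamp tiny negatives from round-off
    return np.exp(-sq_dist)              # large distance -> low affinity; self-affinity = 1
```

Because distant (visually distinct) frames receive low mutual affinity, attention built on such a matrix tends to spread over diverse frames rather than concentrating on near-duplicates.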