Convolutional Hierarchical Attention Network for Query-Focused Video
Summarization
- URL: http://arxiv.org/abs/2002.03740v3
- Date: Sat, 15 Feb 2020 03:26:30 GMT
- Title: Convolutional Hierarchical Attention Network for Query-Focused Video
Summarization
- Authors: Shuwen Xiao, Zhou Zhao, Zijian Zhang, Xiaohui Yan, Min Yang
- Abstract summary: This paper addresses the task of query-focused video summarization, which takes a user's query and a long video as inputs.
We propose a method, named Convolutional Hierarchical Attention Network (CHAN), which consists of two parts: a feature encoding network and a query-relevance computing module.
In the encoding network, we employ a convolutional network with a local self-attention mechanism and a query-aware global attention mechanism to learn the visual information of each shot.
- Score: 74.48782934264094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous approaches to video summarization mainly concentrate on finding the
most diverse and representative visual contents as the video summary, without
considering the user's preference. This paper addresses the task of
query-focused video summarization, which takes a user's query and a long video
as inputs and aims to generate a query-focused video summary. In this paper, we
cast the task as a problem of computing the similarity between video shots and
the query. To this end, we propose a method, named Convolutional Hierarchical
Attention Network (CHAN), which consists of two parts: a feature encoding
network and a query-relevance computing module. In the encoding network, we
employ a convolutional network with a local self-attention mechanism and a
query-aware global attention mechanism to learn the visual information of each
shot. The encoded features are then sent to the query-relevance computing
module to generate the query-focused video summary. Extensive experiments on
the benchmark dataset demonstrate the competitive performance and the
effectiveness of our approach.
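A minimal sketch of the two-part pipeline described above, reconstructed from the abstract alone: a feature encoding network (1D convolution, local self-attention, query-aware global attention) followed by a query-relevance computing module that scores each shot against the query. The layer sizes, the use of PyTorch's nn.MultiheadAttention, and the element-wise scoring head are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a CHAN-style model: encoding network + query-relevance module.
# Dimensions and attention layers are assumptions made for illustration only.
import torch
import torch.nn as nn


class ShotEncoder(nn.Module):
    """Feature encoding network: 1D convolution over shot features,
    local self-attention across shots, and query-aware global attention."""

    def __init__(self, feat_dim=512, query_dim=300, hidden=256, heads=4):
        super().__init__()
        self.conv = nn.Conv1d(feat_dim, hidden, kernel_size=3, padding=1)
        self.local_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.query_proj = nn.Linear(query_dim, hidden)
        self.global_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)

    def forward(self, shots, query):
        # shots: (batch, num_shots, feat_dim); query: (batch, query_dim)
        x = self.conv(shots.transpose(1, 2)).transpose(1, 2)  # (B, N, hidden)
        x, _ = self.local_attn(x, x, x)                       # local self-attention
        q = self.query_proj(query).unsqueeze(1)               # (B, 1, hidden)
        ctx, _ = self.global_attn(x, q, q)                    # query-aware global attention
        return x + ctx                                        # query-conditioned shot features


class QueryRelevance(nn.Module):
    """Query-relevance computing module: scores each encoded shot against the query;
    the highest-scoring shots would form the query-focused summary."""

    def __init__(self, query_dim=300, hidden=256):
        super().__init__()
        self.query_proj = nn.Linear(query_dim, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, encoded_shots, query):
        q = self.query_proj(query).unsqueeze(1)               # (B, 1, hidden)
        sim = encoded_shots * q                               # element-wise shot-query interaction
        return self.score(sim).squeeze(-1)                    # (B, N) relevance scores


if __name__ == "__main__":
    encoder, scorer = ShotEncoder(), QueryRelevance()
    shots = torch.randn(2, 20, 512)   # 2 videos, 20 shots each, 512-d visual features
    query = torch.randn(2, 300)       # pooled text-query embedding
    scores = scorer(encoder(shots, query), query)
    print(scores.shape)               # torch.Size([2, 20])
```

The residual combination of shot features with the query-attended context is just one simple way to inject query information; the actual CHAN fusion may differ.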
Related papers
- Your Interest, Your Summaries: Query-Focused Long Video Summarization [0.6041235048439966]
This paper introduces an approach for query-focused video summarization, aiming to align video summaries closely with user queries.
We propose the Fully Convolutional Sequence Network with Attention (FCSNA-QFVS), a novel approach designed for this task.
arXiv Detail & Related papers (2024-10-17T23:37:58Z)
- GQE: Generalized Query Expansion for Enhanced Text-Video Retrieval [56.610806615527885]
This paper introduces a novel data-centric approach, Generalized Query Expansion (GQE), to address the inherent information imbalance between text and video.
By adaptively segmenting videos into short clips and employing zero-shot captioning, GQE enriches the training dataset with comprehensive scene descriptions.
GQE achieves state-of-the-art performance on several benchmarks, including MSR-VTT, MSVD, LSMDC, and VATEX.
arXiv Detail & Related papers (2024-08-14T01:24:09Z)
- Conditional Modeling Based Automatic Video Summarization [70.96973928590958]
The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story.
Video summarization methods rely on visual factors, such as visual consecutiveness and diversity, which may not be sufficient to fully understand the content of the video.
A new approach to video summarization is proposed based on insights gained from how humans create ground truth video summaries.
arXiv Detail & Related papers (2023-11-20T20:24:45Z)
- MHSCNet: A Multimodal Hierarchical Shot-aware Convolutional Network for Video Summarization [61.69587867308656]
We propose a multimodal hierarchical shot-aware convolutional network, denoted as MHSCNet, to enhance the frame-wise representation.
Based on the learned shot-aware representations, MHSCNet can predict the frame-level importance score in the local and global view of the video.
arXiv Detail & Related papers (2022-04-18T14:53:33Z)
- Unsupervised Video Summarization with a Convolutional Attentive Adversarial Network [32.90753137435032]
We propose a convolutional attentive adversarial network (CAAN) to build a deep summarizer in an unsupervised way.
Specifically, the generator employs a fully convolutional sequence network to extract a global representation of a video, and an attention-based network to output normalized importance scores.
The results show the superiority of our proposed method over other state-of-the-art unsupervised approaches.
arXiv Detail & Related papers (2021-05-24T07:24:39Z)
- DeepQAMVS: Query-Aware Hierarchical Pointer Networks for Multi-Video Summarization [127.16984421969529]
We introduce a novel Query-Aware Hierarchical Pointer Network for Multi-Video Summarization, termed DeepQAMVS.
DeepQAMVS is trained with reinforcement learning, incorporating rewards that capture representativeness, diversity, query-adaptability and temporal coherence (an illustrative combination of such rewards is sketched after this list).
We achieve state-of-the-art results on the MVS1K dataset, with inference time scaling linearly with the number of input video frames.
arXiv Detail & Related papers (2021-05-13T17:33:26Z)
- Fine-grained Iterative Attention Network for Temporal Language Localization in Videos [63.94898634140878]
Temporal language localization in videos aims to ground one video segment in an untrimmed video based on a given sentence query.
We propose a Fine-grained Iterative Attention Network (FIAN) that consists of an iterative attention module for bilateral query-video information extraction.
We evaluate the proposed method on three challenging public benchmarks: ActivityNet Captions, TACoS, and Charades-STA.
arXiv Detail & Related papers (2020-08-06T04:09:03Z)
- Query-controllable Video Summarization [16.54586273670312]
We introduce a method which takes a text-based query as input and generates a video summary corresponding to it.
Our proposed method consists of a video summary controller, video summary generator, and video summary output module.
arXiv Detail & Related papers (2020-04-07T19:35:04Z)
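The DeepQAMVS entry above mentions reinforcement-learning rewards for representativeness, diversity, query-adaptability, and temporal coherence. The snippet below is a purely illustrative sketch of how four such reward terms might be combined into one scalar training signal; every reward definition and weight here is a placeholder assumption, not the paper's actual formulation.

```python
# Illustrative combination of the four reward terms named in the DeepQAMVS summary.
# The individual reward functions and weights are placeholders, not the paper's.
import torch
import torch.nn.functional as F


def combined_reward(summary_feats, video_feats, query_feats,
                    weights=(0.25, 0.25, 0.25, 0.25)):
    """summary_feats: (k, d) features of the selected shots;
    video_feats: (n, d) features of all shots; query_feats: (d,) query embedding."""
    k = summary_feats.shape[0]
    # Representativeness: every video shot should lie close to some selected shot.
    rep = torch.exp(-torch.cdist(video_feats, summary_feats).min(dim=1).values.mean())
    # Diversity: selected shots should be mutually dissimilar (low off-diagonal cosine).
    sims = F.cosine_similarity(summary_feats.unsqueeze(1), summary_feats.unsqueeze(0), dim=-1)
    div = 1.0 - (sims.sum() - sims.diagonal().sum()) / max(k * (k - 1), 1)
    # Query-adaptability: selected shots should align with the query embedding.
    qa = F.cosine_similarity(summary_feats, query_feats.unsqueeze(0), dim=-1).mean()
    # Temporal coherence: consecutive selected shots should transition smoothly.
    tc = (F.cosine_similarity(summary_feats[:-1], summary_feats[1:], dim=-1).mean()
          if k > 1 else torch.tensor(1.0))
    w1, w2, w3, w4 = weights
    return w1 * rep + w2 * div + w3 * qa + w4 * tc


if __name__ == "__main__":
    video = torch.randn(40, 256)                 # 40 shots, 256-d features
    chosen = video[torch.tensor([3, 11, 27])]    # 3 shots picked by some policy
    query = torch.randn(256)
    print(combined_reward(chosen, video, query))
```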