Causal Video Summarizer for Video Exploration
- URL: http://arxiv.org/abs/2307.01947v1
- Date: Tue, 4 Jul 2023 22:52:16 GMT
- Title: Causal Video Summarizer for Video Exploration
- Authors: Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Andrew Brown, Marcel Worring
- Abstract summary: Causal Video Summarizer (CVS) is proposed to capture the interactive information between the video and query.
Evaluated on an existing multi-modal video summarization dataset, experimental results show that the proposed approach is effective.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, video summarization has been proposed as a method to help video
exploration. However, traditional video summarization models only generate a
fixed video summary which is usually independent of user-specific needs and
hence limits the effectiveness of video exploration. Multi-modal video
summarization is one of the approaches utilized to address this issue.
Multi-modal video summarization has a video input and a text-based query input.
Hence, effective modeling of the interaction between a video input and
text-based query is essential to multi-modal video summarization. In this work,
a new causality-based method named Causal Video Summarizer (CVS) is proposed to
effectively capture the interactive information between the video and query to
tackle the task of multi-modal video summarization. The proposed method
consists of a probabilistic encoder and a probabilistic decoder. Evaluated on an
existing multi-modal video summarization dataset, experimental results show that
the proposed approach is effective, with a +5.4% gain in accuracy and a +4.92%
gain in F1-score compared with the state-of-the-art method.
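The abstract only names the two components of CVS. As a loose illustration, the sketch below shows a generic VAE-style probabilistic encoder/decoder conditioned on a text query; the concatenation-based fusion, the stand-in "layers", and all sizes are hypothetical and are not the paper's actual design.

```python
# Hypothetical sketch of a query-conditioned probabilistic encoder/decoder.
# All components (fusion by concatenation, toy layer arithmetic, dimensions)
# are illustrative assumptions, not the CVS architecture itself.
import math
import random

random.seed(0)

def encode(video_feats, query_feats):
    """Map fused video+query features to a latent mean and log-variance."""
    fused = video_feats + query_feats          # naive fusion: concatenation
    mu = [x * 0.5 for x in fused]              # stand-in for a learned layer
    log_var = [x * 0.1 - 1.0 for x in fused]   # stand-in for a learned layer
    return mu, log_var

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps (the reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def decode(z, num_frames):
    """Map the latent sample to one importance score in (0, 1) per frame."""
    s = sum(z) / len(z)
    return [1.0 / (1.0 + math.exp(-(s + 0.01 * i))) for i in range(num_frames)]

video_feats = [0.2, -0.4, 0.7]   # toy per-video feature vector
query_feats = [0.1, 0.3]         # toy query embedding
mu, log_var = encode(video_feats, query_feats)
z = reparameterize(mu, log_var)
scores = decode(z, num_frames=5)
```

Frames whose scores exceed a chosen threshold would then form the query-dependent summary; sampling from the latent distribution (rather than using a deterministic code) is what makes the encoder/decoder pair probabilistic.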
Related papers
- Scaling Up Video Summarization Pretraining with Large Language Models [73.74662411006426]
We introduce an automated and scalable pipeline for generating a large-scale video summarization dataset.
We analyze the limitations of existing approaches and propose a new video summarization model that effectively addresses them.
Our work also presents a new benchmark dataset that contains 1,200 long videos, each with a high-quality summary annotated by professionals.
arXiv Detail & Related papers (2024-04-04T11:59:06Z) - Conditional Modeling Based Automatic Video Summarization [70.96973928590958]
The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story.
Video summarization methods rely on visual factors, such as visual consecutiveness and diversity, which may not be sufficient to fully understand the content of the video.
A new approach to video summarization is proposed based on insights gained from how humans create ground truth video summaries.
arXiv Detail & Related papers (2023-11-20T20:24:45Z) - Video Question Answering with Iterative Video-Text Co-Tokenization [77.66445727743508]
We propose a novel multi-stream video encoder for video question answering.
We experimentally evaluate the model on several datasets, such as MSRVTT-QA, MSVD-QA, IVQA.
Our model reduces the required GFLOPs from 150-360 to only 67, producing a highly efficient video question answering model.
arXiv Detail & Related papers (2022-08-01T15:35:38Z) - Modality-Balanced Embedding for Video Retrieval [21.81705847039759]
We identify a modality bias phenomenon whereby the video retrieval model relies almost entirely on text matching.
We propose MBVR (short for Modality Balanced Video Retrieval) with two key components.
We show empirically that our method is both effective and efficient in solving the modality bias problem.
arXiv Detail & Related papers (2022-04-18T06:29:46Z) - Video Summarization Based on Video-text Modelling [0.0]
We propose a multimodal self-supervised learning framework to obtain semantic representations of videos.
We also introduce a progressive video summarization method, where the important content in a video is pinpointed progressively to generate better summaries.
An objective evaluation framework is proposed to measure the quality of video summaries based on video classification.
arXiv Detail & Related papers (2022-01-07T15:21:46Z) - VALUE: A Multi-Task Benchmark for Video-and-Language Understanding
Evaluation [124.02278735049235]
VALUE benchmark aims to cover a broad range of video genres, video lengths, data volumes, and task difficulty levels.
We evaluate various baseline methods with and without large-scale VidL pre-training.
The significant gap between our best model and human performance calls for future study for advanced VidL models.
arXiv Detail & Related papers (2021-06-08T18:34:21Z) - GPT2MVS: Generative Pre-trained Transformer-2 for Multi-modal Video
Summarization [18.543372365239673]
The proposed model consists of a contextualized video summary controller, multi-modal attention mechanisms, an interactive attention network, and a video summary generator.
Results show that the proposed model is effective, with a +5.88% gain in accuracy and a +4.06% gain in F1-score, compared with the state-of-the-art method.
arXiv Detail & Related papers (2021-04-26T10:50:37Z) - Query-controllable Video Summarization [16.54586273670312]
We introduce a method which takes a text-based query as input and generates a video summary corresponding to it.
Our proposed method consists of a video summary controller, video summary generator, and video summary output module.
arXiv Detail & Related papers (2020-04-07T19:35:04Z) - Convolutional Hierarchical Attention Network for Query-Focused Video
Summarization [74.48782934264094]
This paper addresses the task of query-focused video summarization, which takes user's query and a long video as inputs.
We propose a method, named Convolutional Hierarchical Attention Network (CHAN), which consists of two parts: feature encoding network and query-relevance computing module.
In the encoding network, we employ a convolutional network with a local self-attention mechanism and a query-aware global attention mechanism to learn the visual information of each shot.
arXiv Detail & Related papers (2020-01-31T04:30:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.