Unsupervised Video Summarization via Multi-source Features
- URL: http://arxiv.org/abs/2105.12532v1
- Date: Wed, 26 May 2021 13:12:46 GMT
- Title: Unsupervised Video Summarization via Multi-source Features
- Authors: Hussain Kanafani, Junaid Ahmed Ghauri, Sherzod Hakimov, Ralph Ewerth
- Abstract summary: Video summarization aims at generating a compact yet representative visual summary that conveys the essence of the original video.
We propose the incorporation of multiple feature sources with chunk and stride fusion to provide more information about the visual content.
For a comprehensive evaluation on the two benchmarks TVSum and SumMe, we compare our method with four state-of-the-art approaches.
- Score: 4.387757291346397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video summarization aims at generating a compact yet representative visual
summary that conveys the essence of the original video. The advantage of
unsupervised approaches is that they do not require human annotations to learn
the summarization capability and generalize to a wider range of domains.
Previous work relies on the same type of deep features, typically based on a
model pre-trained on ImageNet data. Therefore, we propose the incorporation of
multiple feature sources with chunk and stride fusion to provide more
information about the visual content. For a comprehensive evaluation on the two
benchmarks TVSum and SumMe, we compare our method with four state-of-the-art
approaches. Two of these approaches were implemented by ourselves to reproduce
the reported results. Our evaluation shows that we obtain state-of-the-art
results on both datasets, while also highlighting the shortcomings of previous
work with regard to the evaluation methodology. Finally, we perform error
analysis on videos for the two benchmark datasets to summarize and spot the
factors that lead to misclassifications.
Related papers
- Beyond Coarse-Grained Matching in Video-Text Retrieval [50.799697216533914]
We introduce a new approach for fine-grained evaluation.
Our approach can be applied to existing datasets by automatically generating hard negative test captions.
Experiments on our fine-grained evaluations demonstrate that this approach enhances a model's ability to understand fine-grained differences.
arXiv Detail & Related papers (2024-10-16T09:42:29Z) - Conditional Modeling Based Automatic Video Summarization [70.96973928590958]
The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story.
Video summarization methods rely on visual factors, such as visual consecutiveness and diversity, which may not be sufficient to fully understand the content of the video.
A new approach to video summarization is proposed based on insights gained from how humans create ground truth video summaries.
arXiv Detail & Related papers (2023-11-20T20:24:45Z) - Mitigating Representation Bias in Action Recognition: Algorithms and
Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects.
We tackle this problem from two different angles: algorithm and dataset.
We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z) - Video Summarization Based on Video-text Modelling [0.0]
We propose a multimodal self-supervised learning framework to obtain semantic representations of videos.
We also introduce a progressive video summarization method, where the important content in a video is pinpointed progressively to generate better summaries.
An objective evaluation framework is proposed to measure the quality of video summaries based on video classification.
arXiv Detail & Related papers (2022-01-07T15:21:46Z) - Aspect-Controllable Opinion Summarization [58.5308638148329]
We propose an approach that allows the generation of customized summaries based on aspect queries.
Using a review corpus, we create a synthetic training dataset of (review, summary) pairs enriched with aspect controllers.
We fine-tune a pretrained model using our synthetic dataset and generate aspect-specific summaries by modifying the aspect controllers.
arXiv Detail & Related papers (2021-09-07T16:09:17Z) - Unsupervised Video Summarization with a Convolutional Attentive
Adversarial Network [32.90753137435032]
We propose a convolutional attentive adversarial network (CAAN) to build a deep summarizer in an unsupervised way.
Specifically, the generator employs a fully convolutional sequence network to extract global representation of a video, and an attention-based network to output normalized importance scores.
The results show the superiority of our proposed method against other state-of-the-art unsupervised approaches.
arXiv Detail & Related papers (2021-05-24T07:24:39Z) - How Good is a Video Summary? A New Benchmarking Dataset and Evaluation
Framework Towards Realistic Video Summarization [11.320914099324492]
We introduce a new benchmarking video dataset called VISIOCITY which comprises of longer videos across six different categories.
We show strategies to automatically generate multiple reference summaries from indirect ground truth present in VISIOCITY.
We propose an evaluation framework for better quantitative assessment of summary quality which is closer to human judgment.
arXiv Detail & Related papers (2021-01-26T01:42:55Z) - Self-supervised Co-training for Video Representation Learning [103.69904379356413]
We investigate the benefit of adding semantic-class positives to instance-based Info Noise Contrastive Estimation training.
We propose a novel self-supervised co-training scheme to improve the popular infoNCE loss.
We evaluate the quality of the learnt representation on two different downstream tasks: action recognition and video retrieval.
arXiv Detail & Related papers (2020-10-19T17:59:01Z) - CDEvalSumm: An Empirical Study of Cross-Dataset Evaluation for Neural
Summarization Systems [121.78477833009671]
We investigate the performance of different summarization models under a cross-dataset setting.
A comprehensive study of 11 representative summarization systems on 5 datasets from different domains reveals the effect of model architectures and generation ways.
arXiv Detail & Related papers (2020-10-11T02:19:15Z) - Realistic Video Summarization through VISIOCITY: A New Benchmark and
Evaluation Framework [15.656965429236235]
We take steps towards making automatic video summarization more realistic by addressing several challenges.
Firstly, the currently available datasets either have very short videos or have few long videos of only a particular type.
We introduce a new benchmarking dataset VISIOCITY which comprises of longer videos across six different categories.
arXiv Detail & Related papers (2020-07-29T02:44:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.