Realistic Video Summarization through VISIOCITY: A New Benchmark and
Evaluation Framework
- URL: http://arxiv.org/abs/2007.14560v2
- Date: Tue, 25 Aug 2020 09:42:26 GMT
- Authors: Vishal Kaushal, Suraj Kothawade, Rishabh Iyer, Ganesh Ramakrishnan
- Abstract summary: We take steps towards making automatic video summarization more realistic by addressing several challenges.
Firstly, the currently available datasets either have very short videos or have few long videos of only a particular type.
We introduce a new benchmarking dataset, VISIOCITY, which comprises longer videos across six different categories.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic video summarization remains an unsolved problem due to several
challenges, and we take steps towards making it more realistic by addressing them.
Firstly, the currently available datasets either have very short videos or have only
a few long videos of a particular type. We introduce a new benchmarking dataset,
VISIOCITY, which comprises longer videos across six different categories with dense
concept annotations; it supports different flavors of video summarization and can
also be used for other vision problems. Secondly, for long videos, human reference
summaries are difficult to obtain. We present a novel recipe based on Pareto
optimality to automatically generate multiple reference summaries from the indirect
ground truth present in VISIOCITY, and we show that these summaries are on par with
human summaries. Thirdly, we demonstrate that in the presence of multiple ground
truth summaries (a consequence of the highly subjective nature of the task),
learning from a single combined ground truth summary with a single loss function is
ill-advised. We propose a simple recipe, VISIOCITY-SUM, that enhances an existing
model with a combination of losses, and we demonstrate that it beats the current
state-of-the-art techniques when tested on VISIOCITY. We also show that a single
measure for evaluating a summary, as is the current typical practice, falls short.
We propose a framework for quantitative assessment of summary quality that is closer
to human judgment than any single measure, say F1. We report the performance of a
few representative video summarization techniques on VISIOCITY, assessed using
various measures, to bring out the limitations of the techniques and/or of the
assessment mechanism in modeling human judgment, and to demonstrate the
effectiveness of our evaluation framework in doing so.
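The abstract describes the reference-summary recipe only at a high level. As a rough illustration of Pareto optimality in this setting, the sketch below keeps every candidate summary that no other candidate beats on all measures at once; the measure names and scores are hypothetical stand-ins, not the paper's actual criteria.

```python
from typing import Dict, List

def dominates(a: Dict[str, float], b: Dict[str, float]) -> bool:
    """True if summary a scores at least as high as b on every measure
    and strictly higher on at least one."""
    return all(a[m] >= b[m] for m in a) and any(a[m] > b[m] for m in a)

def pareto_front(candidates: List[Dict[str, float]]) -> List[int]:
    """Indices of candidate summaries no other candidate dominates."""
    return [
        i for i, ci in enumerate(candidates)
        if not any(dominates(cj, ci) for j, cj in enumerate(candidates) if j != i)
    ]

# Hypothetical per-measure scores for three candidate summaries.
candidates = [
    {"concept_coverage": 0.8, "diversity": 0.5},
    {"concept_coverage": 0.6, "diversity": 0.7},
    {"concept_coverage": 0.5, "diversity": 0.4},  # dominated by the first
]
print(pareto_front(candidates))  # [0, 1] -> both kept as reference summaries
```

Under this view, every point on the Pareto front is a defensible reference summary, which fits the paper's claim that a single combined ground truth is inadequate.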
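Likewise, the combination of losses behind VISIOCITY-SUM is not spelled out in the abstract. Below is a minimal PyTorch-style sketch of learning against multiple references, assuming per-frame inclusion probabilities; the specific loss terms and weights are illustrative assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def combined_loss(pred: torch.Tensor, refs: list, w_fid: float = 1.0,
                  w_len: float = 0.1) -> torch.Tensor:
    """pred: per-frame inclusion probabilities in [0, 1];
    refs: several 0/1 reference summaries. The fidelity term takes the
    minimum over references, so matching any one plausible human summary
    suffices; the length term discourages trivially long summaries."""
    fidelity = min(F.binary_cross_entropy(pred, r) for r in refs)
    length = pred.mean()
    return w_fid * fidelity + w_len * length

logits = torch.randn(100, requires_grad=True)        # stand-in for model output
pred = torch.sigmoid(logits)
refs = [(torch.rand(100) > 0.8).float() for _ in range(3)]
combined_loss(pred, refs).backward()                 # gradients flow to logits
```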
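Finally, the proposed evaluation framework argues against collapsing summary quality into a single number such as F1. A toy sketch of multi-measure reporting follows; the measure set and the simple mean aggregate are assumptions for illustration, not the paper's framework.

```python
def assess(scores: dict) -> dict:
    """Report every measure plus a simple mean aggregate instead of
    collapsing quality into one number. Measure names are hypothetical."""
    return {**scores, "aggregate": sum(scores.values()) / len(scores)}

# Two techniques that tie on F1 can still differ markedly overall.
a = assess({"f1": 0.42, "concept_coverage": 0.70, "diversity": 0.30, "continuity": 0.65})
b = assess({"f1": 0.42, "concept_coverage": 0.40, "diversity": 0.75, "continuity": 0.35})
print(a["aggregate"], b["aggregate"])  # 0.5175 0.48
```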
Related papers
- Scaling Up Video Summarization Pretraining with Large Language Models [73.74662411006426]
We introduce an automated and scalable pipeline for generating a large-scale video summarization dataset.
We analyze the limitations of existing approaches and propose a new video summarization model that effectively addresses them.
Our work also presents a new benchmark dataset that contains 1,200 long videos, each with high-quality summaries annotated by professionals.
arXiv Detail & Related papers (2024-04-04T11:59:06Z)
- A Modular Approach for Multimodal Summarization of TV Shows [55.20132267309382]
We present a modular approach where separate components perform specialized sub-tasks.
Our modules involve detecting scene boundaries, reordering scenes so as to minimize the number of cuts between different events, converting visual information to text, summarizing the dialogue in each scene, and fusing the scene summaries into a final summary for the entire episode.
We also present a new metric, PRISMA, to measure both precision and recall of generated summaries, which we decompose into atomic facts.
arXiv Detail & Related papers (2024-03-06T16:10:01Z)
- Conditional Modeling Based Automatic Video Summarization [70.96973928590958]
The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story.
Video summarization methods rely on visual factors, such as visual consecutiveness and diversity, which may not be sufficient to fully understand the content of the video.
A new approach to video summarization is proposed based on insights gained from how humans create ground truth video summaries.
arXiv Detail & Related papers (2023-11-20T20:24:45Z)
- Evaluating and Improving Factuality in Multimodal Abstractive Summarization [91.46015013816083]
We propose CLIPBERTScore, a simple combination of two metrics, to leverage their robustness and strong factuality detection performance on image-summary and document-summary pairs, respectively.
We show that this combination achieves higher zero-shot correlations than existing factuality metrics for document summarization.
Our analysis demonstrates the robustness and high correlation of CLIPBERTScore and its components on four factuality metric-evaluation benchmarks.
arXiv Detail & Related papers (2022-11-04T16:50:40Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric, "dR@n,IoU@m", that discounts the basic recall scores to alleviate the inflated evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Video Summarization Based on Video-text Modelling [0.0]
We propose a multimodal self-supervised learning framework to obtain semantic representations of videos.
We also introduce a progressive video summarization method, where the important content in a video is pinpointed progressively to generate better summaries.
An objective evaluation framework is proposed to measure the quality of video summaries based on video classification.
arXiv Detail & Related papers (2022-01-07T15:21:46Z)
- Group-aware Contrastive Regression for Action Quality Assessment [85.43203180953076]
We show that the relations among videos can provide important clues for more accurate action quality assessment.
Our approach outperforms previous methods by a large margin and establishes new state-of-the-art on all three benchmarks.
arXiv Detail & Related papers (2021-08-17T17:59:39Z)
- Unsupervised Video Summarization via Multi-source Features [4.387757291346397]
Video summarization aims at generating a compact yet representative visual summary that conveys the essence of the original video.
We propose the incorporation of multiple feature sources with chunk and stride fusion to provide more information about the visual content.
For a comprehensive evaluation on the two benchmarks TVSum and SumMe, we compare our method with four state-of-the-art approaches.
arXiv Detail & Related papers (2021-05-26T13:12:46Z)
- How Good is a Video Summary? A New Benchmarking Dataset and Evaluation Framework Towards Realistic Video Summarization [11.320914099324492]
We introduce a new benchmarking video dataset called VISIOCITY, which comprises longer videos across six different categories.
We show strategies to automatically generate multiple reference summaries from indirect ground truth present in VISIOCITY.
We propose an evaluation framework for better quantitative assessment of summary quality which is closer to human judgment.
arXiv Detail & Related papers (2021-01-26T01:42:55Z)