"Previously on ..." From Recaps to Story Summarization
- URL: http://arxiv.org/abs/2405.11487v1
- Date: Sun, 19 May 2024 09:09:54 GMT
- Title: "Previously on ..." From Recaps to Story Summarization
- Authors: Aditya Kumar Singh, Dhruv Srivastava, Makarand Tapaswi
- Abstract summary: We introduce multimodal story summarization by leveraging TV episode recaps.
Story summarization labels are unlocked by matching recap shots to corresponding sub-stories in the episode.
We present a thorough evaluation on story summarization, including promising cross-series generalization.
- Score: 13.311411816150551
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We introduce multimodal story summarization by leveraging TV episode recaps - short video sequences interweaving key story moments from previous episodes to bring viewers up to speed. We propose PlotSnap, a dataset featuring two crime thriller TV shows with rich recaps and long episodes of 40 minutes. Story summarization labels are unlocked by matching recap shots to corresponding sub-stories in the episode. We propose a hierarchical model TaleSumm that processes entire episodes by creating compact shot and dialog representations, and predicts importance scores for each video shot and dialog utterance by enabling interactions between local story groups. Unlike traditional summarization, our method extracts multiple plot points from long videos. We present a thorough evaluation on story summarization, including promising cross-series generalization. TaleSumm also shows good results on classic video summarization benchmarks.
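The abstract says only that story summarization labels come from matching recap shots to sub-stories in the episode. The sketch below is one minimal way such labels could be derived, built on assumptions that are ours, not the paper's. Each shot is a precomputed feature vector. A shot counts as matched when its cosine similarity to some recap shot reaches a hypothetical threshold of 0.9. Sub-story groups arrive as precomputed group ids. `recap_labels` and `propagate_to_groups` are hypothetical helpers and do not reproduce PlotSnap's actual labeling procedure.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def recap_labels(
    episode_shots: Sequence[Sequence[float]],
    recap_shots: Sequence[Sequence[float]],
    threshold: float = 0.9,  # hypothetical matching threshold
) -> List[int]:
    """Label an episode shot 1 if its best match among recap shots reaches the threshold."""
    labels = []
    for shot in episode_shots:
        best = max((cosine(shot, r) for r in recap_shots), default=0.0)
        labels.append(1 if best >= threshold else 0)
    return labels


def propagate_to_groups(labels: Sequence[int], group_ids: Sequence[int]) -> List[int]:
    """Mark every shot in a sub-story group as important if any shot in that group matched."""
    hit_groups = {g for lab, g in zip(labels, group_ids) if lab == 1}
    return [1 if g in hit_groups else 0 for g in group_ids]


if __name__ == "__main__":
    episode = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.9, 0.1]]
    recap = [[1.0, 0.0]]
    labels = recap_labels(episode, recap)
    print(labels)                                    # [1, 0, 0, 1]
    print(propagate_to_groups(labels, [0, 0, 1, 2]))  # [1, 1, 0, 1]
```

Propagating matches to whole groups is one reading of "matching recap shots to corresponding sub-stories". Shot-level matching on its own would label only the near-duplicate shots that the recap reuses.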
Related papers
- A Modular Approach for Multimodal Summarization of TV Shows [55.20132267309382]
We present a modular approach where separate components perform specialized sub-tasks.
Our modules involve detecting scene boundaries, reordering scenes so as to minimize the number of cuts between different events, converting visual information to text, summarizing the dialogue in each scene, and fusing the scene summaries into a final summary for the entire episode.
We also present a new metric, PRISMA, which measures both the precision and the recall of generated summaries by decomposing them into atomic facts.
(2024-03-06T16:10:01Z)
- Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos [58.13927287437394]
We present a new multi-shot video understanding benchmark Shot2Story20K with detailed shot-level captions and comprehensive video summaries.
Preliminary experiments show that generating a long and comprehensive video summary remains challenging.
(2023-12-16T03:17:30Z)
- VideoXum: Cross-modal Visual and Textural Summarization of Videos [54.0985975755278]
We propose a new joint video and text summarization task.
The goal is to generate both a shortened video clip along with the corresponding textual summary from a long video.
The generated shortened video clip and text narratives should be semantically well aligned.
(2023-03-21T17:51:23Z)
- Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding [13.52545041750095]
We release a video-language story dataset, Synopses of Movie Narratives (SyMoN), containing 5,193 video summaries of popular movies and TV series with a total length of 869 hours.
SyMoN captures naturalistic storytelling videos made by human creators and intended for a human audience.
(2022-03-11T01:45:33Z)
- TVRecap: A Dataset for Generating Stories with Character Descriptions [43.198875830024825]
TVRecap is a story generation dataset for generating detailed TV show episode recaps from a brief summary and documents describing the characters involved.
We create TVRecap from fan-contributed websites, which allows us to collect 26k episode recaps with 1868.7 tokens on average.
(2021-09-18T05:02:29Z)
- SummScreen: A Dataset for Abstractive Screenplay Summarization [52.56760815805357]
SummScreen is a dataset consisting of pairs of TV series transcripts and human-written recaps.
Plot details are often expressed indirectly in character dialogues and may be scattered across the entirety of the transcript.
Since characters are fundamental to TV series, we also propose two entity-centric evaluation metrics.
(2021-04-14T19:37:40Z)
- Movie Summarization via Sparse Graph Construction [65.16768855902268]
We propose a model that identifies turning-point (TP) scenes by building a sparse movie graph that represents relations between scenes and is constructed using multimodal information.
According to human judges, the summaries created by our approach are more informative and complete, and receive higher ratings, than the outputs of sequence-based models and general-purpose summarization algorithms.
(2020-12-14T13:54:34Z)
- Screenplay Summarization Using Latent Narrative Structure [78.45316339164133]
We propose to explicitly incorporate the underlying structure of narratives into general unsupervised and supervised extractive summarization models.
We formalize narrative structure in terms of key narrative events (turning points) and treat it as latent in order to summarize screenplays.
Experimental results on the CSI corpus of TV screenplays, which we augment with scene-level summarization labels, show that latent turning points correlate with important aspects of a CSI episode.
(2020-04-27T11:54:19Z)