Related papers: Select and Summarize: Scene Saliency for Movie Script Summarization

Select and Summarize: Scene Saliency for Movie Script Summarization

URL: http://arxiv.org/abs/2404.03561v1
Date: Thu, 4 Apr 2024 16:16:53 GMT
Title: Select and Summarize: Scene Saliency for Movie Script Summarization
Authors: Rohit Saxena, Frank Keller,
Abstract summary: We introduce a scene saliency dataset that consists of human-annotated salient scenes for 100 movies. We propose a two-stage abstractive summarization approach which first identifies the salient scenes in script and then generates a summary using only those scenes.
Score: 11.318175666743656
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Abstractive summarization for long-form narrative texts such as movie scripts is challenging due to the computational and memory constraints of current language models. A movie script typically comprises a large number of scenes; however, only a fraction of these scenes are salient, i.e., important for understanding the overall narrative. The salience of a scene can be operationalized by considering it as salient if it is mentioned in the summary. Automatically identifying salient scenes is difficult due to the lack of suitable datasets. In this work, we introduce a scene saliency dataset that consists of human-annotated salient scenes for 100 movies. We propose a two-stage abstractive summarization approach which first identifies the salient scenes in script and then generates a summary using only those scenes. Using QA-based evaluation, we show that our model outperforms previous state-of-the-art summarization methods and reflects the information content of a movie more accurately than a model that takes the whole movie script as input.

Related papers

DiscoGraMS: Enhancing Movie Screen-Play Summarization using Movie Character-Aware Discourse Graph [6.980991481207376]
We introduce DiscoGraMS, a novel resource that represents movie scripts as a movie character-aware discourse graph (CaD Graph) The model aims to preserve all salient information, offering a more comprehensive and faithful representation of the screenplay's content.
arXiv Detail & Related papers (2024-10-18T17:56:11Z)
ScreenWriter: Automatic Screenplay Generation and Movie Summarisation [55.20132267309382]
Video content has driven demand for textual descriptions or summaries that allow users to recall key plot points or get an overview without watching. We propose the task of automatic screenplay generation, and a method, ScreenWriter, that operates only on video and produces output which includes dialogue, speaker names, scene breaks, and visual descriptions. ScreenWriter introduces a novel algorithm to segment the video into scenes based on the sequence of visual vectors, and a novel method for the challenging problem of determining character names, based on a database of actors' faces.
arXiv Detail & Related papers (2024-10-17T07:59:54Z)
MovieSum: An Abstractive Summarization Dataset for Movie Screenplays [11.318175666743656]
We present a new dataset, MovieSum, for abstractive summarization of movie screenplays. This dataset comprises 2200 movie screenplays accompanied by their Wikipedia plot summaries.
arXiv Detail & Related papers (2024-08-12T16:43:09Z)
Movie101v2: Improved Movie Narration Benchmark [53.54176725112229]
Automatic movie narration aims to generate video-aligned plot descriptions to assist visually impaired audiences. We introduce Movie101v2, a large-scale, bilingual dataset with enhanced data quality specifically designed for movie narration. Based on our new benchmark, we baseline a range of large vision-language models, including GPT-4V, and conduct an in-depth analysis of the challenges in narration generation.
arXiv Detail & Related papers (2024-04-20T13:15:27Z)
Movie101: A New Movie Understanding Benchmark [47.24519006577205]
We construct a large-scale Chinese movie benchmark, named Movie101. We propose a new metric called Movie Narration Score (MNScore) for movie narrating evaluation. For both two tasks, our proposed methods well leverage external knowledge and outperform carefully designed baselines.
arXiv Detail & Related papers (2023-05-20T08:43:51Z)
Movie Summarization via Sparse Graph Construction [65.16768855902268]
We propose a model that identifies TP scenes by building a sparse movie graph that represents relations between scenes and is constructed using multimodal information. According to human judges, the summaries created by our approach are more informative and complete, and receive higher ratings, than the outputs of sequence-based models and general-purpose summarization algorithms.
arXiv Detail & Related papers (2020-12-14T13:54:34Z)
Condensed Movies: Story Based Retrieval with Contextual Embeddings [83.73479493450009]
We create the Condensed Movies dataset (CMD) consisting of the key scenes from over 3K movies. The dataset is scalable, obtained automatically from YouTube, and is freely available for anybody to download and use. We provide a deep network baseline for text-to-video retrieval on our dataset, combining character, speech and visual cues into a single video embedding.
arXiv Detail & Related papers (2020-05-08T17:55:03Z)
Screenplay Summarization Using Latent Narrative Structure [78.45316339164133]
We propose to explicitly incorporate the underlying structure of narratives into general unsupervised and supervised extractive summarization models. We formalize narrative structure in terms of key narrative events (turning points) and treat it as latent in order to summarize screenplays. Experimental results on the CSI corpus of TV screenplays, which we augment with scene-level summarization labels, show that latent turning points correlate with important aspects of a CSI episode.
arXiv Detail & Related papers (2020-04-27T11:54:19Z)
A Local-to-Global Approach to Multi-modal Movie Scene Segmentation [95.34033481442353]
We build a large-scale video dataset MovieScenes, which contains 21K annotated scene segments from 150 movies. We propose a local-to-global scene segmentation framework, which integrates multi-modal information across three levels, i.e. clip, segment, and movie. Our experiments show that the proposed network is able to segment a movie into scenes with high accuracy, consistently outperforming previous methods.
arXiv Detail & Related papers (2020-04-06T13:58:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.