Predicting Oscar-Nominated Screenplays with Sentence Embeddings
- URL: http://arxiv.org/abs/2511.05500v1
- Date: Mon, 29 Sep 2025 15:25:02 GMT
- Title: Predicting Oscar-Nominated Screenplays with Sentence Embeddings
- Authors: Francis Gross
- Abstract summary: This work explores whether it is possible to predict Oscar nominations for screenplays using modern language models. Since no suitable dataset was available, a new one called Movie-O-Label was created by combining the MovieSum collection of movie scripts with curated Oscar records. The best-performing model reached a macro F1 score of 0.66, a precision-recall average precision (AP) of 0.445, and a ROC-AUC of 0.79.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Oscar nominations are an important factor in the movie industry because they can boost both visibility and commercial success. This work explores whether it is possible to predict Oscar nominations for screenplays using modern language models. Since no suitable dataset was available, a new one called Movie-O-Label was created by combining the MovieSum collection of movie scripts with curated Oscar records. Each screenplay was represented by its title, Wikipedia summary, and full script. Long scripts were split into overlapping text chunks and encoded with the E5 sentence embedding model. Then, the screenplay embeddings were classified using a logistic regression model. The best results were achieved when the three feature inputs related to screenplays (script, summary, and title) were combined. The best-performing model reached a macro F1 score of 0.66, a precision-recall average precision (AP) of 0.445 against a baseline of 0.19, and a ROC-AUC of 0.79. The results suggest that even simple models based on modern text embeddings demonstrate good prediction performance and might be a starting point for future research.
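The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the chunk size, overlap, embedding dimension, and labels are all assumptions, and the hash-based `embed_chunk` is a stand-in for the actual E5 encoder (which would be loaded via a library such as sentence-transformers). The paper's combination of script, summary, and title features is also omitted here; a single text field is embedded per document.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, average_precision_score, roc_auc_score

def chunk_text(text, chunk_size=512, overlap=128):
    """Split a long script into overlapping character chunks."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

def embed_chunk(chunk, dim=64):
    """Stand-in encoder: a pseudo-embedding seeded from the chunk's hash.
    NOT the real E5 model; it only mimics the shape of the output."""
    rng = np.random.default_rng(abs(hash(chunk)) % (2**32))
    return rng.standard_normal(dim)

def embed_document(text):
    """Mean-pool chunk embeddings into one document-level vector."""
    return np.mean([embed_chunk(c) for c in chunk_text(text)], axis=0)

# Toy corpus with synthetic labels, purely for illustration.
texts = [f"script {i} scene dialogue " * 120 for i in range(40)]
y = np.array([i % 2 for i in range(40)])

X = np.stack([embed_document(t) for t in texts])
clf = LogisticRegression(max_iter=1000).fit(X[:30], y[:30])
probs = clf.predict_proba(X[30:])[:, 1]
preds = clf.predict(X[30:])

# The paper reports macro F1, AP (vs. a prevalence baseline), and ROC-AUC.
print("macro F1:", round(f1_score(y[30:], preds, average="macro"), 3))
print("AP:", round(average_precision_score(y[30:], probs), 3))
print("ROC-AUC:", round(roc_auc_score(y[30:], probs), 3))
```

With real E5 embeddings, the AP baseline of 0.19 quoted in the abstract corresponds to the positive-class prevalence, which is what a random ranker would achieve.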
Related papers
- ScreenWriter: Automatic Screenplay Generation and Movie Summarisation [55.20132267309382]
Video content has driven demand for textual descriptions or summaries that allow users to recall key plot points or get an overview without watching.
We propose the task of automatic screenplay generation, and a method, ScreenWriter, that operates only on video and produces output which includes dialogue, speaker names, scene breaks, and visual descriptions.
ScreenWriter introduces a novel algorithm to segment the video into scenes based on the sequence of visual vectors, and a novel method for the challenging problem of determining character names, based on a database of actors' faces.
arXiv Detail & Related papers (2024-10-17T07:59:54Z)
- MovieSum: An Abstractive Summarization Dataset for Movie Screenplays [11.318175666743656]
We present a new dataset, MovieSum, for abstractive summarization of movie screenplays.
This dataset comprises 2200 movie screenplays accompanied by their Wikipedia plot summaries.
arXiv Detail & Related papers (2024-08-12T16:43:09Z)
- LFED: A Literary Fiction Evaluation Dataset for Large Language Models [58.85989777743013]
We collect 95 literary fictions that are either originally written in Chinese or translated into Chinese, covering a wide range of topics across several centuries.
We define a question taxonomy with 8 question categories to guide the creation of 1,304 questions.
We conduct an in-depth analysis to ascertain how specific attributes of literary fictions (e.g., novel types, character numbers, the year of publication) impact LLM performance in evaluations.
arXiv Detail & Related papers (2024-05-16T15:02:24Z)
- Movie101: A New Movie Understanding Benchmark [47.24519006577205]
We construct a large-scale Chinese movie benchmark, named Movie101.
We propose a new metric called Movie Narration Score (MNScore) for movie narrating evaluation.
For both tasks, our proposed methods leverage external knowledge well and outperform carefully designed baselines.
arXiv Detail & Related papers (2023-05-20T08:43:51Z)
- Movie Genre Classification by Language Augmentation and Shot Sampling [20.119729119879466]
We propose a Movie genre Classification method based on Language augmentatIon and shot samPling (Movie-CLIP).
Movie-CLIP mainly consists of two parts: a language augmentation module to recognize language elements from the input audio, and a shot sampling module to select representative shots from the entire video.
We evaluate our method on the MovieNet and Condensed Movies datasets, achieving an approximate 6-9% improvement in mean Average Precision (mAP) over the baselines.
arXiv Detail & Related papers (2022-03-24T18:15:12Z)
- EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching [90.98122161162644]
Current metrics for video captioning are mostly based on the text-level comparison between reference and candidate captions.
We propose EMScore (Embedding Matching-based score), a novel reference-free metric for video captioning.
We exploit a well pre-trained vision-language model to extract visual and linguistic embeddings for computing EMScore.
arXiv Detail & Related papers (2021-11-17T06:02:43Z)
- Screenplay Quality Assessment: Can We Predict Who Gets Nominated? [53.9153892362629]
We present a method to evaluate the quality of a screenplay based on linguistic cues.
Based on industry opinions and narratology, we extract and integrate domain-specific features into common classification techniques.
arXiv Detail & Related papers (2020-05-13T02:39:56Z)
- Speech2Action: Cross-modal Supervision for Action Recognition [127.10071447772407]
We train a BERT-based Speech2Action classifier on over a thousand movie screenplays.
We then apply this model to the speech segments of a large unlabelled movie corpus.
Using the predictions of this model, we obtain weak action labels for over 800K video clips.
arXiv Detail & Related papers (2020-03-30T16:22:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.