Find the Cliffhanger: Multi-Modal Trailerness in Soap Operas
- URL: http://arxiv.org/abs/2401.16076v1
- Date: Mon, 29 Jan 2024 11:34:36 GMT
- Title: Find the Cliffhanger: Multi-Modal Trailerness in Soap Operas
- Authors: Carlo Bretti, Pascal Mettes, Hendrik Vincent Koops, Daan Odijk, Nanne van Noord
- Abstract summary: We introduce a multi-modal method for predicting trailerness to assist editors in selecting trailer-worthy moments from long-form videos.
We present results on a newly introduced soap opera dataset, demonstrating that predicting trailerness is a challenging task.
- Score: 17.476344577463525
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Creating a trailer requires carefully picking out and piecing together brief
enticing moments out of a longer video, making it a challenging and
time-consuming task. This requires selecting moments based on both visual and
dialogue information. We introduce a multi-modal method for predicting
trailerness to assist editors in selecting trailer-worthy moments from
long-form videos. We present results on a newly introduced soap opera dataset,
demonstrating that predicting trailerness is a challenging task that benefits
from multi-modal information. Code is available at
https://github.com/carlobretti/cliffhanger
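The abstract describes scoring long-form video shots for trailerness from both visual and dialogue information. Below is a minimal late-fusion sketch of that idea: each modality gets a linear head, the logits are averaged, and a sigmoid yields a per-shot score. The feature dimensions, weights, and fusion scheme here are illustrative assumptions, not the paper's actual model; see the linked repository for the real implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def trailerness_scores(visual, dialogue, w_v, w_d, b):
    """Score each shot by fusing visual and dialogue features.

    Late fusion: one linear head per modality, logits averaged,
    squashed through a sigmoid into a (0, 1) trailerness score.
    """
    logits = 0.5 * (visual @ w_v + dialogue @ w_d) + b
    return 1.0 / (1.0 + np.exp(-logits))

# Toy example: 10 shots with 16-dim visual and 8-dim dialogue embeddings
visual = rng.normal(size=(10, 16))
dialogue = rng.normal(size=(10, 8))
scores = trailerness_scores(visual, dialogue,
                            rng.normal(size=16), rng.normal(size=8), 0.0)
top3 = np.argsort(scores)[::-1][:3]  # highest-scoring shots to propose to an editor
```

An editor-assist tool would surface the top-scoring shots as trailer candidates rather than cutting the trailer automatically.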
Related papers
- Towards Automated Movie Trailer Generation [98.9854474456265]
We introduce Trailer Generation Transformer (TGT), a deep-learning framework utilizing an encoder-decoder architecture.
TGT movie encoder is tasked with contextualizing each movie shot representation via self-attention, while the autoregressive trailer decoder predicts the feature representation of the next trailer shot.
Our TGT significantly outperforms previous methods on a comprehensive suite of metrics.
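The TGT summary describes an autoregressive decoder that predicts the feature representation of the next trailer shot. The loop below is a generic greedy sketch of that decoding idea: predict a target feature from the shots chosen so far, then pick the nearest unused movie shot. The `predict_next` function and nearest-neighbour selection are assumptions for illustration, not TGT's actual architecture.

```python
import numpy as np

def autoregressive_trailer(shot_feats, predict_next, length=3):
    """Greedy sketch of autoregressive trailer assembly.

    `predict_next` maps the features of shots chosen so far to a
    predicted feature for the next trailer shot; we then select the
    unused movie shot closest to that prediction.
    """
    chosen, remaining = [], list(range(len(shot_feats)))
    for _ in range(length):
        target = predict_next([shot_feats[i] for i in chosen])
        dists = [np.linalg.norm(shot_feats[i] - target) for i in remaining]
        chosen.append(remaining.pop(int(np.argmin(dists))))
    return chosen

rng = np.random.default_rng(1)
feats = rng.normal(size=(8, 4))
# Toy predictor: mean of the chosen shots so far (zeros before the first pick)
predict = lambda hist: np.mean(hist, axis=0) if hist else np.zeros(4)
order = autoregressive_trailer(feats, predict, length=3)
```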
arXiv Detail & Related papers (2024-04-04T14:28:34Z)
- TraveLER: A Multi-LMM Agent Framework for Video Question-Answering [48.55956886819481]
TraveLER is a framework that iteratively collects relevant information from the video through interactive question-asking.
We find that the proposed TraveLER approach improves performance on several video question-answering benchmarks.
arXiv Detail & Related papers (2024-04-01T20:58:24Z)
- AI based approach to Trailer Generation for Online Educational Courses [0.0]
The framework we propose is a template-based method for video trailer generation.
The proposed trailer is in the form of a timeline consisting of various fragments created by selecting, paraphrasing or generating content.
We perform user evaluation with 63 human evaluators for evaluating the trailers generated by our system.
arXiv Detail & Related papers (2023-01-10T13:33:08Z)
- Film Trailer Generation via Task Decomposition [65.16768855902268]
We model movies as graphs, where nodes are shots and edges denote semantic relations between them.
We learn these relations using joint contrastive training which leverages privileged textual information from screenplays.
An unsupervised algorithm then traverses the graph and generates trailers that human judges prefer to ones generated by competitive supervised approaches.
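The entry above describes modelling a movie as a graph of shots and traversing it to assemble a trailer. The function below is a simple greedy-walk sketch of such a traversal; the edge weights `sim[i, j]` stand in for the semantic relations that the paper learns contrastively from screenplays, and the greedy policy is an illustrative assumption rather than the paper's algorithm.

```python
import numpy as np

def traverse_for_trailer(sim, start, steps=3):
    """Greedy walk over a shot graph.

    `sim[i, j]` is a semantic-relation weight between shots i and j.
    Starting from `start`, repeatedly move to the strongest unvisited
    neighbour, collecting a candidate trailer sequence.
    """
    path, current = [start], start
    for _ in range(steps):
        weights = sim[current].copy()
        weights[path] = -np.inf   # never revisit a shot
        current = int(np.argmax(weights))
        path.append(current)
    return path

# Toy 4-shot graph with symmetric relation weights
sim = np.array([[0.0, 0.9, 0.1, 0.2],
                [0.9, 0.0, 0.8, 0.3],
                [0.1, 0.8, 0.0, 0.7],
                [0.2, 0.3, 0.7, 0.0]])
path = traverse_for_trailer(sim, start=0, steps=3)
```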
arXiv Detail & Related papers (2021-11-16T20:50:52Z)
- Relation-aware Video Reading Comprehension for Temporal Language Grounding [67.5613853693704]
Temporal language grounding in videos aims to localize the temporal span relevant to the given query sentence.
This paper will formulate temporal language grounding into video reading comprehension and propose a Relation-aware Network (RaNet) to address it.
arXiv Detail & Related papers (2021-10-12T03:10:21Z)
- A Case Study of Deep Learning Based Multi-Modal Methods for Predicting the Age-Suitability Rating of Movie Trailers [15.889598494755646]
We introduce a new dataset containing videos of movie trailers in English downloaded from IMDB and YouTube.
We propose a multi-modal deep learning pipeline addressing the movie trailer age suitability rating problem.
arXiv Detail & Related papers (2021-01-26T17:15:35Z)
- Learning Trailer Moments in Full-Length Movies [49.74693903050302]
We leverage the officially-released trailers as the weak supervision to learn a model that can detect the key moments from full-length movies.
We introduce a novel ranking network that utilizes the Co-Attention between movies and trailers as guidance to generate the training pairs.
We construct the first movie-trailer dataset, and the proposed Co-Attention assisted ranking network shows superior performance even over the supervised approach.
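The entry above trains a ranking network so that moments matching the official trailer outscore other segments of the movie. A generic pairwise margin ranking loss captures that training signal; the sketch below is illustrative only, and the co-attention pair mining described in the paper is not reproduced here.

```python
import numpy as np

def margin_ranking_loss(pos_scores, neg_scores, margin=1.0):
    """Pairwise margin ranking loss.

    Penalizes pairs where a weak-positive (trailer) moment does not
    outscore its paired negative (non-trailer) moment by `margin`.
    """
    return float(np.mean(np.maximum(0.0, margin - (pos_scores - neg_scores))))

# Positives already beat negatives by more than the margin: zero loss
loss_ok = margin_ranking_loss(np.array([2.0, 3.0]), np.array([0.0, 0.5]))
# Tied scores are fully penalized by the margin
loss_tied = margin_ranking_loss(np.array([0.0]), np.array([0.0]))
```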
arXiv Detail & Related papers (2020-08-19T15:23:25Z)
- Multi-modal Transformer for Video Retrieval [67.86763073161012]
We present a multi-modal transformer to jointly encode the different modalities in video.
On the natural language side, we investigate the best practices to jointly optimize the language embedding together with the multi-modal transformer.
This novel framework allows us to establish state-of-the-art results for video retrieval on three datasets.
arXiv Detail & Related papers (2020-07-21T07:38:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.