Multilevel profiling of situation and dialogue-based deep networks for
movie genre classification using movie trailers
- URL: http://arxiv.org/abs/2109.06488v1
- Date: Tue, 14 Sep 2021 07:33:56 GMT
- Title: Multilevel profiling of situation and dialogue-based deep networks for
movie genre classification using movie trailers
- Authors: Dinesh Kumar Vishwakarma, Mayank Jindal, Ayush Mittal, Aditya Sharma
- Abstract summary: We propose a novel multi-modality: situation, dialogue, and metadata-based movie genre classification framework.
We develop the English movie trailer dataset (EMTD), which contains 2000 Hollywood movie trailers belonging to five popular genres.
- Score: 7.904790547594697
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Automated movie genre classification has emerged as an active and essential
area of research and exploration. Short duration movie trailers provide useful
insights about the movie as video content consists of the cognitive and the
affective level features. Previous approaches were focused upon either
cognitive or affective content analysis. In this paper, we propose a novel
multi-modality: situation, dialogue, and metadata-based movie genre
classification framework that takes both cognition and affect-based features
into consideration. A pre-features fusion-based framework that takes into
account: situation-based features from a regular snapshot of a trailer that
includes nouns and verbs providing the useful affect-based mapping with the
corresponding genres, dialogue (speech) based feature from audio, metadata
which together provides the relevant information for cognitive and affect based
video analysis. We also develop the English movie trailer dataset (EMTD), which
contains 2000 Hollywood movie trailers belonging to five popular genres:
Action, Romance, Comedy, Horror, and Science Fiction, and perform
cross-validation on the standard LMTD-9 dataset for validating the proposed
framework. The results demonstrate that the proposed methodology for movie
genre classification has performed excellently as depicted by the F1 scores,
precision, recall, and area under the precision-recall curves.
Related papers
- Movie Trailer Genre Classification Using Multimodal Pretrained Features [1.1743167854433303]
We introduce a novel method for movie genre classification, capitalizing on a diverse set of readily accessible pretrained models.
Our approach utilizes all video and audio frames of movie trailers without performing any temporal pooling.
Our method outperforms state-of-the-art movie genre classification models in terms of precision, recall, and mean average precision (mAP)
arXiv Detail & Related papers (2024-10-11T15:38:05Z) - MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions [69.9122231800796]
We present MMTrail, a large-scale multi-modality video-language dataset incorporating more than 20M trailer clips with visual captions.
We propose a systemic captioning framework, achieving various modality annotations with more than 27.1k hours of trailer videos.
Our dataset potentially paves the path for fine-grained large multimodal-language model training.
arXiv Detail & Related papers (2024-07-30T16:43:24Z) - Movie101v2: Improved Movie Narration Benchmark [53.54176725112229]
Automatic movie narration aims to generate video-aligned plot descriptions to assist visually impaired audiences.
We introduce Movie101v2, a large-scale, bilingual dataset with enhanced data quality specifically designed for movie narration.
Based on our new benchmark, we baseline a range of large vision-language models, including GPT-4V, and conduct an in-depth analysis of the challenges in narration generation.
arXiv Detail & Related papers (2024-04-20T13:15:27Z) - Beyond Labels: Leveraging Deep Learning and LLMs for Content Metadata [1.6574413179773761]
Analyzing the metadata can help understand the user preferences to generate personalized recommendations and item cold starting.
We present some of the challenges associated with using genre label information and propose a new way of examining the genre information.
The Genre Spectrum helps capture the various nuanced genres in a title and our offline and online experiments corroborate the effectiveness of the approach.
arXiv Detail & Related papers (2023-09-15T22:11:29Z) - Movie101: A New Movie Understanding Benchmark [47.24519006577205]
We construct a large-scale Chinese movie benchmark, named Movie101.
We propose a new metric called Movie Narration Score (MNScore) for movie narrating evaluation.
For both two tasks, our proposed methods well leverage external knowledge and outperform carefully designed baselines.
arXiv Detail & Related papers (2023-05-20T08:43:51Z) - Movie Genre Classification by Language Augmentation and Shot Sampling [20.119729119879466]
We propose a Movie genre Classification method based on Language augmentatIon and shot samPling (Movie-CLIP)
Movie-CLIP mainly consists of two parts: a language augmentation module to recognize language elements from the input audio, and a shot sampling module to select representative shots from the entire video.
We evaluate our method on MovieNet and Condensed Movies datasets, achieving approximate 6-9% improvement in mean Average Precision (mAP) over the baselines.
arXiv Detail & Related papers (2022-03-24T18:15:12Z) - Film Trailer Generation via Task Decomposition [65.16768855902268]
We model movies as graphs, where nodes are shots and edges denote semantic relations between them.
We learn these relations using joint contrastive training which leverages privileged textual information from screenplays.
An unsupervised algorithm then traverses the graph and generates trailers that human judges prefer to ones generated by competitive supervised approaches.
arXiv Detail & Related papers (2021-11-16T20:50:52Z) - A Survey on Deep Learning Technique for Video Segmentation [147.0767454918527]
Video segmentation plays a critical role in a broad range of practical applications.
Deep learning based approaches have been dedicated to video segmentation and delivered compelling performance.
arXiv Detail & Related papers (2021-07-02T15:51:07Z) - Rethinking movie genre classification with fine-grained semantic
clustering [5.54966601302758]
We find large semantic variations between movies within a single genre definition.
We expand these 'coarse' genre labels by identifying 'fine-grained' semantic information.
Our approach is demonstrated on a newly introduced multi-modal 37,866,450 frame, 8,800 movie trailer dataset.
arXiv Detail & Related papers (2020-12-04T14:58:31Z) - Condensed Movies: Story Based Retrieval with Contextual Embeddings [83.73479493450009]
We create the Condensed Movies dataset (CMD) consisting of the key scenes from over 3K movies.
The dataset is scalable, obtained automatically from YouTube, and is freely available for anybody to download and use.
We provide a deep network baseline for text-to-video retrieval on our dataset, combining character, speech and visual cues into a single video embedding.
arXiv Detail & Related papers (2020-05-08T17:55:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.