Multilevel profiling of situation and dialogue-based deep networks for
movie genre classification using movie trailers
- URL: http://arxiv.org/abs/2109.06488v1
- Date: Tue, 14 Sep 2021 07:33:56 GMT
- Title: Multilevel profiling of situation and dialogue-based deep networks for
movie genre classification using movie trailers
- Authors: Dinesh Kumar Vishwakarma, Mayank Jindal, Ayush Mittal, Aditya Sharma
- Abstract summary: We propose a novel multi-modality: situation, dialogue, and metadata-based movie genre classification framework.
We develop the English movie trailer dataset (EMTD), which contains 2000 Hollywood movie trailers belonging to five popular genres.
- Score: 7.904790547594697
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Automated movie genre classification has emerged as an active and essential
area of research and exploration. Short duration movie trailers provide useful
insights about the movie as video content consists of the cognitive and the
affective level features. Previous approaches were focused upon either
cognitive or affective content analysis. In this paper, we propose a novel
multi-modality: situation, dialogue, and metadata-based movie genre
classification framework that takes both cognition and affect-based features
into consideration. A pre-features fusion-based framework that takes into
account: situation-based features from a regular snapshot of a trailer that
includes nouns and verbs providing the useful affect-based mapping with the
corresponding genres, dialogue (speech) based feature from audio, metadata
which together provides the relevant information for cognitive and affect based
video analysis. We also develop the English movie trailer dataset (EMTD), which
contains 2000 Hollywood movie trailers belonging to five popular genres:
Action, Romance, Comedy, Horror, and Science Fiction, and perform
cross-validation on the standard LMTD-9 dataset for validating the proposed
framework. The results demonstrate that the proposed methodology for movie
genre classification has performed excellently as depicted by the F1 scores,
precision, recall, and area under the precision-recall curves.
Related papers
- Beyond Labels: Leveraging Deep Learning and LLMs for Content Metadata [1.6574413179773761]
Analyzing the metadata can help understand the user preferences to generate personalized recommendations and item cold starting.
We present some of the challenges associated with using genre label information and propose a new way of examining the genre information.
The Genre Spectrum helps capture the various nuanced genres in a title and our offline and online experiments corroborate the effectiveness of the approach.
arXiv Detail & Related papers (2023-09-15T22:11:29Z) - Reasoning over the Behaviour of Objects in Video-Clips for Adverb-Type Recognition [54.938128496934695]
We propose a new framework that reasons over object-behaviours extracted from raw-video-clips to recognize the clip's corresponding adverb-types.
Specifically, we propose a novel pipeline that extracts human-interpretable object-behaviour-facts from raw video clips.
We release two new datasets of object-behaviour-facts extracted from raw video clips.
arXiv Detail & Related papers (2023-07-09T09:04:26Z) - Movie101: A New Movie Understanding Benchmark [47.24519006577205]
We construct a large-scale Chinese movie benchmark, named Movie101.
We propose a new metric called Movie Narration Score (MNScore) for movie narrating evaluation.
For both two tasks, our proposed methods well leverage external knowledge and outperform carefully designed baselines.
arXiv Detail & Related papers (2023-05-20T08:43:51Z) - Movie Genre Classification by Language Augmentation and Shot Sampling [20.119729119879466]
We propose a Movie genre Classification method based on Language augmentatIon and shot samPling (Movie-CLIP)
Movie-CLIP mainly consists of two parts: a language augmentation module to recognize language elements from the input audio, and a shot sampling module to select representative shots from the entire video.
We evaluate our method on MovieNet and Condensed Movies datasets, achieving approximate 6-9% improvement in mean Average Precision (mAP) over the baselines.
arXiv Detail & Related papers (2022-03-24T18:15:12Z) - Film Trailer Generation via Task Decomposition [65.16768855902268]
We model movies as graphs, where nodes are shots and edges denote semantic relations between them.
We learn these relations using joint contrastive training which leverages privileged textual information from screenplays.
An unsupervised algorithm then traverses the graph and generates trailers that human judges prefer to ones generated by competitive supervised approaches.
arXiv Detail & Related papers (2021-11-16T20:50:52Z) - A Survey on Deep Learning Technique for Video Segmentation [147.0767454918527]
Video segmentation plays a critical role in a broad range of practical applications.
Deep learning based approaches have been dedicated to video segmentation and delivered compelling performance.
arXiv Detail & Related papers (2021-07-02T15:51:07Z) - $C^3$: Compositional Counterfactual Contrastive Learning for
Video-grounded Dialogues [97.25466640240619]
Video-grounded dialogue systems aim to integrate video understanding and dialogue understanding to generate responses relevant to both the dialogue and video context.
Most existing approaches employ deep learning models and have achieved remarkable performance, given the relatively small datasets available.
We propose a novel approach of Compositional Counterfactual Contrastive Learning to develop contrastive training between factual and counterfactual samples in video-grounded dialogues.
arXiv Detail & Related papers (2021-06-16T16:05:27Z) - Rethinking movie genre classification with fine-grained semantic
clustering [5.54966601302758]
We find large semantic variations between movies within a single genre definition.
We expand these 'coarse' genre labels by identifying 'fine-grained' semantic information.
Our approach is demonstrated on a newly introduced multi-modal 37,866,450 frame, 8,800 movie trailer dataset.
arXiv Detail & Related papers (2020-12-04T14:58:31Z) - A multimodal approach for multi-label movie genre classification [2.1342631813973507]
We created a dataset composed of trailer video clips, subtitles, synopses, and movie posters from 152,622 movie titles from The Movie Database.
The dataset was carefully curated and organized, and it was also made available as a contribution of this work.
arXiv Detail & Related papers (2020-06-01T00:51:39Z) - Condensed Movies: Story Based Retrieval with Contextual Embeddings [83.73479493450009]
We create the Condensed Movies dataset (CMD) consisting of the key scenes from over 3K movies.
The dataset is scalable, obtained automatically from YouTube, and is freely available for anybody to download and use.
We provide a deep network baseline for text-to-video retrieval on our dataset, combining character, speech and visual cues into a single video embedding.
arXiv Detail & Related papers (2020-05-08T17:55:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.