A multimodal approach for multi-label movie genre classification
- URL: http://arxiv.org/abs/2006.00654v1
- Date: Mon, 1 Jun 2020 00:51:39 GMT
- Title: A multimodal approach for multi-label movie genre classification
- Authors: Rafael B. Mangolin, Rodolfo M. Pereira, Alceu S. Britto Jr., Carlos N.
Silla Jr., Valéria D. Feltrim, Diego Bertolini and Yandre M. G. Costa
- Abstract summary: We created a dataset composed of trailer video clips, subtitles, synopses, and movie posters from 152,622 movie titles from The Movie Database.
The dataset was carefully curated and organized, and it was also made available as a contribution of this work.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Movie genre classification is a challenging task that has increasingly
attracted the attention of researchers. In this paper, we addressed the
multi-label classification of movie genres in a multimodal way. For this
purpose, we created a dataset composed of trailer video clips, subtitles,
synopses, and movie posters taken from 152,622 movie titles from The Movie
Database. The dataset was carefully curated and organized, and it was also made
available as a contribution of this work. Each movie of the dataset was labeled
according to a set of eighteen genre labels. We extracted features from these
data using different kinds of descriptors, namely Mel Frequency Cepstral
Coefficients, Statistical Spectrum Descriptor, Local Binary Pattern with
spectrograms, Long Short-Term Memory, and Convolutional Neural Networks. The
descriptors were evaluated using different classifiers, such as BinaryRelevance
and ML-kNN. We have also investigated the performance of the combination of
different classifiers/features using a late fusion strategy, which obtained
encouraging results. Based on the F-Score metric, our best result, 0.628, was
obtained by the fusion of a classifier created using LSTM on the synopses, and
a classifier created using CNN on movie trailer frames. When considering the
AUC-PR metric, the best result, 0.673, was also achieved by combining those
representations, but in addition, a classifier based on LSTM created from the
subtitles was used. These results corroborate the existence of complementarity
among classifiers based on different sources of information in this field of
application. As far as we know, this is the most comprehensive study developed
in terms of the diversity of multimedia sources of information to perform movie
genre classification.
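The fusion step the abstract describes is simple enough to sketch. Below is a minimal, hypothetical illustration (not the authors' code), assuming scikit-learn and synthetic stand-in features: one Binary Relevance classifier per modality, per-label probabilities averaged by late fusion, and the result scored with the same metrics the paper reports.

```python
# Minimal late-fusion sketch (not the paper's implementation).
# Synthetic features stand in for synopsis (LSTM) and trailer (CNN) descriptors.
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# One shared set of 18 genre labels, two pretend modalities.
X, Y = make_multilabel_classification(n_samples=2000, n_features=64,
                                      n_classes=18, n_labels=3, random_state=0)
X_syn, X_trl = X[:, :32], X[:, 32:]
(Xs_tr, Xs_te, Xt_tr, Xt_te,
 Y_tr, Y_te) = train_test_split(X_syn, X_trl, Y, random_state=0)

# Binary Relevance: one independent binary classifier per genre label.
clf_syn = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(Xs_tr, Y_tr)
clf_trl = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(Xt_tr, Y_tr)

# Late fusion: average per-label probabilities across modalities.
probs = (clf_syn.predict_proba(Xs_te) + clf_trl.predict_proba(Xt_te)) / 2.0
preds = (probs >= 0.5).astype(int)

print("micro F-score:", f1_score(Y_te, preds, average="micro"))
print("micro AUC-PR:", average_precision_score(Y_te, probs, average="micro"))
```

Averaging is only the simplest fusion rule; per the abstract, the paper's best AUC-PR result fuses three such classifiers (synopses, trailer frames, and subtitles).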
Related papers
- Movie Trailer Genre Classification Using Multimodal Pretrained Features [1.1743167854433303]
We introduce a novel method for movie genre classification, capitalizing on a diverse set of readily accessible pretrained models.
Our approach utilizes all video and audio frames of movie trailers without performing any temporal pooling.
Our method outperforms state-of-the-art movie genre classification models in terms of precision, recall, and mean average precision (mAP).
arXiv Detail & Related papers (2024-10-11T15:38:05Z)
- Music Genre Classification using Large Language Models [50.750620612351284]
This paper exploits the zero-shot capabilities of pre-trained large language models (LLMs) for music genre classification.
The proposed approach splits audio signals into 20 ms chunks and processes them through convolutional feature encoders.
During inference, predictions on individual chunks are aggregated for a final genre classification.
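As a hedged illustration of that chunk-and-aggregate inference (the paper's convolutional encoder and LLM head are abstracted behind a placeholder callable):

```python
import numpy as np

def aggregate_chunk_predictions(waveform, sample_rate, score_chunk, chunk_ms=20):
    """Split audio into 20 ms chunks, score each, and average the scores.

    score_chunk is a hypothetical stand-in for the paper's convolutional
    feature encoder plus LLM classification head.
    """
    chunk_len = int(sample_rate * chunk_ms / 1000)
    n_chunks = len(waveform) // chunk_len
    chunks = waveform[:n_chunks * chunk_len].reshape(n_chunks, chunk_len)
    scores = np.stack([score_chunk(c) for c in chunks])  # (n_chunks, n_genres)
    return scores.mean(axis=0)  # aggregated genre prediction

# Dummy usage: 1 s of random audio at 16 kHz and a stand-in 10-genre scorer.
rng = np.random.default_rng(0)
print(aggregate_chunk_predictions(rng.standard_normal(16000), 16000,
                                  lambda chunk: rng.random(10)))
```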
arXiv Detail & Related papers (2024-10-10T19:17:56Z)
- Music Genre Classification: A Comparative Analysis of CNN and XGBoost Approaches with Mel-frequency cepstral coefficients and Mel Spectrograms [0.0]
This study investigates the performances of three models: a proposed convolutional neural network (CNN), the VGG16 with fully connected layers (FC), and an eXtreme Gradient Boosting (XGBoost) approach on different features.
The results show that the MFCC XGBoost model outperformed the others. Furthermore, applying data segmentation in the data preprocessing phase can significantly enhance the performance of the CNNs.
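A minimal sketch of the winning MFCC-plus-XGBoost pipeline, assuming librosa for feature extraction and integer-encoded genre labels (file paths and labels are caller-supplied placeholders, not from the paper):

```python
import librosa
import numpy as np
from xgboost import XGBClassifier

def mfcc_features(path, n_mfcc=20):
    """Fixed-size summary of one track: MFCCs averaged over time."""
    y, sr = librosa.load(path, mono=True)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

def train_genre_model(paths, genre_ids):
    """paths: audio files; genre_ids: integer genre labels (hypothetical)."""
    X = np.stack([mfcc_features(p) for p in paths])
    model = XGBClassifier(objective="multi:softprob", n_estimators=300)
    model.fit(X, np.asarray(genre_ids))
    return model
```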
arXiv Detail & Related papers (2024-01-09T01:50:31Z)
- Text-to-feature diffusion for audio-visual few-shot learning [59.45164042078649]
Few-shot learning from video data is a challenging and underexplored, yet much cheaper, setup.
We introduce a unified audio-visual few-shot video classification benchmark on three datasets.
We show that AV-DIFF obtains state-of-the-art performance on our proposed benchmark for audio-visual few-shot learning.
arXiv Detail & Related papers (2023-09-07T17:30:36Z)
- Temporal Saliency Query Network for Efficient Video Recognition [82.52760040577864]
Video recognition is a hot-spot research topic with the explosive growth of multimedia data on the Internet and mobile devices.
Most existing methods select the salient frames without awareness of the class-specific saliency scores.
We propose a novel Temporal Saliency Query (TSQ) mechanism, which introduces class-specific information to provide fine-grained cues for saliency measurement.
arXiv Detail & Related papers (2022-07-21T09:23:34Z)
- Deep ensembles in bioimage segmentation [74.01883650587321]
In this work, we propose an ensemble of convolutional neural networks (CNNs).
In ensemble methods, many different models are trained and then used for classification; the ensemble aggregates the outputs of the individual classifiers.
The proposed ensemble is implemented by combining different backbone networks using the DeepLabV3+ and HarDNet environment.
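The aggregation step reads as plain averaging of per-pixel class probabilities; a toy sketch, with random maps standing in for the DeepLabV3+/HarDNet outputs:

```python
import numpy as np

rng = np.random.default_rng(0)
# Three hypothetical models' softmax outputs: (models, H, W, classes).
outputs = rng.dirichlet(np.ones(2), size=(3, 64, 64))
ensemble_prob = outputs.mean(axis=0)   # aggregate the single classifiers
mask = ensemble_prob.argmax(axis=-1)   # final binary segmentation mask
print(mask.shape)  # (64, 64)
```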
arXiv Detail & Related papers (2021-12-24T05:54:21Z)
- Multilevel profiling of situation and dialogue-based deep networks for movie genre classification using movie trailers [7.904790547594697]
We propose a novel multi-modal movie genre classification framework based on situations, dialogues, and metadata.
We develop the English movie trailer dataset (EMTD), which contains 2000 Hollywood movie trailers belonging to five popular genres.
arXiv Detail & Related papers (2021-09-14T07:33:56Z)
- Interpretation of multi-label classification models using shapley values [0.5482532589225552]
This work further extends the explanation of multi-label classification tasks by using the SHAP methodology.
The experiments present a comprehensive comparison of different algorithms on well-known multi-label datasets.
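One straightforward way to realize per-label SHAP explanations, assuming a Binary Relevance model whose sub-estimators are tree ensembles (synthetic data, not the paper's exact setup):

```python
import numpy as np
import shap
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier

X, Y = make_multilabel_classification(n_samples=500, n_features=20,
                                      n_classes=5, random_state=0)
clf = OneVsRestClassifier(RandomForestClassifier(n_estimators=50)).fit(X, Y)

# One TreeExplainer per label: each sub-estimator is an ordinary binary model.
for label, est in enumerate(clf.estimators_):
    shap_values = shap.TreeExplainer(est).shap_values(X[:50])
    print(f"label {label}: SHAP values shape {np.shape(shap_values)}")
```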
arXiv Detail & Related papers (2021-04-21T12:51:12Z)
- Deep ensembles based on Stochastic Activation Selection for Polyp Segmentation [82.61182037130406]
This work deals with medical image segmentation and in particular with accurate polyp detection and segmentation during colonoscopy examinations.
The basic architecture in image segmentation consists of an encoder and a decoder.
We compare several variants of the DeepLab architecture obtained by varying the decoder backbone.
arXiv Detail & Related papers (2021-04-02T02:07:37Z)
- Rethinking movie genre classification with fine-grained semantic clustering [5.54966601302758]
We find large semantic variations between movies within a single genre definition.
We expand these 'coarse' genre labels by identifying 'fine-grained' semantic information.
Our approach is demonstrated on a newly introduced multi-modal dataset of 8,800 movie trailers comprising 37,866,450 frames.
arXiv Detail & Related papers (2020-12-04T14:58:31Z)
- Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA [96.10612095576333]
We propose a video question answering model which effectively integrates multi-modal input sources and finds the temporally relevant information to answer questions.
Our model also comprises dual-level attention (word/object and frame level), multi-head self- and cross-integration for different sources (video and dense captions), and gates that pass the most relevant information to the classifier.
We evaluate our model on the challenging TVQA dataset, where each of our model components provides significant gains, and our overall model outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2020-05-13T16:35:27Z)