Unraveling Movie Genres through Cross-Attention Fusion of Bi-Modal Synergy of Poster
- URL: http://arxiv.org/abs/2410.19764v1
- Date: Sat, 12 Oct 2024 16:14:18 GMT
- Title: Unraveling Movie Genres through Cross-Attention Fusion of Bi-Modal Synergy of Poster
- Authors: Utsav Kumar Nareti, Chandranath Adak, Soumi Chattopadhyay, Pichao Wang
- Abstract summary: Movie genre classification plays a pivotal role in film marketing, audience engagement, and recommendation systems.
Previous explorations into movie genre classification have mostly examined plot summaries, subtitles, trailers, and movie scenes.
We present a framework that exploits movie posters from both visual and textual perspectives to address the multi-label movie genre classification problem.
- Score: 13.28948224096886
- Abstract: Movie posters are not just decorative; they are meticulously designed to capture the essence of a movie, such as its genre, storyline, and tone/vibe. For decades, movie posters have graced cinema walls and billboards, and now they appear on our digital screens. Movie genre classification plays a pivotal role in film marketing, audience engagement, and recommendation systems. Previous explorations into movie genre classification have mostly examined plot summaries, subtitles, trailers, and movie scenes. Movie posters, however, provide a tantalizing pre-release glimpse into a film's key aspects, which can ignite public interest. In this paper, we present a framework that exploits movie posters from both visual and textual perspectives to address the multi-label movie genre classification problem. First, we extract text from movie posters using OCR and retrieve the corresponding embeddings. Next, we introduce a cross-attention-based fusion module to allocate attention weights to the visual and textual embeddings. To validate our framework, we use 13,882 posters sourced from the Internet Movie Database (IMDb). The experimental results indicate that our model performs promisingly and outperforms even some prominent contemporary architectures.
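The paper's exact architecture is not reproduced here; as a rough illustration only, a cross-attention fusion of poster-image and OCR-text embeddings might be sketched in PyTorch as follows, with all dimensions, layer choices, and names being assumptions rather than the authors' design:
```python
# Hypothetical sketch of a cross-attention fusion module for bi-modal
# (poster image + OCR text) multi-label genre classification.
# Dimensions and layer choices are illustrative assumptions.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=512, num_heads=8, num_genres=13):
        super().__init__()
        # Each modality attends to the other.
        self.text_to_image = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.image_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_genres)

    def forward(self, image_emb, text_emb):
        # image_emb: (B, P, dim) patch embeddings; text_emb: (B, T, dim) token embeddings
        img_attended, _ = self.image_to_text(image_emb, text_emb, text_emb)
        txt_attended, _ = self.text_to_image(text_emb, image_emb, image_emb)
        # Pool each attended stream and concatenate for the multi-label head.
        fused = torch.cat([img_attended.mean(dim=1), txt_attended.mean(dim=1)], dim=-1)
        return self.classifier(fused)  # raw logits; train with BCEWithLogitsLoss

# Multi-label setup: one independent sigmoid-activated logit per genre.
model = CrossAttentionFusion()
logits = model(torch.randn(2, 49, 512), torch.randn(2, 16, 512))
probs = torch.sigmoid(logits)  # (2, 13) per-genre probabilities
```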
Related papers
- MPDS: A Movie Posters Dataset for Image Generation with Diffusion Model [26.361736240401594]
Movie posters are vital for captivating audiences, conveying themes, and driving market competition in the film industry.
Despite exciting progress in image generation, current models often fall short in producing satisfactory poster results.
We propose a Movie Posters DataSet (MPDS), tailored for text-to-image generation models to revolutionize poster production.
arXiv Detail & Related papers (2024-10-22T09:20:03Z)
- Towards Automated Movie Trailer Generation [98.9854474456265]
We introduce the Trailer Generation Transformer (TGT), a deep-learning framework utilizing an encoder-decoder architecture.
TGT's movie encoder contextualizes each movie-shot representation via self-attention, while the autoregressive trailer decoder predicts the feature representation of the next trailer shot.
Our TGT significantly outperforms previous methods on a comprehensive suite of metrics.
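TGT's internals are not detailed in this summary; a minimal sketch of the described encoder-decoder pattern, built from stock PyTorch transformer layers with assumed shapes and layer counts, might look like this:
```python
# Illustrative sketch: contextualize movie-shot features with self-attention,
# then autoregressively predict the next trailer-shot feature. All shapes,
# depths, and the selection step are assumptions, not TGT's exact design.
import torch
import torch.nn as nn

dim = 512
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True), num_layers=4)
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True), num_layers=4)
head = nn.Linear(dim, dim)  # regress the feature of the next trailer shot

movie_shots = torch.randn(1, 200, dim)    # pre-extracted features of all movie shots
trailer_so_far = torch.randn(1, 5, dim)   # features of trailer shots chosen so far

memory = encoder(movie_shots)             # self-attention over movie shots
causal = nn.Transformer.generate_square_subsequent_mask(trailer_so_far.size(1))
out = decoder(trailer_so_far, memory, tgt_mask=causal)
next_shot_feat = head(out[:, -1])         # predicted feature of the next trailer shot
# The nearest movie shot (e.g., by cosine similarity to this prediction) could
# be appended as the next trailer shot, and the step repeated.
```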
arXiv Detail & Related papers (2024-04-04T14:28:34Z)
- Demystifying Visual Features of Movie Posters for Multi-Label Genre Identification [0.35998666903987897]
We present a deep transformer network with a probabilistic module to identify the movie genres exclusively from the poster.
For experiments, we procured 13,882 posters spanning 13 genres from the Internet Movie Database (IMDb), on which our model's performance was encouraging.
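The paper's probabilistic module is not reproduced here; purely as a hedged illustration, a poster-only multi-label baseline with a torchvision ViT backbone could be sketched as:
```python
# Minimal poster-only multi-label baseline. The ViT backbone, head size,
# and 0.5 threshold are assumptions, not the paper's architecture.
import torch
import torch.nn as nn
from torchvision.models import vit_b_16

model = vit_b_16(weights=None)            # or load pretrained weights
model.heads = nn.Linear(768, 13)          # 13 IMDb genres, one logit each
poster = torch.randn(1, 3, 224, 224)      # stand-in for a preprocessed poster
probs = torch.sigmoid(model(poster))      # independent per-genre probabilities
predicted = (probs > 0.5).nonzero()       # thresholds can be tuned per genre
```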
arXiv Detail & Related papers (2023-09-21T12:39:36Z)
- MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning [54.73173491543553]
MoviePuzzle is a novel challenge that targets visual narrative reasoning and holistic movie understanding.
To tackle this challenge, we put forth the MoviePuzzle task, which amplifies the temporal feature learning and structure learning of video models.
Our approach outperforms existing state-of-the-art methods on the MoviePuzzle benchmark.
arXiv Detail & Related papers (2023-06-04T03:51:54Z)
- MovieCLIP: Visual Scene Recognition in Movies [38.90153620199725]
Existing visual scene datasets in movies have limited taxonomies and do not consider visual scene transitions within movie clips.
In this work, we address the problem of visual scene recognition in movies by first automatically curating a new and extensive movie-centric taxonomy.
Instead of manual annotations, which can be expensive, we use CLIP to weakly label 1.12 million shots from 32K movie clips based on our proposed taxonomy.
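A minimal sketch of this kind of CLIP-based weak labeling, assuming OpenAI's open-source `clip` package and an illustrative mini-taxonomy (the real taxonomy and prompts are not reproduced):
```python
# Weak labeling of a shot keyframe against scene labels via CLIP similarity.
# Taxonomy labels, the prompt template, and the filename are placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

taxonomy = ["office", "forest", "courtroom", "car interior"]  # illustrative
text = clip.tokenize([f"a movie scene in a {t}" for t in taxonomy]).to(device)

frame = preprocess(Image.open("shot_keyframe.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    image_feat = model.encode_image(frame)
    text_feat = model.encode_text(text)
    image_feat /= image_feat.norm(dim=-1, keepdim=True)
    text_feat /= text_feat.norm(dim=-1, keepdim=True)
    sims = (image_feat @ text_feat.T).squeeze(0)    # cosine similarities
weak_label = taxonomy[sims.argmax().item()]         # highest-scoring scene label
```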
arXiv Detail & Related papers (2022-10-20T07:38:56Z)
- Film Trailer Generation via Task Decomposition [65.16768855902268]
We model movies as graphs, where nodes are shots and edges denote semantic relations between them.
We learn these relations using joint contrastive training which leverages privileged textual information from screenplays.
An unsupervised algorithm then traverses the graph and generates trailers that human judges prefer to ones generated by competitive supervised approaches.
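The learned relations themselves are not reproducible from this summary; a toy sketch of the traversal idea, with hand-set edge weights standing in for learned semantic relations, could be:
```python
# Toy greedy traversal of a shot graph to assemble an ordered trailer.
# Node IDs and edge weights are made up; real weights would be learned
# via the contrastive training described above.
graph = {
    0: {1: 0.9, 4: 0.3},
    1: {2: 0.8, 5: 0.4},
    2: {3: 0.7},
    3: {}, 4: {5: 0.6}, 5: {},
}

def traverse(graph, start=0, budget=4):
    """Greedy walk: repeatedly follow the strongest edge to an unvisited shot."""
    path, node = [start], start
    while len(path) < budget:
        candidates = {n: w for n, w in graph[node].items() if n not in path}
        if not candidates:
            break
        node = max(candidates, key=candidates.get)
        path.append(node)
    return path

print(traverse(graph))  # e.g. [0, 1, 2, 3] -> ordered trailer shots
```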
arXiv Detail & Related papers (2021-11-16T20:50:52Z)
- Multilevel profiling of situation and dialogue-based deep networks for movie genre classification using movie trailers [7.904790547594697]
We propose a novel multi-modal movie genre classification framework based on situation, dialogue, and metadata.
We develop the English Movie Trailer Dataset (EMTD), which contains 2,000 Hollywood movie trailers belonging to five popular genres.
arXiv Detail & Related papers (2021-09-14T07:33:56Z)
- Political Posters Identification with Appearance-Text Fusion [49.55696202606098]
We propose a method that efficiently utilizes appearance features and text vectors to accurately classify political posters.
This work focuses primarily on political posters designed to promote a particular political event.
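As an illustration of the general appearance-text fusion idea only (not the paper's exact scheme), a simple late-fusion classifier might be sketched as:
```python
# Late fusion: concatenate an appearance feature with a text vector, then
# classify. Feature dimensions and the MLP are illustrative assumptions.
import torch
import torch.nn as nn

class AppearanceTextFusion(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=300, num_classes=2):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(img_dim + txt_dim, 256), nn.ReLU(),
            nn.Linear(256, num_classes))

    def forward(self, img_feat, txt_vec):
        return self.classifier(torch.cat([img_feat, txt_vec], dim=-1))

logits = AppearanceTextFusion()(torch.randn(4, 2048), torch.randn(4, 300))
```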
arXiv Detail & Related papers (2020-12-19T16:14:51Z)
- Movie Summarization via Sparse Graph Construction [65.16768855902268]
We propose a model that identifies turning-point (TP) scenes by building a sparse movie graph that represents relations between scenes and is constructed using multimodal information.
According to human judges, the summaries created by our approach are more informative and complete, and receive higher ratings, than the outputs of sequence-based models and general-purpose summarization algorithms.
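A minimal sketch of sparse graph construction over scene features, with top-k sparsification and a degree-style score standing in for the paper's learned criteria:
```python
# Keep only each scene's k strongest links, then rank scenes by connectivity.
# Feature sizes, k, and the centrality score are assumptions.
import torch

scene_feats = torch.randn(12, 256)                  # one feature per scene
sim = torch.nn.functional.cosine_similarity(
    scene_feats.unsqueeze(1), scene_feats.unsqueeze(0), dim=-1)
sim.fill_diagonal_(0)

k = 3
topk = sim.topk(k, dim=-1)
sparse = torch.zeros_like(sim)
sparse.scatter_(1, topk.indices, topk.values)       # sparsified adjacency

centrality = sparse.sum(dim=1)                      # simple degree-style score
tp_candidates = centrality.topk(5).indices          # most-connected scenes
```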
arXiv Detail & Related papers (2020-12-14T13:54:34Z)
- A Unified Framework for Shot Type Classification Based on Subject Centric Lens [89.26211834443558]
We propose a learning framework for shot type recognition, the Subject Guidance Network (SGNet).
SGNet separates the subject and background of a shot into two streams, which serve as separate guidance maps for scale and movement type classification, respectively.
We build MovieShots, a large-scale dataset containing 46K shots from 7K movie trailers annotated with their scale and movement types.
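A simplified two-stream stand-in for the described design (subject stream guiding scale, background stream guiding movement); all modules and shapes are assumptions, not SGNet's components:
```python
# Two-stream sketch: subject and background maps feed separate branches
# for scale and movement classification. Branch depth is illustrative.
import torch
import torch.nn as nn

class TwoStreamShotClassifier(nn.Module):
    def __init__(self, num_scale=5, num_movement=4):
        super().__init__()
        branch = lambda: nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.subject_stream = branch()     # subject map -> scale cues
        self.background_stream = branch()  # background map -> movement cues
        self.scale_head = nn.Linear(16, num_scale)
        self.movement_head = nn.Linear(16, num_movement)

    def forward(self, subject_map, background_map):
        return (self.scale_head(self.subject_stream(subject_map)),
                self.movement_head(self.background_stream(background_map)))

subj = torch.randn(1, 3, 224, 224)   # stand-in subject guidance map
bg = torch.randn(1, 3, 224, 224)     # stand-in background guidance map
scale_logits, movement_logits = TwoStreamShotClassifier()(subj, bg)
```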
arXiv Detail & Related papers (2020-08-08T15:49:40Z)