Demystifying ChatGPT: How It Masters Genre Recognition
- URL: http://arxiv.org/abs/2507.03875v1
- Date: Sat, 05 Jul 2025 03:22:48 GMT
- Title: Demystifying ChatGPT: How It Masters Genre Recognition
- Authors: Subham Raj, Sriparna Saha, Brijraj Singh, Niranjan Pedanekar
- Abstract summary: This work analyzes three Large Language Models (LLMs) using the MovieLens-100K dataset to assess their genre prediction capabilities. Our findings show that ChatGPT, without fine-tuning, outperformed other LLMs, and fine-tuned ChatGPT performed best overall.
- Score: 15.563620079455026
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The introduction of ChatGPT has garnered significant attention within the NLP community and beyond. Previous studies have demonstrated ChatGPT's substantial advancements across various downstream NLP tasks, highlighting its adaptability and potential to revolutionize language-related applications. However, its capabilities and limitations in genre prediction remain unclear. This work analyzes three Large Language Models (LLMs) using the MovieLens-100K dataset to assess their genre prediction capabilities. Our findings show that ChatGPT, without fine-tuning, outperformed the other LLMs, and that fine-tuned ChatGPT performed best overall. We set up zero-shot and few-shot prompts using audio transcripts/subtitles from movie trailers in the MovieLens-100K dataset, covering 1,682 movies across 18 genres, where each movie can have multiple genres. Additionally, we extended our study by extracting IMDb movie posters and prompting a Vision Language Model (VLM) for poster information. This fine-grained information was used to enrich the existing LLM prompts. In conclusion, our study reveals ChatGPT's remarkable genre prediction capabilities, surpassing other language models. The integration of the VLM further strengthens our findings, showcasing ChatGPT's potential for content-related applications by incorporating visual information from movie posters.
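To make the prompting setup concrete, here is a minimal sketch of how such zero-shot prompts could be assembled with the OpenAI Python SDK: a VLM call describes the IMDb poster, and its output is appended to the trailer transcript before asking for genres. The model names, prompt wording, genre list, and helper functions (describe_poster, predict_genres) are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal sketch of the zero-shot prompting setup described in the abstract.
# Model names, prompt wording, and the GENRES list are illustrative assumptions,
# not the paper's exact configuration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

GENRES = [
    "Action", "Adventure", "Animation", "Children's", "Comedy", "Crime",
    "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror", "Musical",
    "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western",
]  # plausible listing of the 18 named MovieLens-100K genres


def describe_poster(poster_url: str) -> str:
    """Ask a vision-language model for a short description of a movie poster."""
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical choice of VLM
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this movie poster in 2-3 sentences."},
                {"type": "image_url", "image_url": {"url": poster_url}},
            ],
        }],
    )
    return response.choices[0].message.content


def predict_genres(transcript: str, poster_description: str | None = None) -> str:
    """Zero-shot multi-label genre prediction from a trailer transcript,
    optionally enriched with the VLM's poster description."""
    prompt = (
        "You are given the audio transcript of a movie trailer.\n"
        f"Transcript: {transcript}\n"
    )
    if poster_description:
        prompt += f"Poster description: {poster_description}\n"
    prompt += (
        "Predict all applicable genres from this list (a movie can have several): "
        + ", ".join(GENRES)
        + "\nAnswer with a comma-separated list of genres only."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # stand-in for the ChatGPT variants compared in the paper
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content
```

A few-shot variant would prepend a handful of transcript-to-genre examples to the same prompt, and the fine-tuned setting compared in the paper would swap the base model for a fine-tuned checkpoint.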
Related papers
- ChatGPT or A Silent Everywhere Helper: A Survey of Large Language Models [0.0]
Chat Generative Pre-trained Transformer (ChatGPT) stands out as a notable example due to its advanced capabilities and widespread applications. This survey provides a comprehensive analysis of ChatGPT, exploring its architecture, training processes, and functionalities.
arXiv Detail & Related papers (2025-03-19T22:55:08Z) - Can Large Language Models Capture Video Game Engagement? [1.3873323883842132]
We comprehensively evaluate the capacity of popular Large Language Models to annotate and successfully predict continuous affect annotations of videos. We run over 2,400 experiments to investigate the impact of LLM architecture, model size, input modality, prompting strategy, and ground truth processing method on engagement prediction.
arXiv Detail & Related papers (2025-02-05T17:14:47Z) - ChatVTG: Video Temporal Grounding via Chat with Video Dialogue Large Language Models [53.9661582975843]
Video Temporal Grounding aims to ground specific segments within an untrimmed video corresponding to a given natural language query.
Existing VTG methods largely depend on supervised learning and extensive annotated data, which are labor-intensive and prone to human biases.
We present ChatVTG, a novel approach that utilizes Video Dialogue Large Language Models (LLMs) for zero-shot video temporal grounding.
arXiv Detail & Related papers (2024-10-01T08:27:56Z) - Movie101v2: Improved Movie Narration Benchmark [53.54176725112229]
Automatic movie narration aims to generate video-aligned plot descriptions to assist visually impaired audiences.
We introduce Movie101v2, a large-scale, bilingual dataset with enhanced data quality specifically designed for movie narration.
Based on our new benchmark, we benchmark a range of large vision-language models, including GPT-4V, and conduct an in-depth analysis of the challenges in narration generation.
arXiv Detail & Related papers (2024-04-20T13:15:27Z) - Large Language Models: A Survey [66.39828929831017]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks. LLMs acquire their general-purpose language understanding and generation abilities by training billions of model parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z) - TouchStone: Evaluating Vision-Language Models by Language Models [91.69776377214814]
We propose an evaluation method that uses strong large language models as judges to comprehensively evaluate the various abilities of LVLMs.
We construct a comprehensive visual dialogue dataset TouchStone, consisting of open-world images and questions, covering five major categories of abilities and 27 subtasks.
We demonstrate that powerful LVLMs, such as GPT-4, can effectively score dialogue quality by leveraging their textual capabilities alone.
arXiv Detail & Related papers (2023-08-31T17:52:04Z) - Assessing the potential of LLM-assisted annotation for corpus-based pragmatics and discourse analysis: The case of apology [9.941695905504282]
This study explores the possibility of using large language models (LLMs) to automate pragma-discursive corpus annotation. We find that GPT-4 outperformed GPT-3.5, with accuracy approaching that of a human coder.
arXiv Detail & Related papers (2023-05-15T04:10:13Z) - ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning [70.57126720079971]
Large language models (LLMs) have emerged as some of the most important breakthroughs in natural language processing (NLP).
This paper evaluates ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium, low, and extremely low resources.
Our extensive experimental results show that ChatGPT performs worse than previous models across a range of NLP tasks and languages.
arXiv Detail & Related papers (2023-04-12T05:08:52Z) - A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity [79.12003701981092]
We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks.
We evaluate the multitask, multilingual, and multimodal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset.
ChatGPT is 63.41% accurate on average in 10 different reasoning categories under logical reasoning, non-textual reasoning, and commonsense reasoning.
arXiv Detail & Related papers (2023-02-08T12:35:34Z) - Movie Genre Classification by Language Augmentation and Shot Sampling [20.119729119879466]
We propose a Movie genre Classification method based on Language augmentatIon and shot samPling (Movie-CLIP).
Movie-CLIP mainly consists of two parts: a language augmentation module to recognize language elements from the input audio, and a shot sampling module to select representative shots from the entire video.
We evaluate our method on the MovieNet and Condensed Movies datasets, achieving an approximately 6-9% improvement in mean Average Precision (mAP) over the baselines; a minimal sketch of how such a multi-label mAP can be computed follows this list.
arXiv Detail & Related papers (2022-03-24T18:15:12Z)
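Since the multi-label genre results above and in the main paper are naturally summarized by mean Average Precision, here is a minimal sketch of how a macro-averaged mAP can be computed with scikit-learn; the toy labels and scores are illustrative and not taken from either paper.

```python
# Minimal sketch of multi-label mAP, the metric reported by Movie-CLIP.
# The toy labels/scores below are illustrative, not results from either paper.
import numpy as np
from sklearn.metrics import average_precision_score

# One row per movie, one column per genre (multi-label ground truth).
y_true = np.array([
    [1, 0, 1],   # e.g. Action + Sci-Fi
    [0, 1, 0],   # e.g. Comedy
    [1, 1, 0],   # e.g. Action + Comedy
])

# Predicted scores (probabilities) for each genre.
y_score = np.array([
    [0.9, 0.2, 0.7],
    [0.1, 0.8, 0.3],
    [0.6, 0.7, 0.2],
])

# Macro averaging: average precision is computed per genre, then averaged.
map_score = average_precision_score(y_true, y_score, average="macro")
print(f"mAP = {map_score:.3f}")
```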