Can Language Models Laugh at YouTube Short-form Videos?
- URL: http://arxiv.org/abs/2310.14159v3
- Date: Sun, 31 Mar 2024 10:51:06 GMT
- Title: Can Language Models Laugh at YouTube Short-form Videos?
- Authors: Dayoon Ko, Sangho Lee, Gunhee Kim
- Abstract summary: We curate a user-generated dataset of 10K multimodal funny videos from YouTube, called ExFunTube.
Using a video filtering pipeline with GPT-3.5, we verify that both verbal and visual elements contribute to humor.
After filtering, we annotate each video with timestamps and text explanations for funny moments.
- Score: 40.47384055149102
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As short-form funny videos on social networks gain popularity, it is becoming increasingly important for AI models to understand them for better communication with humans. Unfortunately, previous video humor datasets target specific domains, such as speeches or sitcoms, and mostly focus on verbal cues. We curate a user-generated dataset of 10K multimodal funny videos from YouTube, called ExFunTube. Using a video filtering pipeline with GPT-3.5, we verify that both verbal and visual elements contribute to humor. After filtering, we annotate each video with timestamps and text explanations for funny moments. Our ExFunTube is unique among existing datasets in that our videos cover a wide range of domains with various types of humor that necessitate a multimodal understanding of the content. Also, we develop a zero-shot video-to-text prompting method to maximize video humor understanding by large language models (LLMs). With three different evaluation methods using automatic scores, rationale quality experiments, and human evaluations, we show that our prompting significantly improves LLMs' ability to explain humor.
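The abstract describes two LLM-facing steps: a GPT-3.5 filtering pass that checks whether a video's humor depends on both verbal and visual cues, and a zero-shot video-to-text prompting method for humor explanation. Below is a minimal sketch of that verbalize-then-prompt pattern, not the authors' implementation: the prompt wording, the model choice, and the transcript/caption inputs (assumed to come from upstream ASR and captioning models) are illustrative assumptions.

```python
# Minimal sketch of a verbalize-then-prompt pipeline (illustrative only):
# a video is first turned into text, then an LLM is prompted to filter for
# multimodal humor and to explain the funny moments.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def verbalize(transcript: str, captions: list[str]) -> str:
    """Represent the video purely as text: speech transcript + visual captions."""
    scenes = "\n".join(f"- {c}" for c in captions)
    return f"Transcript:\n{transcript}\n\nVisual captions:\n{scenes}"

def needs_both_modalities(transcript: str, captions: list[str]) -> bool:
    """Filtering step: keep a video only if its humor is multimodal."""
    answer = ask(
        "Does understanding why this video is funny require BOTH the speech "
        "and the visuals? Answer yes or no.\n\n" + verbalize(transcript, captions)
    )
    return answer.strip().lower().startswith("yes")

def explain_humor(transcript: str, captions: list[str]) -> str:
    """Explanation step: ask the LLM, zero-shot, why the video is funny."""
    return ask(
        "Explain concisely what makes this video funny.\n\n"
        + verbalize(transcript, captions)
    )
```

Note that in this sketch the LLM never sees pixels; all visual information reaches it as text, which is what makes the approach zero-shot with respect to video.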
Related papers
- FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild [12.530540250653633]
We propose FunnyNet-W, a model that relies on cross- and self-attention for visual, audio and text data to predict funny moments in videos.
We provide experiments on five datasets: the sitcoms TBBT, MHD, MUStARD, Friends, and the TED talk UR-Funny.
arXiv Detail & Related papers (2024-01-08T19:39:36Z)
- SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models [32.60274453610208]
We tackle a new challenge: making machines understand the rationale behind laughter in video.
Our proposed dataset, SMILE, comprises video clips and language descriptions of why people laugh.
arXiv Detail & Related papers (2023-12-15T14:17:45Z)
- FunQA: Towards Surprising Video Comprehension [64.58663825184958]
We introduce FunQA, a challenging video question-answering dataset.
FunQA covers three previously unexplored types of surprising videos, organized into three subsets: HumorQA, CreativeQA, and MagicQA.
In total, the FunQA benchmark consists of 312K free-text QA pairs derived from 4.3K video clips.
arXiv Detail & Related papers (2023-06-26T17:59:55Z)
- A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot [67.00455874279383]
We propose verbalizing long videos to generate descriptions in natural language, then performing video-understanding tasks on the generated story as opposed to the original video.
Our method, despite being zero-shot, achieves significantly better results than supervised baselines for video understanding.
To alleviate the lack of story-understanding benchmarks, we publicly release the first dataset for persuasion strategy identification, a crucial task in computational social science.
arXiv Detail & Related papers (2023-05-16T19:13:11Z)
- Knowledge Enhanced Model for Live Video Comment Generation [40.762720398152766]
We propose a knowledge-enhanced generation model inspired by the divergent and informative nature of live video comments.
Our model adopts a pre-trained encoder-decoder framework and incorporates external knowledge.
The MovieLC dataset and our code will be released.
arXiv Detail & Related papers (2023-04-28T07:03:50Z)
- Towards Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results [84.37263300062597]
Humor is a substantial element of human social behavior, affect, and cognition.
Current methods of humor detection have been based exclusively on staged data, making them inadequate for "real-world" applications.
We contribute to addressing this deficiency by introducing the novel Passau-Spontaneous Football Coach Humor dataset, comprising about 11 hours of recordings.
arXiv Detail & Related papers (2022-09-28T17:36:47Z)
- 3MASSIV: Multilingual, Multimodal and Multi-Aspect dataset of Social Media Short Videos [72.69052180249598]
We present 3MASSIV, a multilingual, multimodal and multi-aspect, expertly annotated dataset of diverse short videos extracted from the short-video social media platform Moj.
3MASSIV comprises 50K short videos (20 seconds average duration) and 100K unlabeled videos in 11 different languages.
We show that the social media content in 3MASSIV is dynamic and temporal in nature and can be used for semantic understanding tasks and cross-lingual analysis.
arXiv Detail & Related papers (2022-03-28T02:47:01Z)
- DeHumor: Visual Analytics for Decomposing Humor [36.300283476950796]
We develop DeHumor, a visual system for analyzing humorous behaviors in public speaking.
To intuitively reveal the building blocks of each concrete example, DeHumor decomposes each humorous video into multimodal features.
We show that DeHumor is able to highlight various building blocks of humor examples.
arXiv Detail & Related papers (2021-07-18T04:01:07Z)
- Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling [98.41300980759577]
A canonical approach to video-and-language learning dictates that a neural model learn from offline-extracted dense video features.
We propose a generic framework, ClipBERT, that enables affordable end-to-end learning for video-and-language tasks by sparsely sampling a few short clips per video; a rough sketch of this idea follows the list.
Experiments on text-to-video retrieval and video question answering on six datasets demonstrate that ClipBERT outperforms existing methods.
arXiv Detail & Related papers (2021-02-11T18:50:16Z)
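For a concrete sense of ClipBERT's sparse-sampling idea mentioned above, here is a rough PyTorch illustration. The tensor shapes, clip counts, and mean-pooled late fusion are assumptions made for the sketch, not the paper's exact architecture.

```python
# Rough illustration of sparse clip sampling (not ClipBERT's architecture):
# sample a few short clips per video, score each, and average the clip logits.
import torch

def sample_clips(video: torch.Tensor, num_clips: int = 4, clip_len: int = 8) -> torch.Tensor:
    """video: (T, C, H, W) frames. Returns (num_clips, clip_len, C, H, W)."""
    T = video.shape[0]
    starts = torch.randint(0, T - clip_len + 1, (num_clips,))
    return torch.stack([video[int(s) : int(s) + clip_len] for s in starts])

def predict(video: torch.Tensor, clip_model) -> torch.Tensor:
    """Score a few sparse clips and average their logits (late fusion)."""
    clips = sample_clips(video)                           # (N, L, C, H, W)
    logits = torch.stack([clip_model(c) for c in clips])  # (N, num_classes)
    return logits.mean(dim=0)                             # (num_classes,)

# Toy usage: a random 64-frame "video" and a dummy clip scorer.
video = torch.randn(64, 3, 112, 112)
dummy_model = lambda clip: clip.mean(dim=(0, 1, 2, 3)).repeat(2)  # fake 2-class logits
print(predict(video, dummy_model))
```

The point of the design is that each training step touches only a few raw short clips end-to-end, instead of precomputed dense features for the whole video.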
This list is automatically generated from the titles and abstracts of the papers on this site.