HumorDB: Can AI understand graphical humor?
- URL: http://arxiv.org/abs/2406.13564v2
- Date: Thu, 24 Jul 2025 23:41:15 GMT
- Title: HumorDB: Can AI understand graphical humor?
- Authors: Veedant Jain, Gabriel Kreiman, Felipe dos Santos Alves Feitosa,
- Abstract summary: This paper introduces textbfHumorDB, a dataset designed to evaluate and advance visual humor understanding by AI systems.<n>We evaluate humans, state-of-the-art vision models, and large vision-language models on three tasks: binary humor classification, funniness rating prediction, and pairwise humor comparison.<n>The results reveal a gap between current AI systems and human-level humor understanding.
- Score: 8.75275650545552
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Despite significant advancements in image segmentation and object detection, understanding complex scenes remains a significant challenge. Here, we focus on graphical humor as a paradigmatic example of image interpretation that requires elucidating the interaction of different scene elements in the context of prior cognitive knowledge. This paper introduces \textbf{HumorDB}, a novel, controlled, and carefully curated dataset designed to evaluate and advance visual humor understanding by AI systems. The dataset comprises diverse images spanning photos, cartoons, sketches, and AI-generated content, including minimally contrastive pairs where subtle edits differentiate between humorous and non-humorous versions. We evaluate humans, state-of-the-art vision models, and large vision-language models on three tasks: binary humor classification, funniness rating prediction, and pairwise humor comparison. The results reveal a gap between current AI systems and human-level humor understanding. While pretrained vision-language models perform better than vision-only models, they still struggle with abstract sketches and subtle humor cues. Analysis of attention maps shows that even when models correctly classify humorous images, they often fail to focus on the precise regions that make the image funny. Preliminary mechanistic interpretability studies and evaluation of model explanations provide initial insights into how different architectures process humor. Our results identify promising trends and current limitations, suggesting that an effective understanding of visual humor requires sophisticated architectures capable of detecting subtle contextual features and bridging the gap between visual perception and abstract reasoning. All the code and data are available here: \href{https://github.com/kreimanlab/HumorDB}{https://github.com/kreimanlab/HumorDB}
Related papers
- From Punchlines to Predictions: A Metric to Assess LLM Performance in Identifying Humor in Stand-Up Comedy [6.124881326867511]
In light of the widespread adoption of Large Language Models, the intersection of humor and AI has become no laughing matter.
In this study, we assess the ability of models in accurately identifying humorous quotes from a stand-up comedy transcript.
We propose a novel humor detection metric designed to evaluate LLMs amongst various prompts on their capability to extract humorous punchlines.
arXiv Detail & Related papers (2025-04-12T02:19:53Z) - Deceptive Humor: A Synthetic Multilingual Benchmark Dataset for Bridging Fabricated Claims with Humorous Content [0.0]
The Deceptive Humor dataset (DHD) is a novel resource for studying humor derived from fabricated claims and misinformation.
DHD consists of humor-infused comments generated from false narratives, incorporating fabricated claims and manipulated information.
The dataset spans multiple languages including English, Telugu, Hindi, Kannada, Tamil, and their code-mixed variants (Te-En, Hi-En, Ka-En, Ta-En)
arXiv Detail & Related papers (2025-03-20T10:58:02Z) - When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
arXiv Detail & Related papers (2024-10-14T17:59:58Z) - Can Pre-trained Language Models Understand Chinese Humor? [74.96509580592004]
This paper is the first work that systematically investigates the humor understanding ability of pre-trained language models (PLMs)
We construct a comprehensive Chinese humor dataset, which can fully meet all the data requirements of the proposed evaluation framework.
Our empirical study on the Chinese humor dataset yields some valuable observations, which are of great guiding value for future optimization of PLMs in humor understanding and generation.
arXiv Detail & Related papers (2024-07-04T18:13:38Z) - Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions [16.23585043442914]
This paper focuses on comics with contradictory narratives, where each comic consists of two panels that create a humorous contradiction.
We introduce the YesBut benchmark, which comprises tasks of varying difficulty aimed at assessing AI's capabilities in recognizing and interpreting these comics.
Our results show that even state-of-the-art models still lag behind human performance on this task.
arXiv Detail & Related papers (2024-05-29T13:51:43Z) - Getting Serious about Humor: Crafting Humor Datasets with Unfunny Large Language Models [27.936545041302377]
Large language models (LLMs) can generate synthetic data for humor detection via editing texts.
We benchmark LLMs on an existing human dataset and show that current LLMs display an impressive ability to 'unfun' jokes.
We extend our approach to a code-mixed English-Hindi humor dataset, where we find that GPT-4's synthetic data is highly rated by bilingual annotators.
arXiv Detail & Related papers (2024-02-23T02:58:12Z) - Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment [64.49170817854942]
We present a method to provide detailed explanation of detected misalignments between text-image pairs.
We leverage large language models and visual grounding models to automatically construct a training set that holds plausible captions for a given image.
We also publish a new human curated test set comprising ground-truth textual and visual misalignment annotations.
arXiv Detail & Related papers (2023-12-05T20:07:34Z) - StyleEDL: Style-Guided High-order Attention Network for Image Emotion
Distribution Learning [69.06749934902464]
We propose a style-guided high-order attention network for image emotion distribution learning termed StyleEDL.
StyleEDL interactively learns stylistic-aware representations of images by exploring the hierarchical stylistic information of visual contents.
In addition, we introduce a stylistic graph convolutional network to dynamically generate the content-dependent emotion representations.
arXiv Detail & Related papers (2023-08-06T03:22:46Z) - OxfordTVG-HIC: Can Machine Make Humorous Captions from Images? [27.899718595182172]
We present OxfordTVG-HIC (Humorous Image Captions), a large-scale dataset for humour generation and understanding.
OxfordTVG-HIC features a wide range of emotional and semantic diversity resulting in out-of-context examples.
We show how OxfordTVG-HIC can be leveraged for evaluating the humour of a generated text.
arXiv Detail & Related papers (2023-07-21T14:58:44Z) - Revisiting the Role of Language Priors in Vision-Language Models [90.0317841097143]
Vision-language models (VLMs) are applied to a variety of visual understanding tasks in a zero-shot fashion, without any fine-tuning.
We study $textitgenerative VLMs$ that are trained for next-word generation given an image.
We explore their zero-shot performance on the illustrative task of image-text retrieval across 8 popular vision-language benchmarks.
arXiv Detail & Related papers (2023-06-02T19:19:43Z) - Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for
Improved Vision-Language Compositionality [50.48859793121308]
Contrastively trained vision-language models have achieved remarkable progress in vision and language representation learning.
Recent research has highlighted severe limitations in their ability to perform compositional reasoning over objects, attributes, and relations.
arXiv Detail & Related papers (2023-05-23T08:28:38Z) - ExPUNations: Augmenting Puns with Keywords and Explanations [88.58174386894913]
We augment an existing dataset of puns with detailed crowdsourced annotations of keywords.
This is the first humor dataset with such extensive and fine-grained annotations specifically for puns.
We propose two tasks: explanation generation to aid with pun classification and keyword-conditioned pun generation.
arXiv Detail & Related papers (2022-10-24T18:12:02Z) - Towards Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results [84.37263300062597]
Humor is a substantial element of human social behavior, affect, and cognition.
Current methods of humor detection have been exclusively based on staged data, making them inadequate for "real-world" applications.
We contribute to addressing this deficiency by introducing the novel Passau-Spontaneous Football Coach Humor dataset, comprising about 11 hours of recordings.
arXiv Detail & Related papers (2022-09-28T17:36:47Z) - Do Androids Laugh at Electric Sheep? Humor "Understanding" Benchmarks
from The New Yorker Caption Contest [70.40189243067857]
Large neural networks can now generate jokes, but do they really "understand" humor?
We challenge AI models with three tasks derived from the New Yorker Cartoon Caption Contest.
We find that both types of models struggle at all three tasks.
arXiv Detail & Related papers (2022-09-13T20:54:00Z) - Biasing Like Human: A Cognitive Bias Framework for Scene Graph
Generation [20.435023745201878]
We propose a novel 3-paradigms framework that simulates how humans incorporate the label linguistic features as guidance of vision-based representations.
Our framework is model-agnostic to any scene graph model.
arXiv Detail & Related papers (2022-03-17T08:29:52Z) - M2H2: A Multimodal Multiparty Hindi Dataset For Humor Recognition in
Conversations [72.81164101048181]
We propose a dataset for Multimodal Multiparty Hindi Humor (M2H2) recognition in conversations containing 6,191 utterances from 13 episodes of a very popular TV series "Shrimaan Shrimati Phir Se"
Each utterance is annotated with humor/non-humor labels and encompasses acoustic, visual, and textual modalities.
The empirical results on M2H2 dataset demonstrate that multimodal information complements unimodal information for humor recognition.
arXiv Detail & Related papers (2021-08-03T02:54:09Z) - Laughing Heads: Can Transformers Detect What Makes a Sentence Funny? [18.67834526946997]
We train and analyze transformer-based humor recognition models on a dataset consisting of minimal pairs of aligned sentences.
We find that, although our aligned dataset is much harder than previous datasets, transformer-based models recognize the humorous sentence in an aligned pair with high accuracy (78%).
Most remarkably, we find clear evidence that one single attention head learns to recognize the words that make a test sentence humorous, even without access to this information at training time.
arXiv Detail & Related papers (2021-05-19T14:02:25Z) - Humor@IITK at SemEval-2021 Task 7: Large Language Models for Quantifying
Humor and Offensiveness [2.251416625953577]
This paper explores whether large neural models and their ensembles can capture the intricacies associated with humor/offense detection and rating.
Our experiments on the SemEval-2021 Task 7: HaHackathon show that we can develop reasonable humor and offense detection systems with such models.
arXiv Detail & Related papers (2021-04-02T08:22:02Z) - Federated Learning with Diversified Preference for Humor Recognition [40.89453484353102]
We propose the FedHumor approach to recognize humorous text contents in a personalized manner through federated learning (FL)
Experiments demonstrate significant advantages of FedHumor in recognizing humor contents accurately for people with diverse humor preferences compared to 9 state-of-the-art humor recognition approaches.
arXiv Detail & Related papers (2020-12-03T03:24:24Z) - Advancing Humor-Focused Sentiment Analysis through Improved
Contextualized Embeddings and Model Architecture [0.0]
Humor allows us to express thoughts and feelings conveniently and effectively.
As language models become ubiquitous through virtual-assistants and IOT devices, the need to develop humor-aware models rises exponentially.
arXiv Detail & Related papers (2020-11-23T22:30:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.