Related papers: Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes

URL: http://arxiv.org/abs/2507.13335v1
Date: Thu, 17 Jul 2025 17:51:20 GMT
Title: Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes
Authors: Tyler Loakman, William Thorne, Chenghua Lin
Abstract summary: We investigate whether the ability of Large Language Models (LLMs) to explain humour depends on the particular humour form. We compare models on simple puns and more complex topical humour that requires knowledge of real-world entities and events. We find that none of the tested models are capable of reliably generating adequate explanations of all joke types.
Score: 14.762724547600447
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Humour, as a complex language form, is derived from myriad aspects of life, whilst existing work on computational humour has focussed almost exclusively on short pun-based jokes. In this work, we investigate whether the ability of Large Language Models (LLMs) to explain humour depends on the particular humour form. We compare models on simple puns and more complex topical humour that requires knowledge of real-world entities and events. In doing so, we curate a dataset of 600 jokes split across 4 joke types and manually write high-quality explanations. These jokes include heterographic and homographic puns, contemporary internet humour, and topical jokes, where understanding relies on reasoning beyond "common sense", rooted instead in world knowledge regarding news events and pop culture. Using this dataset, we compare the zero-shot abilities of a range of LLMs to accurately and comprehensively explain jokes of different types, identifying key research gaps in the task of humour explanation. We find that none of the tested models (inc. reasoning models) are capable of reliably generating adequate explanations of all joke types, further highlighting the narrow focus of most works in computational humour on overly simple joke forms.

Related papers

From Punchlines to Predictions: A Metric to Assess LLM Performance in Identifying Humor in Stand-Up Comedy [6.124881326867511]
In light of the widespread adoption of Large Language Models, the intersection of humor and AI has become no laughing matter. In this study, we assess the ability of models in accurately identifying humorous quotes from a stand-up comedy transcript. We propose a novel humor detection metric designed to evaluate LLMs amongst various prompts on their capability to extract humorous punchlines.
arXiv Detail & Related papers (2025-04-12T02:19:53Z)
Can Pre-trained Language Models Understand Chinese Humor? [74.96509580592004]
This paper is the first work that systematically investigates the humor understanding ability of pre-trained language models (PLMs). We construct a comprehensive Chinese humor dataset, which can fully meet all the data requirements of the proposed evaluation framework. Our empirical study on the Chinese humor dataset yields some valuable observations, which are of great guiding value for future optimization of PLMs in humor understanding and generation.
arXiv Detail & Related papers (2024-07-04T18:13:38Z)
Getting Serious about Humor: Crafting Humor Datasets with Unfunny Large Language Models [27.936545041302377]
Large language models (LLMs) can generate synthetic data for humor detection via editing texts. We benchmark LLMs on an existing human dataset and show that current LLMs display an impressive ability to 'unfun' jokes. We extend our approach to a code-mixed English-Hindi humor dataset, where we find that GPT-4's synthetic data is highly rated by bilingual annotators.
arXiv Detail & Related papers (2024-02-23T02:58:12Z)
ChatGPT is fun, but it is not funny! Humor is still challenging Large Language Models [19.399535453449488]
OpenAI's ChatGPT model almost seems to communicate on a human level and can even tell jokes. In a series of exploratory experiments around jokes, i.e., generation, explanation, and detection, we seek to understand ChatGPT's capability to grasp and reproduce human humor. Our empirical evidence indicates that jokes are not hard-coded but mostly also not newly generated by the model.
arXiv Detail & Related papers (2023-06-07T16:10:21Z)
The Naughtyformer: A Transformer Understands Offensive Humor [63.05016513788047]
We introduce a novel jokes dataset filtered from Reddit and solve the subtype classification task using a finetuned Transformer dubbed the Naughtyformer. We show that our model is significantly better at detecting offensiveness in jokes compared to state-of-the-art methods.
arXiv Detail & Related papers (2022-11-25T20:37:58Z)
ExPUNations: Augmenting Puns with Keywords and Explanations [88.58174386894913]
We augment an existing dataset of puns with detailed crowdsourced annotations of keywords. This is the first humor dataset with such extensive and fine-grained annotations specifically for puns. We propose two tasks: explanation generation to aid with pun classification and keyword-conditioned pun generation.
arXiv Detail & Related papers (2022-10-24T18:12:02Z)
Towards Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results [84.37263300062597]
Humor is a substantial element of human social behavior, affect, and cognition. Current methods of humor detection have been exclusively based on staged data, making them inadequate for "real-world" applications. We contribute to addressing this deficiency by introducing the novel Passau-Spontaneous Football Coach Humor dataset, comprising about 11 hours of recordings.
arXiv Detail & Related papers (2022-09-28T17:36:47Z)
Do Androids Laugh at Electric Sheep? Humor "Understanding" Benchmarks from The New Yorker Caption Contest [70.40189243067857]
Large neural networks can now generate jokes, but do they really "understand" humor? We challenge AI models with three tasks derived from the New Yorker Cartoon Caption Contest. We find that both types of models struggle at all three tasks.
arXiv Detail & Related papers (2022-09-13T20:54:00Z)
Uncertainty and Surprisal Jointly Deliver the Punchline: Exploiting Incongruity-Based Features for Humor Recognition [0.6445605125467573]
We break down any joke into two distinct components: the set-up and the punchline. Inspired by the incongruity theory of humor, we model the set-up as the part developing semantic uncertainty. With increasingly powerful language models, we were able to feed the set-up along with the punchline into the GPT-2 language model.
arXiv Detail & Related papers (2020-12-22T13:48:09Z)
"The Boating Store Had Its Best Sail Ever": Pronunciation-attentive Contextualized Pun Recognition [80.59427655743092]
We propose Pronunciation-attentive Contextualized Pun Recognition (PCPR) to perceive human humor. PCPR derives contextualized representation for each word in a sentence by capturing the association between the surrounding context and its corresponding phonetic symbols. Results demonstrate that the proposed approach significantly outperforms the state-of-the-art methods in pun detection and location tasks.
arXiv Detail & Related papers (2020-04-29T20:12:20Z)
Let's be Humorous: Knowledge Enhanced Humor Generation [26.886255899651893]
We explore how to generate a punchline given the set-up with the relevant knowledge. To our knowledge, this is the first attempt to generate punchlines with a knowledge-enhanced model. The experimental results demonstrate that our method can make use of knowledge to generate fluent, funny punchlines.
arXiv Detail & Related papers (2020-04-28T06:06:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.