One Joke to Rule them All? On the (Im)possibility of Generalizing Humor
- URL: http://arxiv.org/abs/2508.19402v1
- Date: Tue, 26 Aug 2025 19:55:40 GMT
- Title: One Joke to Rule them All? On the (Im)possibility of Generalizing Humor
- Authors: Mor Turgeman, Chen Shani, Dafna Shahaf
- Abstract summary: Large Language Models (LLMs) must be able to generalize across humor types. Experiments reveal that models are capable of some transfer, reaching up to 75% accuracy on unseen datasets. Further analysis suggests relations between humor types, with Dad Jokes surprisingly emerging as the best enabler of transfer.
- Score: 11.819634993950544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humor is a broad and complex form of communication that remains challenging for machines. Despite this broadness, most existing research on computational humor has traditionally focused on modeling a specific type of humor. In this work, we wish to understand whether competence on one or more specific humor tasks confers any ability to transfer to novel, unseen types; in other words, is this fragmentation inevitable? This question is especially timely as new humor types continuously emerge in online and social media contexts (e.g., memes, anti-humor, AI fails). If Large Language Models (LLMs) are to keep up with this evolving landscape, they must be able to generalize across humor types by capturing deeper, transferable mechanisms. To investigate this, we conduct a series of transfer learning experiments across four datasets representing different humor tasks. We train LLMs under varied diversity settings (1-3 datasets in training, testing on a novel task). Experiments reveal that models are capable of some transfer and can reach up to 75% accuracy on unseen datasets; training on diverse sources improves transferability (1.88-4.05%) with minimal-to-no drop in in-domain performance. Further analysis suggests relations between humor types, with Dad Jokes surprisingly emerging as the best enabler of transfer (though it is difficult to transfer to). We release data and code.
Related papers
- Engagement Undermines Safety: How Stereotypes and Toxicity Shape Humor in Language Models [55.98686105081078]
Large language models are increasingly used for creative writing and engagement content, raising safety concerns about the outputs. This work evaluates how funniness optimization in modern LLM pipelines couples with harmful content by measuring humor, stereotypicality, and toxicity.
arXiv Detail & Related papers (2025-10-21T09:28:09Z) - Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes [14.762724547600447]
We investigate whether the ability of Large Language Models (LLMs) to explain humour depends on the particular humour form. We compare models on simple puns and more complex topical humour that requires knowledge of real-world entities and events. We find that none of the tested models are capable of reliably generating adequate explanations of all joke types.
arXiv Detail & Related papers (2025-07-17T17:51:20Z) - From Punchlines to Predictions: A Metric to Assess LLM Performance in Identifying Humor in Stand-Up Comedy [6.124881326867511]
In light of the widespread adoption of Large Language Models, the intersection of humor and AI has become no laughing matter. In this study, we assess the ability of models to accurately identify humorous quotes from a stand-up comedy transcript. We propose a novel humor detection metric designed to evaluate LLMs, across various prompts, on their capability to extract humorous punchlines.
arXiv Detail & Related papers (2025-04-12T02:19:53Z) - THInC: A Theory-Driven Framework for Computational Humor Detection [2.0960189135529212]
There is still no agreement on a single, comprehensive humor theory.
Most computational approaches to detecting humor are not based on existing humor theories.
This paper contributes to bridging this long-standing gap by creating an interpretable framework for humor classification.
arXiv Detail & Related papers (2024-09-02T13:09:26Z) - Can Pre-trained Language Models Understand Chinese Humor? [74.96509580592004]
This paper is the first work that systematically investigates the humor understanding ability of pre-trained language models (PLMs).
We construct a comprehensive Chinese humor dataset, which can fully meet all the data requirements of the proposed evaluation framework.
Our empirical study on the Chinese humor dataset yields some valuable observations, which are of great guiding value for future optimization of PLMs in humor understanding and generation.
arXiv Detail & Related papers (2024-07-04T18:13:38Z) - Getting Serious about Humor: Crafting Humor Datasets with Unfunny Large Language Models [27.936545041302377]
Large language models (LLMs) can generate synthetic data for humor detection via editing texts.
We benchmark LLMs on an existing human dataset and show that current LLMs display an impressive ability to 'unfun' jokes.
We extend our approach to a code-mixed English-Hindi humor dataset, where we find that GPT-4's synthetic data is highly rated by bilingual annotators.
arXiv Detail & Related papers (2024-02-23T02:58:12Z) - The Naughtyformer: A Transformer Understands Offensive Humor [63.05016513788047]
We introduce a novel jokes dataset filtered from Reddit and solve the subtype classification task using a finetuned Transformer dubbed the Naughtyformer.
We show that our model is significantly better at detecting offensiveness in jokes compared to state-of-the-art methods.
arXiv Detail & Related papers (2022-11-25T20:37:58Z) - ExPUNations: Augmenting Puns with Keywords and Explanations [88.58174386894913]
We augment an existing dataset of puns with detailed crowdsourced annotations of keywords.
This is the first humor dataset with such extensive and fine-grained annotations specifically for puns.
We propose two tasks: explanation generation to aid with pun classification and keyword-conditioned pun generation.
arXiv Detail & Related papers (2022-10-24T18:12:02Z) - Towards Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results [84.37263300062597]
Humor is a substantial element of human social behavior, affect, and cognition.
Current methods of humor detection have been exclusively based on staged data, making them inadequate for "real-world" applications.
We contribute to addressing this deficiency by introducing the novel Passau-Spontaneous Football Coach Humor dataset, comprising about 11 hours of recordings.
arXiv Detail & Related papers (2022-09-28T17:36:47Z) - Multimodal Learning using Optimal Transport for Sarcasm and Humor Detection [76.62550719834722]
We deal with multimodal sarcasm and humor detection from conversational videos and image-text pairs.
We propose a novel multimodal learning system, MuLOT, which utilizes self-attention to exploit intra-modal correspondence.
We test our approach for multimodal sarcasm and humor detection on three benchmark datasets.
arXiv Detail & Related papers (2021-10-21T07:51:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.