"A good pun is its own reword": Can Large Language Models Understand Puns?
- URL: http://arxiv.org/abs/2404.13599v2
- Date: Sun, 16 Jun 2024 11:31:18 GMT
- Title: "A good pun is its own reword": Can Large Language Models Understand Puns?
- Authors: Zhijun Xu, Siyu Yuan, Lingjie Chen, Deqing Yang
- Abstract summary: Puns play a vital role in academic research due to their distinct structure and clear definition.
The understanding of puns in large language models (LLMs) has not been thoroughly examined.
- Score: 9.541689402830642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Puns play a vital role in academic research due to their distinct structure and clear definition, which aid in the comprehensive analysis of linguistic humor. However, the understanding of puns in large language models (LLMs) has not been thoroughly examined, limiting their use in creative writing and humor creation. In this paper, we leverage three popular tasks, i.e., pun recognition, explanation and generation to systematically evaluate the capabilities of LLMs in pun understanding. In addition to adopting the automated evaluation metrics from prior research, we introduce new evaluation methods and metrics that are better suited to the in-context learning paradigm of LLMs. These new metrics offer a more rigorous assessment of an LLM's ability to understand puns and align more closely with human cognition than previous metrics. Our findings reveal the "lazy pun generation" pattern and identify the primary challenges LLMs encounter in understanding puns.
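The three evaluation tasks are posed through in-context prompting. Below is a minimal sketch of how such prompts could look; `query_llm` is a hypothetical placeholder for any chat-completion client, and the few-shot example is illustrative, not drawn from the paper's prompt set.

```python
# Minimal sketch (not the authors' released code) of posing the paper's three
# pun tasks -- recognition, explanation, generation -- as in-context prompts.

def query_llm(prompt: str) -> str:
    """Hypothetical placeholder: swap in a real chat-completion API call."""
    raise NotImplementedError("plug in an LLM client here")

# Illustrative few-shot example (an assumption, not the paper's prompt set).
FEW_SHOT = ('Example: "I used to be a banker, but I lost interest." -> pun\n'
            '(the pun word "interest" means both curiosity and bank interest)\n\n')

def recognize(sentence: str) -> str:
    # Task 1: binary pun recognition.
    return query_llm(f"{FEW_SHOT}Is the following sentence a pun? "
                     f"Answer yes or no.\nSentence: {sentence}")

def explain(sentence: str) -> str:
    # Task 2: pun explanation -- name the pun word and both of its senses.
    return query_llm(f"{FEW_SHOT}Explain the pun below, identifying the pun "
                     f"word and its two meanings.\nSentence: {sentence}")

def generate(pun_word: str, alternative_word: str) -> str:
    # Task 3: pun generation from a pun-word / alternative-word pair.
    return query_llm(f'Write a one-sentence pun exploiting "{pun_word}" '
                     f'(it should sound like "{alternative_word}").')
```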
Related papers
- A Statistical Analysis of LLMs' Self-Evaluation Using Proverbs [1.9073729452914245]
We introduce a novel proverb database consisting of 300 proverb pairs that are similar in intent but different in wording.
We propose tests to evaluate textual consistencies as well as numerical consistencies across similar proverbs.
arXiv Detail & Related papers (2024-10-22T02:38:48Z)
- Can Pre-trained Language Models Understand Chinese Humor? [74.96509580592004]
This paper is the first to systematically investigate the humor understanding ability of pre-trained language models (PLMs).
We construct a comprehensive Chinese humor dataset, which can fully meet all the data requirements of the proposed evaluation framework.
Our empirical study on the Chinese humor dataset yields valuable observations that offer guidance for future optimization of PLMs in humor understanding and generation.
arXiv Detail & Related papers (2024-07-04T18:13:38Z)
- Can large language models understand uncommon meanings of common words? [30.527834781076546]
Large language models (LLMs) have shown significant advancements across diverse natural language understanding (NLU) tasks.
Yet, lacking widely acknowledged testing mechanisms, the question of whether LLMs are mere 'parrots' or genuinely comprehend the world remains open.
This paper presents an innovative construction of a lexical-semantics dataset with novel evaluation metrics.
arXiv Detail & Related papers (2024-05-09T12:58:22Z)
- FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z)
- When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models [59.84769254832941]
We propose a FaLlacy Understanding Benchmark (FLUB) containing cunning texts that are easy for humans to understand but difficult for models to grasp.
Specifically, the cunning texts that FLUB focuses on mainly consist of tricky, humorous, and misleading texts collected from the real-world internet.
Based on FLUB, we investigate the performance of multiple representative and advanced LLMs.
arXiv Detail & Related papers (2024-02-16T22:12:53Z)
- Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations are taken to reflect a model's more comprehensive grasp of language and its proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage (a minimal perturbation sketch follows this list).
arXiv Detail & Related papers (2023-09-20T09:23:46Z)
- Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners [75.85554779782048]
Large Language Models (LLMs) have excited the natural language and machine learning community over recent years.
Despite numerous successful applications, the underlying mechanism of such in-context capabilities still remains unclear.
In this work, we hypothesize that the learned semantics of language tokens do most of the heavy lifting during the reasoning process.
arXiv Detail & Related papers (2023-05-24T07:33:34Z)
- The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs [26.118193748582197]
We evaluate four categories of widely used state-of-the-art models.
We find that, even though evaluation is restricted to utterances requiring only a binary inference, models in three of these categories perform close to random.
These results suggest that certain fine-tuning strategies are far better at inducing pragmatic understanding in models.
arXiv Detail & Related papers (2022-10-26T19:04:23Z) - ExPUNations: Augmenting Puns with Keywords and Explanations [88.58174386894913]
We augment an existing dataset of puns with detailed crowdsourced annotations of keywords and natural-language explanations.
This is the first humor dataset with such extensive and fine-grained annotations specifically for puns.
We propose two tasks: explanation generation to aid with pun classification and keyword-conditioned pun generation.
arXiv Detail & Related papers (2022-10-24T18:12:02Z)
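The word-level robustness entry above refers to perturbations common in everyday language. Below is a minimal sketch of one such perturbation, an adjacent-character swap, chosen here as an illustrative assumption rather than that paper's exact procedure.

```python
# Minimal sketch of a word-level perturbation: swapping adjacent characters.
# Illustrative only; the benchmarked papers may use different perturbations.
import random

def perturb_word(word: str, rng: random.Random) -> str:
    """Swap two adjacent characters in words long enough to stay readable."""
    if len(word) < 3:
        return word
    i = rng.randrange(len(word) - 1)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def perturb_sentence(sentence: str, rate: float = 0.3, seed: int = 0) -> str:
    """Perturb roughly `rate` of the words, deterministically via `seed`."""
    rng = random.Random(seed)
    return " ".join(perturb_word(w, rng) if rng.random() < rate else w
                    for w in sentence.split())

if __name__ == "__main__":
    print(perturb_sentence("large language models are robust to noise"))
```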
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.