Laughing Heads: Can Transformers Detect What Makes a Sentence Funny?
- URL: http://arxiv.org/abs/2105.09142v1
- Date: Wed, 19 May 2021 14:02:25 GMT
- Title: Laughing Heads: Can Transformers Detect What Makes a Sentence Funny?
- Authors: Maxime Peyrard, Beatriz Borges, Kristina Gligorić and Robert West
- Abstract summary: We train and analyze transformer-based humor recognition models on a dataset consisting of minimal pairs of aligned sentences.
We find that, although our aligned dataset is much harder than previous datasets, transformer-based models recognize the humorous sentence in an aligned pair with high accuracy (78%).
Most remarkably, we find clear evidence that one single attention head learns to recognize the words that make a test sentence humorous, even without access to this information at training time.
- Score: 18.67834526946997
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The automatic detection of humor poses a grand challenge for natural language
processing. Transformer-based systems have recently achieved remarkable results
on this task, but they usually (1) were evaluated in setups where serious vs
humorous texts came from entirely different sources, and (2) focused on
benchmarking performance without providing insights into how the models work.
We make progress in both respects by training and analyzing transformer-based
humor recognition models on a recently introduced dataset consisting of minimal
pairs of aligned sentences, one serious, the other humorous. We find that,
although our aligned dataset is much harder than previous datasets,
transformer-based models recognize the humorous sentence in an aligned pair
with high accuracy (78%). In a careful error analysis, we characterize easy vs
hard instances. Finally, by analyzing attention weights, we obtain important
insights into the mechanisms by which transformers recognize humor. Most
remarkably, we find clear evidence that one single attention head learns to
recognize the words that make a test sentence humorous, even without access to
this information at training time.
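To make the analysis concrete, here is a minimal sketch of how such a pairwise evaluation and attention-head inspection could look with the Hugging Face transformers API. The model checkpoint, the layer and head indices, and the example pair below are illustrative placeholders, not the authors' actual setup or data.

```python
# Sketch: score both sentences of an aligned (serious, humorous) pair with a
# fine-tuned transformer classifier and inspect one attention head's weights.
# Model name, layer/head indices, and the example pair are placeholders.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "bert-base-uncased"  # stand-in for a humor-recognition checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL, num_labels=2, output_attentions=True
)
model.eval()

def humor_score_and_attention(sentence, layer=10, head=7):
    """Return P(humorous) and the chosen head's attention from [CLS] to each token."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    prob_humorous = out.logits.softmax(dim=-1)[0, 1].item()
    # attentions: tuple over layers, each of shape (batch, heads, seq, seq);
    # row 0 is the attention distribution of the [CLS] query.
    cls_attention = out.attentions[layer][0, head, 0]
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return prob_humorous, list(zip(tokens, cls_attention.tolist()))

serious = "My wife told me to stop singing because the neighbours complained."
humorous = "My wife told me to stop singing Wonderwall. I said maybe."
p_serious, _ = humor_score_and_attention(serious)
p_humorous, attn = humor_score_and_attention(humorous)
print("pair classified correctly:", p_humorous > p_serious)
print(sorted(attn, key=lambda t: -t[1])[:5])  # tokens this head attends to most
```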
Related papers
- Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective [50.261681681643076]
We propose a novel metric called SemVarEffect and a benchmark named SemVarBench to evaluate the causality between semantic variations in inputs and outputs in text-to-image synthesis.
Our work establishes an effective evaluation framework that advances the T2I synthesis community's exploration of human instruction understanding.
arXiv Detail & Related papers (2024-10-14T08:45:35Z)
- Is AI fun? HumorDB: a curated dataset and benchmark to investigate graphical humor [8.75275650545552]
HumorDB is an image-only dataset specifically designed to advance visual humor understanding.
The dataset enables evaluation through binary classification, range regression, and pairwise comparison tasks.
HumorDB shows potential as a valuable benchmark for powerful large multimodal models.
arXiv Detail & Related papers (2024-06-19T13:51:40Z)
- Getting Serious about Humor: Crafting Humor Datasets with Unfunny Large Language Models [27.936545041302377]
Large language models (LLMs) can generate synthetic data for humor detection via editing texts.
We benchmark LLMs on an existing human dataset and show that current LLMs display an impressive ability to 'unfun' jokes.
We extend our approach to a code-mixed English-Hindi humor dataset, where we find that GPT-4's synthetic data is highly rated by bilingual annotators.
arXiv Detail & Related papers (2024-02-23T02:58:12Z)
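The "unfunning" idea described in this entry can be sketched as follows: an LLM is asked to minimally edit a joke so it is no longer funny, yielding an aligned pair. The prompt wording and model name below are our illustration, not the paper's actual pipeline.

```python
# Sketch of generating a synthetic minimal pair by "unfunning" a joke with an
# LLM. Prompt wording and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def unfun(joke: str) -> str:
    prompt = (
        "Edit the following sentence as little as possible so that it is no "
        "longer funny, while keeping it grammatical and plausible.\n\n"
        f"Sentence: {joke}\nEdited sentence:"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return resp.choices[0].message.content.strip()

# Each (joke, unfun(joke)) pair can then serve as a synthetic aligned pair
# for training or evaluating humor classifiers.
```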
- In-Context Convergence of Transformers [63.04956160537308]
We study the learning dynamics of a one-layer transformer with softmax attention trained via gradient descent.
For data with imbalanced features, we show that the learning dynamics take a stage-wise convergence process.
arXiv Detail & Related papers (2023-10-08T17:55:33Z)
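For orientation, a one-layer softmax-attention predictor of the kind analyzed in such work can be written as follows; the notation is ours and not necessarily the paper's.

```latex
% Prediction for a query token given prompt examples (x_1,y_1),...,(x_N,y_N);
% W_K, W_Q, W_V are the trainable key, query, and value matrices.
\[
\hat{y}_{\mathrm{query}}
  \;=\; \sum_{i=1}^{N}
    \frac{\exp\!\big(\langle W_K x_i,\, W_Q x_{\mathrm{query}}\rangle\big)}
         {\sum_{j=1}^{N}\exp\!\big(\langle W_K x_j,\, W_Q x_{\mathrm{query}}\rangle\big)}
    \; W_V\, y_i .
\]
```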
- How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide a mechanistic understanding of how transformers learn "semantic structure".
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z)
- Transformers learn in-context by gradient descent [58.24152335931036]
Training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations.
We show how trained Transformers become mesa-optimizers, i.e., they learn models by gradient descent in their forward pass.
arXiv Detail & Related papers (2022-12-15T09:21:21Z)
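The intuition behind this equivalence can be sketched as follows; this is our paraphrase of the standard construction, not a quotation of the paper's result.

```latex
% In-context least-squares loss over the prompt examples (x_i, y_i):
%   L(W) = (1/2N) * sum_i || W x_i - y_i ||^2
% Starting from W_0 = 0, one gradient step with learning rate eta yields a
% query prediction that a single linear (softmax-free) attention head can
% compute, with keys x_i, values y_i, and query x_query.
\[
W_1 \;=\; W_0 - \eta\, \nabla_W L(W_0) \;=\; \frac{\eta}{N}\sum_{i=1}^{N} y_i\, x_i^{\top},
\qquad
\hat{y} \;=\; W_1\, x_{\mathrm{query}}
        \;=\; \frac{\eta}{N}\sum_{i=1}^{N}\big(x_i^{\top} x_{\mathrm{query}}\big)\, y_i .
\]
```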
- Towards Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results [84.37263300062597]
Humor is a substantial element of human social behavior, affect, and cognition.
Current methods of humor detection have been exclusively based on staged data, making them inadequate for "real-world" applications.
We contribute to addressing this deficiency by introducing the novel Passau-Spontaneous Football Coach Humor dataset, comprising about 11 hours of recordings.
arXiv Detail & Related papers (2022-09-28T17:36:47Z)
- Dutch Humor Detection by Generating Negative Examples [5.888646114353371]
Humor detection is usually modeled as a binary classification task, trained to predict if the given text is a joke or another type of text.
We propose using text generation algorithms to imitate the original joke dataset, thereby increasing the difficulty for the learning algorithm.
We compare the humor detection capabilities of classic neural network approaches with the state-of-the-art Dutch language model RobBERT.
arXiv Detail & Related papers (2020-10-26T15:15:10Z)
- ColBERT: Using BERT Sentence Embedding in Parallel Neural Networks for Computational Humor [0.0]
We propose a novel approach for detecting and rating humor in short texts based on a popular linguistic theory of humor.
The proposed method first separates the sentences of the given text and uses the BERT model to generate an embedding for each one.
We accompany the paper with a novel dataset for humor detection consisting of 200,000 formal short texts.
The proposed model obtained F1 scores of 0.982 and 0.869 in the humor detection experiments, outperforming general and state-of-the-art models.
arXiv Detail & Related papers (2020-04-27T13:10:11Z)
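A minimal sketch of the sentence-wise embedding step described in this entry is shown below; the sentence splitter, pooling choice ([CLS] vector), and model name are our assumptions, not the paper's exact architecture.

```python
# Sketch: split a short text into sentences and embed each one separately with
# BERT, so downstream parallel branches can model the individual sentences.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()

def sentence_embeddings(text: str) -> torch.Tensor:
    sentences = [s.strip() for s in text.split(".") if s.strip()]  # naive splitter
    embeddings = []
    with torch.no_grad():
        for sentence in sentences:
            inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
            hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, 768)
            embeddings.append(hidden[0, 0])               # [CLS] token vector
    return torch.stack(embeddings)                        # (num_sentences, 768)

emb = sentence_embeddings("I told my computer a joke. It didn't laugh. It crashed.")
print(emb.shape)  # e.g. torch.Size([3, 768])
```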
- Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation [73.11214377092121]
We propose to replace all but one attention head of each encoder layer with simple, fixed (non-learnable) attentive patterns.
Our experiments with different data sizes and multiple language pairs show that fixing the attention heads on the encoder side of the Transformer at training time does not impact the translation quality.
arXiv Detail & Related papers (2020-02-24T13:53:06Z)
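To illustrate the "fixed attentive patterns" idea from this last entry, a head's learned attention distribution can be replaced by a constant, position-based pattern such as "attend to the previous token". The offset-based pattern below is our illustration, not the paper's exact configuration.

```python
# Sketch of a fixed (non-learnable) attention head based on token positions.
import torch

def fixed_pattern(seq_len: int, offset: int = -1) -> torch.Tensor:
    """Attention matrix where position i puts all mass on position i + offset,
    clamped to the sequence boundaries."""
    attn = torch.zeros(seq_len, seq_len)
    for i in range(seq_len):
        j = min(max(i + offset, 0), seq_len - 1)
        attn[i, j] = 1.0
    return attn

def fixed_head(values: torch.Tensor, offset: int = -1) -> torch.Tensor:
    """Apply the fixed attention pattern to the value vectors."""
    attn = fixed_pattern(values.size(0), offset)   # (seq, seq), rows sum to 1
    return attn @ values                           # (seq, d_model)

values = torch.randn(5, 16)          # a toy sequence of 5 value vectors
out = fixed_head(values, offset=-1)  # each position copies its left neighbour
print(out.shape)                     # torch.Size([5, 16])
```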
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.