Dutch Humor Detection by Generating Negative Examples
- URL: http://arxiv.org/abs/2010.13652v1
- Date: Mon, 26 Oct 2020 15:15:10 GMT
- Authors: Thomas Winters, Pieter Delobelle
- Abstract summary: Humor detection is usually modeled as a binary classification task, trained to predict if the given text is a joke or another type of text.
We propose using text generation algorithms for imitating the original joke dataset to increase the difficulty for the learning algorithm.
We compare the humor detection capabilities of classic neural network approaches with the state-of-the-art Dutch language model RobBERT.
- Score: 5.888646114353371
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting whether a text is humorous is computationally hard, as it
usually requires linguistic and common-sense insight. In machine learning,
humor detection is usually modeled as a binary classification task, trained to
predict if the given text is a joke or another type of text. Rather than using
completely different non-humorous texts, we propose using text generation
algorithms for imitating the original joke dataset to increase the difficulty
for the learning algorithm. We constructed several different joke and non-joke
datasets to test the humor detection abilities of different language
technologies. In particular, we compare the humor detection capabilities of
classic neural network approaches with the state-of-the-art Dutch language
model RobBERT. In doing so, we create and compare the first Dutch humor
detection systems. We found that while other language models performed well when
the non-jokes came from completely different domains, RobBERT was the only one
that was able to distinguish jokes from generated negative examples. This
performance illustrates the usefulness of using text generation to create
negative datasets for humor recognition, and also shows that transformer models
are a large step forward in humor detection.
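The idea of building negatives that imitate the joke dataset can be illustrated with a small sketch. The abstract does not say which text generation algorithm was used, so the word-swapping scheme below is only a stand-in chosen for illustration, not the authors' method. The function names, the `swap_fraction` parameter, and the English placeholder jokes are all hypothetical.

```python
import random


def make_negative(joke, donor_pool, rng, swap_fraction=0.5):
    """Return a copy of `joke` with some words replaced by words from other jokes.

    Stand-in corruption scheme (an assumption, not the paper's algorithm).
    `swap_fraction` is expected to lie between 0 and 1.
    """
    words = joke.split()
    if not words:
        return joke
    # Swap at least one word, and never more words than the joke contains.
    n_swaps = min(len(words), max(1, round(len(words) * swap_fraction)))
    for pos in rng.sample(range(len(words)), n_swaps):
        words[pos] = rng.choice(donor_pool)
    return " ".join(words)


def build_dataset(jokes, seed=0, swap_fraction=0.5):
    """Pair every joke (label 1) with one generated non-joke (label 0)."""
    if len(jokes) < 2:
        raise ValueError("need at least two jokes to borrow words from")
    rng = random.Random(seed)
    dataset = []
    for i, joke in enumerate(jokes):
        # Donor words come from every joke except the current one, so the
        # negatives keep the vocabulary and length profile of the joke set.
        donor_pool = [
            word
            for j, other in enumerate(jokes)
            if j != i
            for word in other.split()
        ]
        dataset.append((joke, 1))
        dataset.append((make_negative(joke, donor_pool, rng, swap_fraction), 0))
    return dataset


if __name__ == "__main__":
    # Placeholder jokes, invented for this example.
    jokes = [
        "why did the chicken cross the road",
        "a horse walks into a bar",
        "knock knock who is there",
    ]
    for text, label in build_dataset(jokes, seed=1):
        print(label, text)
```

Because the negatives share vocabulary and length with the jokes, a classifier cannot rely on topic or surface statistics alone, which is the added difficulty the abstract describes. A random swap may occasionally pick the word that was already there, so a real pipeline would probably filter out negatives identical to their source joke.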
Related papers
- Getting Serious about Humor: Crafting Humor Datasets with Unfunny Large Language Models [27.936545041302377]
Large language models (LLMs) can generate synthetic data for humor detection via editing texts.
We benchmark LLMs on an existing human dataset and show that current LLMs display an impressive ability to 'unfun' jokes.
We extend our approach to a code-mixed English-Hindi humor dataset, where we find that GPT-4's synthetic data is highly rated by bilingual annotators.
Date: 2024-02-23T02:58:12Z
- Generating Enhanced Negatives for Training Language-Based Object Detectors [86.1914216335631]
We propose to leverage the vast knowledge built into modern generative models to automatically build negatives that are more relevant to the original data.
Specifically, we use large language models to generate negative text descriptions, and text-to-image diffusion models to generate the corresponding negative images.
Our experimental analysis confirms the relevance of the generated negative data, and its use in language-based detectors improves performance on two complex benchmarks.
Date: 2023-12-29T23:04:00Z
- The Naughtyformer: A Transformer Understands Offensive Humor [63.05016513788047]
We introduce a novel jokes dataset filtered from Reddit and solve the subtype classification task using a finetuned Transformer dubbed the Naughtyformer.
We show that our model is significantly better at detecting offensiveness in jokes compared to state-of-the-art methods.
Date: 2022-11-25T20:37:58Z
- GENIUS: Sketch-based Language Model Pre-training via Extreme and Selective Masking for Text Generation and Augmentation [76.7772833556714]
We introduce GENIUS: a conditional text generation model using sketches as input.
GENIUS is pre-trained on a large-scale textual corpus with a novel reconstruction from sketch objective.
We show that GENIUS can be used as a strong and ready-to-use data augmentation tool for various natural language processing (NLP) tasks.
Date: 2022-11-18T16:39:45Z
- ExPUNations: Augmenting Puns with Keywords and Explanations [88.58174386894913]
We augment an existing dataset of puns with detailed crowdsourced annotations of keywords.
This is the first humor dataset with such extensive and fine-grained annotations specifically for puns.
We propose two tasks: explanation generation to aid with pun classification and keyword-conditioned pun generation.
Date: 2022-10-24T18:12:02Z
- Towards Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results [84.37263300062597]
Humor is a substantial element of human social behavior, affect, and cognition.
Current methods of humor detection have been exclusively based on staged data, making them inadequate for "real-world" applications.
We contribute to addressing this deficiency by introducing the novel Passau-Spontaneous Football Coach Humor dataset, comprising about 11 hours of recordings.
Date: 2022-09-28T17:36:47Z
- On Decoding Strategies for Neural Text Generators [73.48162198041884]
We study the interaction between language generation tasks and decoding strategies.
We measure changes in attributes of generated text as a function of both decoding strategy and task.
Our results reveal both previously-observed and surprising findings.
Date: 2022-03-29T16:25:30Z
- Humor@IITK at SemEval-2021 Task 7: Large Language Models for Quantifying Humor and Offensiveness [2.251416625953577]
This paper explores whether large neural models and their ensembles can capture the intricacies associated with humor/offense detection and rating.
Our experiments on the SemEval-2021 Task 7: HaHackathon show that we can develop reasonable humor and offense detection systems with such models.
Date: 2021-04-02T08:22:02Z
- Uncertainty and Surprisal Jointly Deliver the Punchline: Exploiting Incongruity-Based Features for Humor Recognition [0.6445605125467573]
We break down any joke into two distinct components: the set-up and the punchline.
Inspired by the incongruity theory of humor, we model the set-up as the part developing semantic uncertainty.
With increasingly powerful language models, we were able to feed the set-up along with the punchline into the GPT-2 language model.
Date: 2020-12-22T13:48:09Z
- Let's be Humorous: Knowledge Enhanced Humor Generation [26.886255899651893]
We explore how to generate a punchline given the set-up with the relevant knowledge.
To our knowledge, this is the first attempt to generate punchlines with a knowledge-enhanced model.
The experimental results demonstrate that our method can make use of knowledge to generate fluent, funny punchlines.
Date: 2020-04-28T06:06:18Z
- ColBERT: Using BERT Sentence Embedding in Parallel Neural Networks for Computational Humor [0.0]
We propose a novel approach for detecting and rating humor in short texts based on a popular linguistic theory of humor.
The proposed method begins by separating the sentences of the given text and using the BERT model to generate an embedding for each one.
We accompany the paper with a novel dataset for humor detection consisting of 200,000 formal short texts.
The proposed model obtained F1 scores of 0.982 and 0.869 in the humor detection experiments, outperforming general and state-of-the-art models.
Date: 2020-04-27T13:10:11Z