Sarcasm Detection in Twitter -- Performance Impact when using Data
Augmentation: Word Embeddings
- URL: http://arxiv.org/abs/2108.09924v1
- Date: Mon, 23 Aug 2021 04:24:12 GMT
- Title: Sarcasm Detection in Twitter -- Performance Impact when using Data
Augmentation: Word Embeddings
- Authors: Alif Tri Handoyo, Hidayaturrahman, Derwin Suhartono
- Abstract summary: Sarcasm is the use of words to mock or annoy someone, or for humorous effect.
We propose a contextual model for sarcasm identification on Twitter that uses RoBERTa and augments the dataset.
We achieve a 3.2% performance gain on the iSarcasm dataset when data augmentation is used to increase the amount of data labeled as sarcastic by 20%.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sarcasm is the use of words to mock or annoy someone, or for humorous
effect. Sarcasm is widely used on social networks and microblogging websites,
where people mock or censure in a way that makes it difficult even for humans
to tell whether what is said is what is meant. Failure to identify sarcastic
utterances in Natural Language Processing applications such as sentiment
analysis and opinion mining confuses classification algorithms and produces
false results. Several studies on sarcasm detection have utilized different
learning algorithms. However, most of these models focus only on the content of
the expression and leave contextual information aside; as a result, they fail
to capture the contextual cues in sarcastic expressions. Moreover, the datasets
used in several studies are unbalanced, which affects model performance. In
this paper, we propose a contextual model for sarcasm identification on Twitter
that uses RoBERTa and augments the dataset by applying Global Vectors for Word
Representation (GloVe) to construct word embeddings and learn context,
generating additional data and balancing the dataset. The effectiveness of this
technique is tested with various datasets and data augmentation settings. In
particular, we achieve a 3.2% performance gain on the iSarcasm dataset when
data augmentation is used to increase the amount of data labeled as sarcastic
by 20%, yielding an F-score of 40.4% compared to 37.2% without data
augmentation.
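The abstract does not spell out the augmentation procedure in detail, so the following is only a minimal sketch of one plausible reading: new sarcastic samples are generated by replacing words with their nearest neighbours in GloVe embedding space, and roughly 20% more sarcastic tweets are added to reduce class imbalance. The file name, function names, and probabilities are hypothetical placeholders, not the authors' code.

```python
# Illustrative sketch only -- not the authors' released implementation.
# Assumes pre-trained GloVe vectors converted to word2vec text format
# (e.g. via gensim's glove2word2vec); the file name below is hypothetical.
import random
from gensim.models import KeyedVectors

glove = KeyedVectors.load_word2vec_format("glove.twitter.27B.100d.w2v.txt")

def augment_tweet(tweet, replace_prob=0.2, topn=5):
    """Create a variant of a tweet by swapping some words for GloVe nearest neighbours."""
    augmented = []
    for token in tweet.split():
        if token.lower() in glove and random.random() < replace_prob:
            # Substitute one of the closest words in embedding space.
            neighbours = [w for w, _ in glove.most_similar(token.lower(), topn=topn)]
            augmented.append(random.choice(neighbours))
        else:
            augmented.append(token)
    return " ".join(augmented)

def oversample_sarcastic(sarcastic_tweets, extra_ratio=0.2):
    """Add roughly extra_ratio (e.g. 20%) more sarcastic samples to balance the dataset."""
    n_new = int(len(sarcastic_tweets) * extra_ratio)
    new_samples = [augment_tweet(random.choice(sarcastic_tweets)) for _ in range(n_new)]
    return sarcastic_tweets + new_samples
```

Under this reading, the original and augmented tweets would then be used to fine-tune the RoBERTa classifier described in the abstract; the 20% figure mirrors the augmentation setting reported for iSarcasm.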
Related papers
- Sarcasm Detection in a Less-Resourced Language [0.0]
We build a sarcasm detection dataset for a less-resourced language, such as Slovenian.
We leverage two modern techniques: a machine translation specific medium-size transformer model, and a very large generative language model.
The results show that larger models generally outperform smaller ones and that ensembling can slightly improve sarcasm detection performance.
arXiv Detail & Related papers (2024-10-16T16:10:59Z) - Generalizable Sarcasm Detection Is Just Around The Corner, Of Course! [3.1245838179647576]
We tested the robustness of sarcasm detection models by examining their behavior when fine-tuned on four sarcasm datasets.
For intra-dataset predictions, models consistently performed better when fine-tuned with third-party labels.
For cross-dataset predictions, most models failed to generalize well to the other datasets.
arXiv Detail & Related papers (2024-04-09T14:48:32Z) - Into the LAIONs Den: Investigating Hate in Multimodal Datasets [67.21783778038645]
This paper investigates the effect of scaling datasets on hateful content through a comparative audit of two datasets: LAION-400M and LAION-2B.
We found that hate content increased by nearly 12% with dataset scale, measured both qualitatively and quantitatively.
We also found that filtering dataset contents on Not Safe For Work (NSFW) values computed from images alone does not exclude all the harmful content in alt-text.
arXiv Detail & Related papers (2023-11-06T19:00:05Z) - Harnessing the Power of Text-image Contrastive Models for Automatic
Detection of Online Misinformation [50.46219766161111]
We develop a self-learning model to explore contrastive learning in the domain of misinformation identification.
Our model shows superior performance in detecting non-matched image-text pairs when training data is insufficient.
arXiv Detail & Related papers (2023-04-19T02:53:59Z) - Sarcasm Detection Framework Using Emotion and Sentiment Features [62.997667081978825]
We propose a model which incorporates emotion and sentiment features to capture the incongruity intrinsic to sarcasm.
Our approach achieved state-of-the-art results on four datasets from social networking platforms and online media.
arXiv Detail & Related papers (2022-11-23T15:14:44Z) - An Empirical Investigation of Commonsense Self-Supervision with
Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z) - UTNLP at SemEval-2022 Task 6: A Comparative Analysis of Sarcasm
Detection using generative-based and mutation-based data augmentation [0.0]
Sarcasm is a term that refers to the use of words to mock, irritate, or amuse someone.
The metaphorical and creative nature of sarcasm presents a significant difficulty for sentiment analysis systems based on affective computing.
We put different models and data augmentation approaches to the test and report on which works best.
arXiv Detail & Related papers (2022-04-18T07:25:27Z) - "Did you really mean what you said?" : Sarcasm Detection in
Hindi-English Code-Mixed Data using Bilingual Word Embeddings [0.0]
We present a corpus of tweets for training custom word embeddings and a Hinglish dataset labelled for sarcasm detection.
We propose a deep learning based approach to address the issue of sarcasm detection in Hindi-English code mixed tweets.
arXiv Detail & Related papers (2020-10-01T11:41:44Z) - Trawling for Trolling: A Dataset [56.1778095945542]
We present a dataset that models trolling as a subcategory of offensive content.
The dataset has 12,490 samples, split across 5 classes: Normal, Profanity, Trolling, Derogatory, and Hate Speech.
arXiv Detail & Related papers (2020-08-02T17:23:55Z) - Augmenting Data for Sarcasm Detection with Unlabeled Conversation
Context [55.898436183096614]
We present a novel data augmentation technique, CRA (Contextual Response Augmentation), which utilizes conversational context to generate meaningful samples for training.
Specifically, our model, trained with the proposed data augmentation technique, won the sarcasm detection task of FigLang2020, achieving the best performance on both the Reddit and Twitter datasets.
arXiv Detail & Related papers (2020-06-11T09:00:11Z) - Sarcasm Detection using Context Separators in Online Discourse [3.655021726150369]
Sarcasm is an intricate form of speech, where meaning is conveyed implicitly.
In this work, we use RoBERTa_large to detect sarcasm in two datasets.
We also assert the importance of context in improving the performance of contextual word embedding models.
arXiv Detail & Related papers (2020-06-01T10:52:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.