Sarcasm Detection in Twitter -- Performance Impact when using Data
Augmentation: Word Embeddings
- URL: http://arxiv.org/abs/2108.09924v1
- Date: Mon, 23 Aug 2021 04:24:12 GMT
- Title: Sarcasm Detection in Twitter -- Performance Impact when using Data
Augmentation: Word Embeddings
- Authors: Alif Tri Handoyo, Hidayaturrahman, Derwin Suhartono
- Abstract summary: Sarcasm is the use of words to mock or annoy someone, or for humorous effect.
We propose a contextual model for sarcasm identification on Twitter that uses RoBERTa and augments the dataset.
We achieve a 3.2% performance gain on the iSarcasm dataset when data augmentation is used to increase the amount of data labeled as sarcastic by 20%.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sarcasm is the use of words to mock or annoy someone, or for humorous
effect. Sarcasm is widely used on social networks and microblogging websites,
where people mock or censure in a way that makes it difficult even for humans
to tell whether what is said is what is meant. Failure to identify sarcastic
utterances in Natural Language Processing applications such as sentiment
analysis and opinion mining confuses classification algorithms and produces
false results. Several studies on sarcasm detection have utilized different
learning algorithms. However, most of these models focus only on the content of
the expression and leave contextual information aside; as a result, they fail
to capture the contextual cues in sarcastic expressions. Moreover, the datasets
used in several studies are unbalanced, which affects model performance. In
this paper, we propose a contextual model for sarcasm identification on Twitter
that uses RoBERTa and augments the dataset by applying Global Vectors for Word
Representation (GloVe) to construct word embeddings and learn context,
generating additional data and balancing the dataset. The effectiveness of this
technique is tested with various datasets and data augmentation settings. In
particular, we achieve a 3.2% performance gain on the iSarcasm dataset when
data augmentation is used to increase the amount of data labeled as sarcastic
by 20%, yielding an F-score of 40.4% compared to 37.2% without data
augmentation.
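The abstract does not spell out the augmentation procedure in detail, so the following is only a minimal sketch of one plausible reading: new sarcastic samples are generated by replacing words with their nearest neighbours in GloVe embedding space, and roughly 20% more sarcastic tweets are added to reduce class imbalance. The file name, function names, and probabilities are hypothetical placeholders, not the authors' code.

```python
# Illustrative sketch only -- not the authors' released implementation.
# Assumes pre-trained GloVe vectors converted to word2vec text format
# (e.g. via gensim's glove2word2vec); the file name below is hypothetical.
import random
from gensim.models import KeyedVectors

glove = KeyedVectors.load_word2vec_format("glove.twitter.27B.100d.w2v.txt")

def augment_tweet(tweet, replace_prob=0.2, topn=5):
    """Create a variant of a tweet by swapping some words for GloVe nearest neighbours."""
    augmented = []
    for token in tweet.split():
        if token.lower() in glove and random.random() < replace_prob:
            # Substitute one of the closest words in embedding space.
            neighbours = [w for w, _ in glove.most_similar(token.lower(), topn=topn)]
            augmented.append(random.choice(neighbours))
        else:
            augmented.append(token)
    return " ".join(augmented)

def oversample_sarcastic(sarcastic_tweets, extra_ratio=0.2):
    """Add roughly extra_ratio (e.g. 20%) more sarcastic samples to balance the dataset."""
    n_new = int(len(sarcastic_tweets) * extra_ratio)
    new_samples = [augment_tweet(random.choice(sarcastic_tweets)) for _ in range(n_new)]
    return sarcastic_tweets + new_samples
```

Under this reading, the original and augmented tweets would then be used to fine-tune the RoBERTa classifier described in the abstract; the 20% figure mirrors the augmentation setting reported for iSarcasm.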
Related papers
- Sarcasm Detection in a Less-Resourced Language [0.0]
We build a sarcasm detection dataset for a less-resourced language, such as Slovenian.
We leverage two modern techniques: a machine translation specific medium-size transformer model, and a very large generative language model.
The results show that larger models generally outperform smaller ones and that ensembling can slightly improve sarcasm detection performance.
arXiv Detail & Related papers (2024-10-16T16:10:59Z) - Generalizable Sarcasm Detection Is Just Around The Corner, Of Course! [3.1245838179647576]
We tested the robustness of sarcasm detection models by examining their behavior when fine-tuned on four sarcasm datasets.
For intra-dataset predictions, models consistently performed better when fine-tuned with third-party labels.
For cross-dataset predictions, most models failed to generalize well to the other datasets.
arXiv Detail & Related papers (2024-04-09T14:48:32Z) - Into the LAIONs Den: Investigating Hate in Multimodal Datasets [67.21783778038645]
This paper investigates the effect of scaling datasets on hateful content through a comparative audit of two datasets: LAION-400M and LAION-2B.
We found that hate content increased by nearly 12% with dataset scale, measured both qualitatively and quantitatively.
We also found that filtering dataset contents on Not Safe For Work (NSFW) values computed from images alone does not exclude all the harmful content in alt-text.
arXiv Detail & Related papers (2023-11-06T19:00:05Z) - Harnessing the Power of Text-image Contrastive Models for Automatic
Detection of Online Misinformation [50.46219766161111]
We develop a self-learning model to explore contrastive learning in the domain of misinformation identification.
Our model shows superior performance in detecting non-matched image-text pairs when training data is insufficient.
arXiv Detail & Related papers (2023-04-19T02:53:59Z) - Sarcasm Detection Framework Using Emotion and Sentiment Features [62.997667081978825]
We propose a model which incorporates emotion and sentiment features to capture the incongruity intrinsic to sarcasm.
Our approach achieved state-of-the-art results on four datasets from social networking platforms and online media.
arXiv Detail & Related papers (2022-11-23T15:14:44Z) - An Empirical Investigation of Commonsense Self-Supervision with
Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z) - UTNLP at SemEval-2022 Task 6: A Comparative Analysis of Sarcasm
Detection using generative-based and mutation-based data augmentation [0.0]
Sarcasm is a term that refers to the use of words to mock, irritate, or amuse someone.
The metaphorical and creative nature of sarcasm presents a significant difficulty for sentiment analysis systems based on affective computing.
We put different models and data augmentation approaches to the test and report on which works best.
arXiv Detail & Related papers (2022-04-18T07:25:27Z) - "Did you really mean what you said?" : Sarcasm Detection in
Hindi-English Code-Mixed Data using Bilingual Word Embeddings [0.0]
We present a corpus of tweets for training custom word embeddings and a Hinglish dataset labelled for sarcasm detection.
We propose a deep learning based approach to address the issue of sarcasm detection in Hindi-English code mixed tweets.
arXiv Detail & Related papers (2020-10-01T11:41:44Z) - Trawling for Trolling: A Dataset [56.1778095945542]
We present a dataset that models trolling as a subcategory of offensive content.
The dataset has 12,490 samples, split across 5 classes: Normal, Profanity, Trolling, Derogatory, and Hate Speech.
arXiv Detail & Related papers (2020-08-02T17:23:55Z) - Augmenting Data for Sarcasm Detection with Unlabeled Conversation
Context [55.898436183096614]
We present a novel data augmentation technique, CRA (Contextual Response Augmentation), which utilizes conversational context to generate meaningful samples for training.
Specifically, our model, trained with the proposed data augmentation technique, won the sarcasm detection task of FigLang2020, achieving the best performance on both the Reddit and Twitter datasets.
arXiv Detail & Related papers (2020-06-11T09:00:11Z) - Sarcasm Detection using Context Separators in Online Discourse [3.655021726150369]
Sarcasm is an intricate form of speech, where meaning is conveyed implicitly.
In this work, we use RoBERTa_large to detect sarcasm in two datasets.
We also assert the importance of context in improving the performance of contextual word embedding models.
arXiv Detail & Related papers (2020-06-01T10:52:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.