KoCoSa: Korean Context-aware Sarcasm Detection Dataset
- URL: http://arxiv.org/abs/2402.14428v2
- Date: Fri, 22 Mar 2024 06:29:26 GMT
- Title: KoCoSa: Korean Context-aware Sarcasm Detection Dataset
- Authors: Yumin Kim, Heejae Suh, Mingi Kim, Dongyeon Won, Hwanhee Lee,
- Abstract summary: Sarcasm is a way of verbal irony where someone says the opposite of what they mean, often to ridicule a person, situation, or idea.
In this paper, we introduce a new dataset for the Korean dialogue sarcasm detection task, KoCoSa.
The dataset consists of 12.8K daily Korean dialogues and the labels for this task on the last response.
- Score: 3.369750569233713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sarcasm is a way of verbal irony where someone says the opposite of what they mean, often to ridicule a person, situation, or idea. It is often difficult to detect sarcasm in the dialogue since detecting sarcasm should reflect the context (i.e., dialogue history). In this paper, we introduce a new dataset for the Korean dialogue sarcasm detection task, KoCoSa (Korean Context-aware Sarcasm Detection Dataset), which consists of 12.8K daily Korean dialogues and the labels for this task on the last response. To build the dataset, we propose an efficient sarcasm detection dataset generation pipeline: 1) generating new sarcastic dialogues from source dialogues with large language models, 2) automatic and manual filtering of abnormal and toxic dialogues, and 3) human annotation for the sarcasm detection task. We also provide a simple but effective baseline for the Korean sarcasm detection task trained on our dataset. Experimental results on the dataset show that our baseline system outperforms strong baselines like large language models, such as GPT-3.5, in the Korean sarcasm detection task. We show that the sarcasm detection task relies deeply on the existence of sufficient context. We will release the dataset at https://github.com/Yu-billie/KoCoSa_sarcasm_detection.
Related papers
- Sentiment-enhanced Graph-based Sarcasm Explanation in Dialogue [67.09698638709065]
We propose a novel sEntiment-enhanceD Graph-based multimodal sarcasm Explanation framework, named EDGE.
In particular, we first propose a lexicon-guided utterance sentiment inference module, where a utterance sentiment refinement strategy is devised.
We then develop a module named Joint Cross Attention-based Sentiment Inference (JCA-SI) by extending the multimodal sentiment analysis model JCA to derive the joint sentiment label for each video-audio clip.
arXiv Detail & Related papers (2024-02-06T03:14:46Z) - Sarcasm Detection in a Disaster Context [103.93691731605163]
We introduce HurricaneSARC, a dataset of 15,000 tweets annotated for intended sarcasm.
Our best model is able to obtain as much as 0.70 F1 on our dataset.
arXiv Detail & Related papers (2023-08-16T05:58:12Z) - Researchers eye-view of sarcasm detection in social media textual
content [0.0]
Enormous use of sarcastic text in all forms of communication in social media will have a physiological effect on target users.
This paper discusses various sarcasm detection techniques and concludes with some approaches, related datasets with optimal features.
arXiv Detail & Related papers (2023-04-17T19:45:10Z) - Sarcasm Detection Framework Using Emotion and Sentiment Features [62.997667081978825]
We propose a model which incorporates emotion and sentiment features to capture the incongruity intrinsic to sarcasm.
Our approach achieved state-of-the-art results on four datasets from social networking platforms and online media.
arXiv Detail & Related papers (2022-11-23T15:14:44Z) - How to Describe Images in a More Funny Way? Towards a Modular Approach
to Cross-Modal Sarcasm Generation [62.89586083449108]
We study a new problem of cross-modal sarcasm generation (CMSG), i.e., generating a sarcastic description for a given image.
CMSG is challenging as models need to satisfy the characteristics of sarcasm, as well as the correlation between different modalities.
We propose an Extraction-Generation-Ranking based Modular method (EGRM) for cross-model sarcasm generation.
arXiv Detail & Related papers (2022-11-20T14:38:24Z) - Computational Sarcasm Analysis on Social Media: A Systematic Review [0.23488056916440855]
Sarcasm can be defined as saying or writing the opposite of what one truly wants to express, usually to insult, irritate, or amuse someone.
Because of the obscure nature of sarcasm in textual data, detecting it is difficult and of great interest to the sentiment analysis research community.
arXiv Detail & Related papers (2022-09-13T17:20:19Z) - sarcasm detection and quantification in arabic tweets [7.173484352846755]
This paper intends to create a new humanly annotated Arabic corpus for sarcasm detection collected from tweets.
The proposed approach tackles the problem as a regression problem instead of classification.
arXiv Detail & Related papers (2021-08-03T11:48:27Z) - Parallel Deep Learning-Driven Sarcasm Detection from Pop Culture Text
and English Humor Literature [0.76146285961466]
We manually extract the sarcastic word distribution features of a benchmark pop culture sarcasm corpus.
We generate input sequences formed of the weighted vectors from such words.
Our proposed model for detecting sarcasm peaks a training accuracy of 98.95% when trained with the discussed dataset.
arXiv Detail & Related papers (2021-06-10T14:01:07Z) - Augmenting Data for Sarcasm Detection with Unlabeled Conversation
Context [55.898436183096614]
We present a novel data augmentation technique, CRA (Contextual Response Augmentation), which utilizes conversational context to generate meaningful samples for training.
Specifically, our proposed model, trained with the proposed data augmentation technique, participated in the sarcasm detection task of FigLang2020, have won and achieves the best performance in both Reddit and Twitter datasets.
arXiv Detail & Related papers (2020-06-11T09:00:11Z) - Sarcasm Detection using Context Separators in Online Discourse [3.655021726150369]
Sarcasm is an intricate form of speech, where meaning is conveyed implicitly.
In this work, we use RoBERTa_large to detect sarcasm in two datasets.
We also assert the importance of context in improving the performance of contextual word embedding models.
arXiv Detail & Related papers (2020-06-01T10:52:35Z) - $R^3$: Reverse, Retrieve, and Rank for Sarcasm Generation with
Commonsense Knowledge [51.70688120849654]
We propose an unsupervised approach for sarcasm generation based on a non-sarcastic input sentence.
Our method employs a retrieve-and-edit framework to instantiate two major characteristics of sarcasm.
arXiv Detail & Related papers (2020-04-28T02:30:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.