Augmenting Data for Sarcasm Detection with Unlabeled Conversation Context
- URL: http://arxiv.org/abs/2006.06259v1
- Date: Thu, 11 Jun 2020 09:00:11 GMT
- Title: Augmenting Data for Sarcasm Detection with Unlabeled Conversation Context
- Authors: Hankyol Lee, Youngjae Yu, Gunhee Kim
- Abstract summary: We present a novel data augmentation technique, CRA (Contextual Response Augmentation), which utilizes conversational context to generate meaningful samples for training.
Specifically, our proposed model, trained with this data augmentation technique, won the sarcasm detection task of FigLang2020 and achieves the best performance on both the Reddit and Twitter datasets.
- Score: 55.898436183096614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel data augmentation technique, CRA (Contextual Response
Augmentation), which utilizes conversational context to generate meaningful
samples for training. We also mitigate issues arising from unbalanced context
lengths by changing the input-output format of the model so that it can handle
varying context lengths effectively. Specifically, our proposed model, trained
with the proposed data augmentation technique, won the sarcasm detection task
of FigLang2020 and achieves the best performance on both the Reddit and
Twitter datasets.
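As a rough illustration of the CRA idea, the sketch below generates candidate responses from unlabeled conversation context with a pretrained seq2seq model and adds them as extra training samples; the choice of facebook/bart-large as the generator, the decoding settings, and the rule of reusing the original label are assumptions for illustration, not the paper's exact setup.
```python
# Hedged sketch of context-driven augmentation: sample responses conditioned
# on an unlabeled conversation context and reuse them as training samples.
# The generator checkpoint and the labeling rule below are assumptions.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
generator = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

def augment_with_context(context_turns, label, num_samples=3):
    """Generate (response, label) pairs conditioned on a dialogue context."""
    context = " </s> ".join(context_turns)  # flatten the turns into one string
    inputs = tokenizer(context, return_tensors="pt", truncation=True)
    outputs = generator.generate(
        **inputs,
        do_sample=True,            # sample for diversity
        top_p=0.9,
        max_new_tokens=40,
        num_return_sequences=num_samples,
    )
    responses = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    # Assumption: each generated response inherits the original sample's label.
    return [(response, label) for response in responses]

extra = augment_with_context(
    ["I waited two hours at the DMV.", "At least the chairs were comfortable."],
    label=1,  # hypothetical "sarcastic" label
)
```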
Related papers
- Effective Long-Context Scaling of Foundation Models [90.57254298730923]
We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens.
Our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2.
arXiv Detail & Related papers (2023-09-27T21:41:49Z) - Learning towards Selective Data Augmentation for Dialogue Generation [52.540330534137794]
We argue that not all cases are beneficial for the augmentation task, and that cases suitable for augmentation should satisfy two attributes.
We propose a Selective Data Augmentation framework (SDA) for the response generation task.
arXiv Detail & Related papers (2023-03-17T01:26:39Z) - WADER at SemEval-2023 Task 9: A Weak-labelling framework for Data
augmentation in tExt Regression Tasks [4.102007186133394]
In this paper, we propose a novel weak-labeling strategy for data augmentation in text regression tasks called WADER.
We benchmark the performance of state-of-the-art pre-trained multilingual language models using WADER and analyze the use of sampling techniques to mitigate bias in the data.
arXiv Detail & Related papers (2023-03-05T19:45:42Z) - AugGPT: Leveraging ChatGPT for Text Data Augmentation [59.76140039943385]
We propose a text data augmentation approach based on ChatGPT (named AugGPT).
AugGPT rephrases each sentence in the training samples into multiple conceptually similar but semantically different samples.
Experiment results on few-shot learning text classification tasks show the superior performance of the proposed AugGPT approach.
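A minimal sketch of AugGPT-style augmentation, assuming the OpenAI chat API as the backend; the model name, prompt, and number of paraphrases are illustrative assumptions rather than the paper's exact configuration.
```python
# Hedged sketch: rephrase each training sentence into several variants with a
# chat model, keeping the original label. Prompt and model are assumptions.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def rephrase(sentence, n_variants=4):
    prompt = (
        f"Rephrase the following sentence in {n_variants} different ways, "
        f"one per line:\n{sentence}"
    )
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.9,
    )
    lines = reply.choices[0].message.content.strip().splitlines()
    return [line.strip() for line in lines if line.strip()]

train_set = [("great, another monday morning", 1)]  # toy (text, label) pairs
augmented = [(p, label) for text, label in train_set for p in rephrase(text)]
```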
arXiv Detail & Related papers (2023-02-25T06:58:16Z) - Sarcasm Detection in Twitter -- Performance Impact when using Data
Augmentation: Word Embeddings [0.0]
Sarcasm is the use of words to mock or annoy someone, or for humorous purposes.
We propose a contextual model for sarcasm identification on Twitter that uses RoBERTa and augments the dataset.
We achieve a 3.2% performance gain on the iSarcasm dataset when using data augmentation to increase the amount of data labeled as sarcastic by 20%.
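One common way to augment text with word embeddings is nearest-neighbour word replacement; the sketch below uses pretrained GloVe Twitter vectors via gensim, with the embedding choice and replacement rate as illustrative assumptions rather than the paper's method.
```python
# Hedged sketch of embedding-based augmentation: randomly swap words for their
# nearest neighbours in a pretrained embedding space (assumed setup).
import random
import gensim.downloader as api

vectors = api.load("glove-twitter-100")  # pretrained Twitter GloVe vectors

def augment(sentence, replace_prob=0.2):
    out = []
    for token in sentence.split():
        if token in vectors and random.random() < replace_prob:
            out.append(vectors.most_similar(token, topn=1)[0][0])  # neighbour
        else:
            out.append(token)
    return " ".join(out)

print(augment("oh great another meeting i absolutely needed today"))
```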
arXiv Detail & Related papers (2021-08-23T04:24:12Z) - Few-shot learning through contextual data augmentation [74.20290390065475]
Machine translation models need to adapt to new data to maintain their performance over time.
We show that adaptation on the scale of one to five examples is possible.
Our model reports better accuracy scores than a reference system trained with an average of 313 parallel examples.
arXiv Detail & Related papers (2021-03-31T09:05:43Z) - Improving Commonsense Causal Reasoning by Adversarial Training and Data
Augmentation [14.92157586545743]
This paper presents a number of techniques for making models more robust in the domain of causal reasoning.
We show a statistically significant improvement in performance on both datasets, even with only a small number of additionally generated data points.
arXiv Detail & Related papers (2021-01-13T09:55:29Z) - Sarcasm Detection using Context Separators in Online Discourse [3.655021726150369]
Sarcasm is an intricate form of speech, where meaning is conveyed implicitly.
In this work, we use RoBERTa_large to detect sarcasm in two datasets.
We also assert the importance of context in improving the performance of contextual word embedding models.
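The general recipe of context-aware encoding can be sketched as a sentence-pair input, where the tokenizer places separator tokens between context and response; the checkpoint and label mapping below are assumptions, and the classification head would still need fine-tuning on labeled sarcasm data.
```python
# Hedged sketch: encode (context, response) as a pair so RoBERTa's separator
# tokens split the two segments. The head here is untrained; fine-tuning on a
# labeled sarcasm dataset is required before the prediction is meaningful.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large", num_labels=2)

context = "I love waiting an hour for the bus in the rain."
response = "Sounds like the highlight of your week."

enc = tokenizer(context, response, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**enc).logits
print("p(sarcastic) =", torch.softmax(logits, dim=-1)[0, 1].item())
```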
arXiv Detail & Related papers (2020-06-01T10:52:35Z) - Improving Multi-Turn Response Selection Models with Complementary
Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments on two public datasets and obtain significant improvements on both.
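Instance weighting of this kind can be sketched as a per-example weighted loss; the weight values below are illustrative placeholders, not the paper's derived supervision signal.
```python
# Hedged sketch: scale each example's cross-entropy by an instance weight
# (e.g. derived from an auxiliary signal); the weights here are placeholders.
import torch
import torch.nn.functional as F

def weighted_loss(logits, labels, instance_weights):
    per_example = F.cross_entropy(logits, labels, reduction="none")
    return (instance_weights * per_example).mean()

logits = torch.randn(4, 2)                    # 4 candidate responses, 2 classes
labels = torch.tensor([1, 0, 1, 0])
weights = torch.tensor([1.0, 0.3, 0.8, 1.0])  # down-weight presumed-noisy cases
print(weighted_loss(logits, labels, weights).item())
```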
arXiv Detail & Related papers (2020-02-18T06:29:01Z)