How Effective is Incongruity? Implications for Code-mix Sarcasm
Detection
- URL: http://arxiv.org/abs/2202.02702v1
- Date: Sun, 6 Feb 2022 04:05:09 GMT
- Title: How Effective is Incongruity? Implications for Code-mix Sarcasm
Detection
- Authors: Aditya Shah, Chandresh Kumar Maurya
- Abstract summary: Sarcasm poses several challenges for downstream NLP tasks.
We propose the idea of capturing incongruity through sub-word level embeddings learned via fastText.
Our proposed model achieves an F1-score on a code-mix Hinglish dataset comparable to that of pretrained multilingual models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The presence of sarcasm in conversational systems and on social media
platforms such as chatbots, Facebook, and Twitter poses several challenges for
downstream NLP tasks. This is because the intended meaning of a sarcastic text is
contrary to what is expressed. Further, the use of code-mix language to express
sarcasm is increasing day by day. Current NLP techniques for code-mix data have
limited success due to differences in lexicon and syntax and the scarcity of
labeled corpora. To solve the joint problem of code-mixing and sarcasm detection,
we propose capturing incongruity through sub-word level embeddings learned via
fastText. Empirical results show that our proposed model achieves an F1-score on
a code-mix Hinglish dataset comparable to that of pretrained multilingual models
while training 10x faster and with a smaller memory footprint.
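The key ingredient is fastText's character n-gram (sub-word) embeddings, which stay robust to the spelling variation typical of romanized Hinglish. Below is a minimal sketch of this embedding step using gensim's FastText implementation; the toy corpus, hyperparameters, and the similarity check are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch: learning sub-word (character n-gram) embeddings with fastText
# on code-mixed Hinglish text. Corpus and hyperparameters are illustrative only.
from gensim.models import FastText

# Each sentence is a list of (noisy, transliterated) Hinglish tokens.
corpus = [
    ["yeh", "movie", "toh", "bahut", "hi", "amazing", "thi", "yaar"],
    ["wah", "kya", "service", "hai", "do", "ghante", "se", "wait", "kar", "raha", "hoon"],
]

model = FastText(
    sentences=corpus,
    vector_size=100,   # embedding dimension
    window=5,
    min_count=1,
    min_n=3, max_n=6,  # character n-gram range: this is what captures sub-word information
    epochs=10,
)

# Sub-word n-grams let spelling variants share representation; fastText can even
# compose a vector for an out-of-vocabulary variant from its n-grams.
print(model.wv.similarity("bahut", "bohot"))
```

Presumably the classifier built on top of these vectors is much lighter than a multilingual transformer, which is consistent with the reported 10x faster training and smaller memory footprint.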
Related papers
- Leveraging Language Identification to Enhance Code-Mixed Text
Classification [0.7340017786387767]
Existing deep-learning models do not take advantage of the implicit language information in code-mixed text.
Our study aims to improve BERT-based models' performance on low-resource code-mixed Hindi-English datasets.
arXiv Detail & Related papers (2023-06-08T06:43:10Z)
- Precise Zero-Shot Dense Retrieval without Relevance Labels [60.457378374671656]
Hypothetical Document Embeddings (HyDE) is a zero-shot dense retrieval system.
We show that HyDE significantly outperforms the state-of-the-art unsupervised dense retriever Contriever.
arXiv Detail & Related papers (2022-12-20T18:09:52Z)
- How to Describe Images in a More Funny Way? Towards a Modular Approach to Cross-Modal Sarcasm Generation [62.89586083449108]
We study a new problem of cross-modal sarcasm generation (CMSG), i.e., generating a sarcastic description for a given image.
CMSG is challenging as models need to satisfy the characteristics of sarcasm, as well as the correlation between different modalities.
We propose an Extraction-Generation-Ranking based Modular method (EGRM) for cross-modal sarcasm generation.
arXiv Detail & Related papers (2022-11-20T14:38:24Z)
- Parallel Deep Learning-Driven Sarcasm Detection from Pop Culture Text and English Humor Literature [0.76146285961466]
We manually extract the sarcastic word distribution features of a benchmark pop culture sarcasm corpus.
We generate input sequences formed of the weighted vectors from such words.
Our proposed model for detecting sarcasm reaches a peak training accuracy of 98.95% when trained on the discussed dataset.
arXiv Detail & Related papers (2021-06-10T14:01:07Z)
- Interpretable Multi-Head Self-Attention model for Sarcasm Detection in social media [0.0]
Inherent ambiguity in sarcastic expressions makes sarcasm detection very difficult.
We develop an interpretable deep learning model using multi-head self-attention and gated recurrent units.
We show the effectiveness of our approach by achieving state-of-the-art results on multiple datasets.
arXiv Detail & Related papers (2021-01-14T21:39:35Z)
- Explicit Alignment Objectives for Multilingual Bidirectional Encoders [111.65322283420805]
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR).
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model.
arXiv Detail & Related papers (2020-10-15T18:34:13Z)
- TextHide: Tackling Data Privacy in Language Understanding Tasks [54.11691303032022]
TextHide mitigates privacy risks without slowing down training or reducing accuracy.
It requires all participants to add a simple encryption step to prevent an eavesdropping attacker from recovering private text data.
We evaluate TextHide on the GLUE benchmark, and our experiments show that TextHide can effectively defend attacks on shared gradients or representations.
arXiv Detail & Related papers (2020-10-12T22:22:15Z)
- "Did you really mean what you said?": Sarcasm Detection in Hindi-English Code-Mixed Data using Bilingual Word Embeddings [0.0]
We present a corpus of tweets for training custom word embeddings and a Hinglish dataset labelled for sarcasm detection.
We propose a deep learning based approach to address the issue of sarcasm detection in Hindi-English code mixed tweets.
arXiv Detail & Related papers (2020-10-01T11:41:44Z)
- Contrastive Code Representation Learning [95.86686147053958]
We show that the popular reconstruction-based BERT model is sensitive to source code edits, even when the edits preserve semantics.
We propose ContraCode: a contrastive pre-training task that learns code functionality, not form.
arXiv Detail & Related papers (2020-07-09T17:59:06Z)
- Sarcasm Detection using Context Separators in Online Discourse [3.655021726150369]
Sarcasm is an intricate form of speech, where meaning is conveyed implicitly.
In this work, we use RoBERTa_large to detect sarcasm in two datasets.
We also assert the importance of context in improving the performance of contextual word embedding models.
arXiv Detail & Related papers (2020-06-01T10:52:35Z)
- MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification [68.15015032551214]
MixText is a semi-supervised learning method for text classification.
TMix creates a large number of augmented training samples by interpolating text in hidden space; a minimal sketch of this interpolation appears after the list.
We leverage recent advances in data augmentation to guess low-entropy labels for unlabeled data.
arXiv Detail & Related papers (2020-04-25T21:37:36Z)
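As a side note on the MixText entry above: TMix's "interpolating text in hidden space" is essentially mixup applied to intermediate encoder representations. The sketch below illustrates only that step; the function name `tmix_interpolate`, the Beta parameter, the layer choice, and the tensor shapes are assumptions for illustration, not the original implementation.

```python
# Minimal sketch of TMix-style hidden-state interpolation (mixup in hidden space).
import torch

def tmix_interpolate(h_a, h_b, alpha=0.75):
    """Blend two batches of encoder hidden states with a Beta-sampled mixing ratio."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    lam = torch.max(lam, 1.0 - lam)   # bias the mix toward the first example
    return lam * h_a + (1.0 - lam) * h_b, lam

# Hidden states of two text examples at some intermediate encoder layer
# (batch, seq_len, hidden_dim); values here are random placeholders.
h_a = torch.randn(8, 32, 768)
h_b = torch.randn(8, 32, 768)
h_mix, lam = tmix_interpolate(h_a, h_b)

# The corresponding (soft) labels are mixed with the same ratio, so the
# interpolated representation is trained against an interpolated target.
y_a = torch.tensor([[1.0, 0.0]]).repeat(8, 1)
y_b = torch.tensor([[0.0, 1.0]]).repeat(8, 1)
y_mix = lam * y_a + (1.0 - lam) * y_b
```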
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.