"Did you really mean what you said?" : Sarcasm Detection in
Hindi-English Code-Mixed Data using Bilingual Word Embeddings
- URL: http://arxiv.org/abs/2010.00310v3
- Date: Thu, 15 Oct 2020 08:32:09 GMT
- Title: "Did you really mean what you said?" : Sarcasm Detection in
Hindi-English Code-Mixed Data using Bilingual Word Embeddings
- Authors: Akshita Aggarwal, Anshul Wadhawan, Anshima Chaudhary, Kavita Maurya
- Abstract summary: We present a corpus of tweets for training custom word embeddings and a Hinglish dataset labelled for sarcasm detection.
We propose a deep learning based approach to address the issue of sarcasm detection in Hindi-English code mixed tweets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increased use of social media platforms by people across the world,
many new interesting NLP problems have come into existence. One such problem is the
detection of sarcasm in social media texts. We present a corpus of tweets
for training custom word embeddings and a Hinglish dataset labelled for sarcasm
detection. We propose a deep learning based approach to address the issue of
sarcasm detection in Hindi-English code-mixed tweets using bilingual word
embeddings derived from the FastText and Word2Vec approaches. We experimented with
various deep learning models, including CNNs, LSTMs, and Bi-directional LSTMs (with
and without attention). Our models outperform all previously reported
state-of-the-art results, with the attention-based Bi-directional LSTM
performing best at an accuracy of 78.49%.
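To make the architecture concrete, here is a minimal sketch in Keras of an attention-based Bi-directional LSTM classifier over pretrained bilingual embeddings. This is not the authors' released code: the vocabulary size, sequence length, layer sizes, and the random embedding matrix are illustrative placeholders standing in for the FastText/Word2Vec vectors described above.

```python
# Minimal sketch (not the authors' code): attention-based BiLSTM sarcasm
# classifier over pretrained bilingual embeddings.
# VOCAB_SIZE, MAX_LEN, EMB_DIM and the random matrix are placeholders.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE, MAX_LEN, EMB_DIM = 30000, 50, 300
emb_matrix = np.random.rand(VOCAB_SIZE, EMB_DIM)  # stand-in for real vectors

inp = layers.Input(shape=(MAX_LEN,), dtype="int32")
x = layers.Embedding(VOCAB_SIZE, EMB_DIM,
                     weights=[emb_matrix], trainable=False)(inp)
h = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)

# Additive attention pooling: score each timestep, softmax over time,
# then take the weighted sum of BiLSTM states as the sentence vector.
scores = layers.Dense(1, activation="tanh")(h)        # (batch, MAX_LEN, 1)
alphas = layers.Softmax(axis=1)(scores)               # attention weights
context = layers.Lambda(
    lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([h, alphas])

out = layers.Dense(1, activation="sigmoid")(context)  # sarcastic vs. not
model = Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```

Dropping the attention block and pooling `h` with `layers.GlobalMaxPooling1D()` would give the plain BiLSTM variant the abstract also mentions.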
Related papers
- Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech [88.22544315633687]
Polyphone disambiguation aims to capture accurate pronunciation knowledge from natural text sequences for reliable Text-to-Speech systems.
We propose Dict-TTS, a semantic-aware generative text-to-speech model with an online website dictionary.
Experimental results in three languages show that our model outperforms several strong baseline models in terms of pronunciation accuracy.
arXiv Detail & Related papers (2022-06-05T10:50:34Z)
- Exposing Cross-Lingual Lexical Knowledge from Multilingual Sentence Encoders [85.80950708769923]
We probe multilingual language models for the amount of cross-lingual lexical knowledge stored in their parameters.
We also devise a novel method to expose this knowledge by additionally fine-tuning multilingual models, and compare the fine-tuned models against the original multilingual LMs.
We report substantial gains on standard benchmarks.
arXiv Detail & Related papers (2022-04-30T13:23:16Z)
- How Effective is Incongruity? Implications for Code-mix Sarcasm Detection [0.0]
Sarcasm poses several challenges for downstream NLP tasks.
We propose the idea of capturing incongruity through sub-word level embeddings learned via fastText (see the sketch after this list).
Our proposed model achieves an F1-score on the code-mixed Hinglish dataset comparable to that of pretrained multilingual models.
arXiv Detail & Related papers (2022-02-06T04:05:09Z)
- Multimodal Learning using Optimal Transport for Sarcasm and Humor Detection [76.62550719834722]
We deal with multimodal sarcasm and humor detection from conversational videos and image-text pairs.
We propose a novel multimodal learning system, MuLOT, which utilizes self-attention to exploit intra-modal correspondence and optimal transport to capture cross-modal correspondence.
We test our approach for multimodal sarcasm and humor detection on three benchmark datasets.
arXiv Detail & Related papers (2021-10-21T07:51:56Z)
- Intent Classification Using Pre-Trained Embeddings For Low Resource Languages [67.40810139354028]
Building Spoken Language Understanding systems that do not rely on language-specific Automatic Speech Recognition is an important yet less-explored problem in language processing.
We present a comparative study aimed at employing a pre-trained acoustic model to perform Spoken Language Understanding in low resource scenarios.
We perform experiments across three different languages: English, Sinhala, and Tamil each with different data sizes to simulate high, medium, and low resource scenarios.
arXiv Detail & Related papers (2021-10-18T13:06:59Z)
- Sarcasm Detection in Twitter -- Performance Impact when using Data Augmentation: Word Embeddings [0.0]
Sarcasm is the use of words to mock or annoy someone, or for humorous purposes.
We propose a contextual model for sarcasm identification in Twitter using RoBERTa and dataset augmentation.
We achieve a performance gain of 3.2% on the iSarcasm dataset when using data augmentation to increase the amount of data labelled as sarcastic by 20%.
arXiv Detail & Related papers (2021-08-23T04:24:12Z)
- Parallel Deep Learning-Driven Sarcasm Detection from Pop Culture Text and English Humor Literature [0.76146285961466]
We manually extract the sarcastic word distribution features of a benchmark pop culture sarcasm corpus.
We generate input sequences formed of the weighted vectors from such words.
Our proposed model for detecting sarcasm peaks at a training accuracy of 98.95% when trained on the discussed dataset.
arXiv Detail & Related papers (2021-06-10T14:01:07Z)
- Towards Emotion Recognition in Hindi-English Code-Mixed Data: A Transformer Based Approach [0.0]
We present a Hinglish dataset labelled for emotion detection.
We highlight a deep learning based approach for detecting emotions in Hindi-English code-mixed tweets.
arXiv Detail & Related papers (2021-02-19T14:07:20Z)
- Interpretable Multi-Head Self-Attention model for Sarcasm Detection in social media [0.0]
The inherent ambiguity of sarcastic expressions makes sarcasm detection very difficult.
We develop an interpretable deep learning model using multi-head self-attention and gated recurrent units.
We show the effectiveness of our approach by achieving state-of-the-art results on multiple datasets.
arXiv Detail & Related papers (2021-01-14T21:39:35Z)
- Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
arXiv Detail & Related papers (2020-10-27T22:24:01Z)
- Explicit Alignment Objectives for Multilingual Bidirectional Encoders [111.65322283420805]
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR).
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model.
arXiv Detail & Related papers (2020-10-15T18:34:13Z)
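As referenced in the sub-word incongruity entry above, the following is a minimal sketch of training sub-word level fastText embeddings on code-mixed text with gensim. The toy corpus, hyperparameters, and probe word are illustrative assumptions, not taken from any of the papers listed.

```python
# Minimal sketch (illustrative, not from the papers above): training
# sub-word level fastText embeddings on a toy code-mixed corpus with
# gensim, so that rare or misspelled Hinglish tokens still get vectors
# composed from character n-grams.
from gensim.models import FastText

# Toy Hinglish tweets, whitespace-tokenised; a real setup would use a
# large tweet corpus and a proper tokeniser.
corpus = [
    ["waah", "kya", "baat", "hai", "monday", "ko", "bhi", "office"],
    ["great", "yaar", "traffic", "mein", "phas", "gaya", "phir", "se"],
    ["bahut", "hi", "amazing", "din", "tha", "aaj"],
]

model = FastText(
    sentences=corpus,
    vector_size=100,   # embedding dimensionality
    window=5,
    min_count=1,
    min_n=2, max_n=5,  # character n-gram range -> sub-word information
    epochs=10,
)

# Sub-word composition yields a vector even for unseen spellings,
# e.g. the common Hinglish variant "bohot" of "bahut".
vec = model.wv["bohot"]
print(vec.shape)  # (100,)
```

Vectors trained this way could also serve as the embedding matrix in the BiLSTM sketch shown after the main abstract.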
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.