Detection of Emotions in Hindi-English Code Mixed Text Data
- URL: http://arxiv.org/abs/2105.09226v2
- Date: Fri, 21 May 2021 08:17:59 GMT
- Title: Detection of Emotions in Hindi-English Code Mixed Text Data
- Authors: Divyansh Singh
- Abstract summary: In recent times, we have seen an increased use of text chat for communication on social networks and smartphones.
This often involves Hindi-English code-mixed text, which contains words that are not part of the English vocabulary.
We work on detecting emotions in this code-mixed data, classifying sentences into the human emotions angry, fear, happy, or sad.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent times, we have seen an increased use of text chat for communication
on social networks and smartphones. This often involves
Hindi-English code-mixed text, which contains words that are not part of the
English vocabulary. We work on detecting emotions in this code-mixed data,
classifying sentences into the human emotions angry, fear, happy, or
sad. We use state-of-the-art natural language processing models and
compare their performance on a dataset of sentences in this code-mixed
text. The dataset was collected from sources, annotated, and then used to
train the models.
Related papers
- BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages [93.92804151830744]
We present BRIGHTER, a collection of emotion-annotated datasets in 28 different languages.
We describe the data collection and annotation processes and the challenges of building these datasets.
We show that BRIGHTER datasets are a step towards bridging the gap in text-based emotion recognition.
(2025-02-17T15:39:50Z)
- On Importance of Code-Mixed Embeddings for Hate Speech Identification [0.4194295877935868]
We analyze the significance of code-mixed embeddings and evaluate the performance of BERT and HingBERT models in hate speech detection.
Our study demonstrates that HingBERT models, benefiting from training on the extensive Hindi-English dataset L3-HingCorpus, outperform BERT models when tested on hate speech text datasets.
(2024-11-27T18:23:57Z)
- Multilingual Diversity Improves Vision-Language Representations [66.41030381363244]
Pre-training on this dataset outperforms using English-only or English-dominated datasets on ImageNet.
On a geographically diverse task like GeoDE, we also observe improvements across all regions, with the biggest gain coming from Africa.
(2024-05-27T08:08:51Z)
- Sociolinguistically Informed Interpretability: A Case Study on Hinglish Emotion Classification [8.010713141364752]
We study the effect of language on emotion prediction across 3 PLMs on a Hinglish emotion classification dataset.
We find that models do learn these associations between language choice and emotional expression.
Having code-mixed data present in the pre-training can augment that learning when task-specific data is scarce.
(2024-02-05T16:05:32Z)
- Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages [47.78634360870564]
We explore prompting multilingual models to generate code-mixed data for seven languages in South East Asia (SEA).
We find that publicly available multilingual instruction-tuned models such as BLOOMZ are incapable of producing texts with phrases or clauses from different languages.
ChatGPT exhibits inconsistent capabilities in generating code-mixed texts, wherein its performance varies depending on the prompt template and language pairing.
(2023-03-23T18:16:30Z)
- ReDDIT: Regret Detection and Domain Identification from Text [62.997667081978825]
We present a novel dataset of Reddit texts that have been classified into three classes: Regret by Action, Regret by Inaction, and No Regret.
Our findings show that Reddit users are most likely to express regret for past actions, particularly in the domain of relationships.
(2022-12-14T23:41:57Z)
- EmoInHindi: A Multi-label Emotion and Intensity Annotated Dataset in Hindi for Emotion Recognition in Dialogues [44.79509115642278]
We create a large conversational dataset in Hindi named EmoInHindi for multi-label emotion and intensity recognition in conversations.
We prepare our dataset in a Wizard-of-Oz manner for mental health and legal counselling of crime victims.
(2022-05-27T11:23:50Z)
- Language Identification of Hindi-English tweets using code-mixed BERT [0.0]
The work utilizes a data collection of Hindi-English-Urdu codemixed text for language pre-training and Hindi-English codemixed text for subsequent word-level language classification.
The results show that the representations pre-trained over codemixed data produce better results than their monolingual counterparts.
(2021-07-02T17:51:36Z)
- Role of Artificial Intelligence in Detection of Hateful Speech for Hinglish Data on Social Media [1.8899300124593648]
The prevalence of Hindi-English code-mixed data (Hinglish) is on the rise among most of the urban population all over the world.
Hate speech detection algorithms deployed by most social networking platforms are unable to filter out offensive and abusive content posted in these code-mixed languages.
We propose a methodology for efficient detection of unstructured code-mixed Hinglish language.
(2021-05-11T10:02:28Z)
- BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation [42.34923623457615]
The Bias in Open-Ended Language Generation (BOLD) dataset consists of 23,679 English text generation prompts.
An examination of text generated from three popular language models reveals that the majority of these models exhibit a larger social bias than human-written Wikipedia text.
(2021-01-27T22:07:03Z)
- NUIG-Shubhanker@Dravidian-CodeMix-FIRE2020: Sentiment Analysis of Code-Mixed Dravidian text using XLNet [0.0]
Social media has penetrated multilingual societies; however, most of them use English as their preferred language for communication.
It is natural for them to mix their cultural language with English during conversations, resulting in an abundance of multilingual data, called code-mixed data, available in today's world.
Downstream NLP tasks using such data are challenging because its semantics are spread across multiple languages.
This paper uses an auto-regressive XLNet model to perform sentiment analysis on code-mixed Tamil-English and Malayalam-English datasets.
(2020-10-15T14:09:02Z)