Detection of Emotions in Hindi-English Code Mixed Text Data
- URL: http://arxiv.org/abs/2105.09226v2
- Date: Fri, 21 May 2021 08:17:59 GMT
- Title: Detection of Emotions in Hindi-English Code Mixed Text Data
- Authors: Divyansh Singh
- Abstract summary: In recent times, we have seen an increased use of text chat for communication on social networks and smartphones.
This particularly involves the use of Hindi-English code-mixed text which contains words which are not recognized in English vocabulary.
We have worked on detecting emotions in these mixed data and classify the sentences in human emotions which are angry, fear, happy or sad.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent times, we have seen an increased use of text chat for communication
on social networks and smartphones. This particularly involves the use of
Hindi-English code-mixed text which contains words which are not recognized in
English vocabulary. We have worked on detecting emotions in these mixed data
and classify the sentences in human emotions which are angry, fear, happy or
sad. We have used state of the art natural language processing models and
compared their performance on the dataset comprising sentences in this mixed
data. The dataset was collected and annotated from sources and then used to
train the models.
Related papers
- Textualized and Feature-based Models for Compound Multimodal Emotion Recognition in the Wild [45.29814349246784]
multimodal large language models (LLMs) rely on explicit non-verbal cues that may be translated from different non-textual modalities into text.
This paper compares the potential of text- and feature-based approaches for compound multimodal ER in videos.
arXiv Detail & Related papers (2024-07-17T18:01:25Z) - Multilingual Diversity Improves Vision-Language Representations [66.41030381363244]
Pre-training on this dataset outperforms using English-only or English-dominated datasets on ImageNet.
On a geographically diverse task like GeoDE, we also observe improvements across all regions, with the biggest gain coming from Africa.
arXiv Detail & Related papers (2024-05-27T08:08:51Z) - Sociolinguistically Informed Interpretability: A Case Study on Hinglish
Emotion Classification [8.010713141364752]
We study the effect of language on emotion prediction across 3 PLMs on a Hinglish emotion classification dataset.
We find that models do learn these associations between language choice and emotional expression.
Having code-mixed data present in the pre-training can augment that learning when task-specific data is scarce.
arXiv Detail & Related papers (2024-02-05T16:05:32Z) - Learning from Emotions, Demographic Information and Implicit User
Feedback in Task-Oriented Document-Grounded Dialogues [59.516187851808375]
We introduce FEDI, the first English dialogue dataset for task-oriented document-grounded dialogues annotated with demographic information, user emotions and implicit feedback.
Our experiments with FLAN-T5, GPT-2 and LLaMA-2 show that these data have the potential to improve task completion and the factual consistency of the generated responses and user acceptance.
arXiv Detail & Related papers (2024-01-17T14:52:26Z) - Learning From Free-Text Human Feedback -- Collect New Datasets Or Extend
Existing Ones? [57.16050211534735]
We investigate the types and frequency of free-text human feedback in commonly used dialog datasets.
Our findings provide new insights into the composition of the datasets examined, including error types, user response types, and the relations between them.
arXiv Detail & Related papers (2023-10-24T12:01:11Z) - Prompting Multilingual Large Language Models to Generate Code-Mixed
Texts: The Case of South East Asian Languages [47.78634360870564]
We explore prompting multilingual models to generate code-mixed data for seven languages in South East Asia (SEA)
We find that publicly available multilingual instruction-tuned models such as BLOOMZ are incapable of producing texts with phrases or clauses from different languages.
ChatGPT exhibits inconsistent capabilities in generating code-mixed texts, wherein its performance varies depending on the prompt template and language pairing.
arXiv Detail & Related papers (2023-03-23T18:16:30Z) - ReDDIT: Regret Detection and Domain Identification from Text [62.997667081978825]
We present a novel dataset of Reddit texts that have been classified into three classes: Regret by Action, Regret by Inaction, and No Regret.
Our findings show that Reddit users are most likely to express regret for past actions, particularly in the domain of relationships.
arXiv Detail & Related papers (2022-12-14T23:41:57Z) - Language Identification of Hindi-English tweets using code-mixed BERT [0.0]
The work utilizes a data collection of Hindi-English-Urdu codemixed text for language pre-training and Hindi-English codemixed for subsequent word-level language classification.
The results show that the representations pre-trained over codemixed data produce better results by their monolingual counterpart.
arXiv Detail & Related papers (2021-07-02T17:51:36Z) - Role of Artificial Intelligence in Detection of Hateful Speech for
Hinglish Data on Social Media [1.8899300124593648]
Prevalence of Hindi-English code-mixed data (Hinglish) is on the rise with most of the urban population all over the world.
Hate speech detection algorithms deployed by most social networking platforms are unable to filter out offensive and abusive content posted in these code-mixed languages.
We propose a methodology for efficient detection of unstructured code-mix Hinglish language.
arXiv Detail & Related papers (2021-05-11T10:02:28Z) - Towards Emotion Recognition in Hindi-English Code-Mixed Data: A
Transformer Based Approach [0.0]
We present a Hinglish dataset labelled for emotion detection.
We highlight a deep learning based approach for detecting emotions in Hindi-English code mixed tweets.
arXiv Detail & Related papers (2021-02-19T14:07:20Z) - NUIG-Shubhanker@Dravidian-CodeMix-FIRE2020: Sentiment Analysis of
Code-Mixed Dravidian text using XLNet [0.0]
Social media has penetrated into multilingual societies, however most of them use English to be a preferred language for communication.
It looks natural for them to mix their cultural language with English during conversations resulting in abundance of multilingual data, call this code-mixed data, available in todays' world.
Downstream NLP tasks using such data is challenging due to the semantic nature of it being spread across multiple languages.
This paper uses an auto-regressive XLNet model to perform sentiment analysis on code-mixed Tamil-English and Malayalam-English datasets.
arXiv Detail & Related papers (2020-10-15T14:09:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.