Related papers: Detection of Emotions in Hindi-English Code Mixed Text Data

Related papers

EmoBench-UA: A Benchmark Dataset for Emotion Detection in Ukrainian [60.61343989805093]
EmoBench-UA is the first annotated dataset for emotion detection in Ukrainian texts.<n>Our findings highlight the challenges of emotion classification in non-mainstream languages like Ukrainian.
arXiv Detail & Related papers (2025-05-29T09:49:57Z)
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting [48.56693150755667]
EmoVoice is a novel emotion-controllable TTS model that exploits large language models (LLMs) to enable fine-grained freestyle natural language emotion control. EmoVoice-DB is a high-quality 40-hour English emotion dataset featuring expressive speech and fine-grained emotion labels with natural language descriptions.
arXiv Detail & Related papers (2025-04-17T11:50:04Z)
SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection [76.18321723846616]
Task covers more than 30 languages from seven distinct language families. Data instances are multi-labeled with six emotional classes, with additional datasets in 11 languages annotated for emotion intensity. Participants were asked to predict labels in three tracks: (a) multilabel emotion detection, (b) emotion intensity score detection, and (c) cross-lingual emotion detection.
arXiv Detail & Related papers (2025-03-10T12:49:31Z)
BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages [93.92804151830744]
We present BRIGHTER -- a collection of multi-labeled datasets in 28 different languages. We describe the data collection and annotation processes and the challenges of building these datasets. We show that BRIGHTER datasets are a step towards bridging the gap in text-based emotion recognition.
arXiv Detail & Related papers (2025-02-17T15:39:50Z)
On Importance of Code-Mixed Embeddings for Hate Speech Identification [0.4194295877935868]
We analyze the significance of code-mixed embeddings and evaluate the performance of BERT and HingBERT models in hate speech detection. Our study demonstrates that HingBERT models, benefiting from training on the extensive Hindi-English dataset L3-HingCorpus, outperform BERT models when tested on hate speech text datasets.
arXiv Detail & Related papers (2024-11-27T18:23:57Z)
Textualized and Feature-based Models for Compound Multimodal Emotion Recognition in the Wild [45.29814349246784]
multimodal large language models (LLMs) rely on explicit non-verbal cues that may be translated from different non-textual modalities into text. This paper compares the potential of text- and feature-based approaches for compound multimodal ER in videos.
arXiv Detail & Related papers (2024-07-17T18:01:25Z)
Multilingual Diversity Improves Vision-Language Representations [66.41030381363244]
Pre-training on this dataset outperforms using English-only or English-dominated datasets on ImageNet. On a geographically diverse task like GeoDE, we also observe improvements across all regions, with the biggest gain coming from Africa.
arXiv Detail & Related papers (2024-05-27T08:08:51Z)
Sociolinguistically Informed Interpretability: A Case Study on Hinglish Emotion Classification [8.010713141364752]
We study the effect of language on emotion prediction across 3 PLMs on a Hinglish emotion classification dataset. We find that models do learn these associations between language choice and emotional expression. Having code-mixed data present in the pre-training can augment that learning when task-specific data is scarce.
arXiv Detail & Related papers (2024-02-05T16:05:32Z)
Learning From Free-Text Human Feedback -- Collect New Datasets Or Extend Existing Ones? [57.16050211534735]
We investigate the types and frequency of free-text human feedback in commonly used dialog datasets. Our findings provide new insights into the composition of the datasets examined, including error types, user response types, and the relations between them.
arXiv Detail & Related papers (2023-10-24T12:01:11Z)
Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages [47.78634360870564]
We explore prompting multilingual models to generate code-mixed data for seven languages in South East Asia (SEA) We find that publicly available multilingual instruction-tuned models such as BLOOMZ are incapable of producing texts with phrases or clauses from different languages. ChatGPT exhibits inconsistent capabilities in generating code-mixed texts, wherein its performance varies depending on the prompt template and language pairing.
arXiv Detail & Related papers (2023-03-23T18:16:30Z)
ReDDIT: Regret Detection and Domain Identification from Text [62.997667081978825]
We present a novel dataset of Reddit texts that have been classified into three classes: Regret by Action, Regret by Inaction, and No Regret. Our findings show that Reddit users are most likely to express regret for past actions, particularly in the domain of relationships.
arXiv Detail & Related papers (2022-12-14T23:41:57Z)
EmoInHindi: A Multi-label Emotion and Intensity Annotated Dataset in Hindi for Emotion Recognition in Dialogues [44.79509115642278]
We create a large conversational dataset in Hindi named EmoInHindi for multi-label emotion and intensity recognition in conversations. We prepare our dataset in a Wizard-of-Oz manner for mental health and legal counselling of crime victims.
arXiv Detail & Related papers (2022-05-27T11:23:50Z)
Language Identification of Hindi-English tweets using code-mixed BERT [0.0]
The work utilizes a data collection of Hindi-English-Urdu codemixed text for language pre-training and Hindi-English codemixed for subsequent word-level language classification. The results show that the representations pre-trained over codemixed data produce better results by their monolingual counterpart.
arXiv Detail & Related papers (2021-07-02T17:51:36Z)
Role of Artificial Intelligence in Detection of Hateful Speech for Hinglish Data on Social Media [1.8899300124593648]
Prevalence of Hindi-English code-mixed data (Hinglish) is on the rise with most of the urban population all over the world. Hate speech detection algorithms deployed by most social networking platforms are unable to filter out offensive and abusive content posted in these code-mixed languages. We propose a methodology for efficient detection of unstructured code-mix Hinglish language.
arXiv Detail & Related papers (2021-05-11T10:02:28Z)
Towards Emotion Recognition in Hindi-English Code-Mixed Data: A Transformer Based Approach [0.0]
We present a Hinglish dataset labelled for emotion detection. We highlight a deep learning based approach for detecting emotions in Hindi-English code mixed tweets.
arXiv Detail & Related papers (2021-02-19T14:07:20Z)
NUIG-Shubhanker@Dravidian-CodeMix-FIRE2020: Sentiment Analysis of Code-Mixed Dravidian text using XLNet [0.0]
Social media has penetrated into multilingual societies, however most of them use English to be a preferred language for communication. It looks natural for them to mix their cultural language with English during conversations resulting in abundance of multilingual data, call this code-mixed data, available in todays' world. Downstream NLP tasks using such data is challenging due to the semantic nature of it being spread across multiple languages. This paper uses an auto-regressive XLNet model to perform sentiment analysis on code-mixed Tamil-English and Malayalam-English datasets.
arXiv Detail & Related papers (2020-10-15T14:09:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.