EmoHopeSpeech: An Annotated Dataset of Emotions and Hope Speech in English and Arabic
- URL: http://arxiv.org/abs/2505.11959v2
- Date: Wed, 21 May 2025 06:06:55 GMT
- Title: EmoHopeSpeech: An Annotated Dataset of Emotions and Hope Speech in English and Arabic
- Authors: Wajdi Zaghouani, Md. Rafiul Biswas,
- Abstract summary: This research introduces a bilingual dataset comprising 23,456 entries for Arabic and 10,036 entries for English, annotated for emotions and hope speech. The dataset provides comprehensive annotations capturing emotion intensity, complexity, and causes, alongside detailed classifications and subcategories for hope speech.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This research introduces a bilingual dataset comprising 23,456 entries for Arabic and 10,036 entries for English, annotated for emotions and hope speech, addressing the scarcity of multi-emotion (emotion and hope) datasets. The dataset provides comprehensive annotations capturing emotion intensity, complexity, and causes, alongside detailed classifications and subcategories for hope speech. To ensure annotation reliability, Fleiss' Kappa was employed, revealing inter-annotator agreement of 0.75-0.85 for both Arabic and English. The evaluation metrics (micro-F1-score = 0.67) obtained from a machine learning baseline model validate the quality of the data annotations. This dataset offers a valuable resource for advancing natural language processing in underrepresented languages, fostering better cross-linguistic analysis of emotions and hope speech.
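As a rough illustration of the two reported checks, here is a minimal Python sketch (not the authors' code; the annotation matrix and labels below are toy values, not from the dataset) computing Fleiss' Kappa with statsmodels and micro-F1 with scikit-learn:

```python
import numpy as np
from sklearn.metrics import f1_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Toy annotations: rows = items, columns = annotators, values = emotion label ids
annotations = np.array([
    [0, 0, 0],   # three annotators agree on label 0
    [1, 1, 2],   # partial agreement
    [2, 2, 2],
    [0, 1, 0],
])
table, _ = aggregate_raters(annotations)   # items x categories count table
print("Fleiss' Kappa:", fleiss_kappa(table, method="fleiss"))

# Baseline evaluation: micro-F1 over predicted vs. gold labels
y_true = [0, 1, 2, 0]
y_pred = [0, 1, 1, 0]
print("micro-F1:", f1_score(y_true, y_pred, average="micro"))
```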
Related papers
- DEBATE: A Dataset for Disentangling Textual Ambiguity in Mandarin Through Speech
We present DEBATE, a unique public Chinese speech-text dataset. It contains 1,001 carefully selected ambiguous utterances recorded by 10 native speakers. We benchmark three state-of-the-art large speech and audio-language models, revealing substantial performance gaps between machine and human understanding of spoken intent.
arXiv Detail & Related papers (2025-06-09T07:27:22Z)
- EmoBench-UA: A Benchmark Dataset for Emotion Detection in Ukrainian
EmoBench-UA is the first annotated dataset for emotion detection in Ukrainian texts. Our findings highlight the challenges of emotion classification in non-mainstream languages like Ukrainian.
arXiv Detail & Related papers (2025-05-29T09:49:57Z)
- MELD-ST: An Emotion-aware Speech Translation Dataset
We present the MELD-ST dataset for the emotion-aware speech translation task, comprising English-to-Japanese and English-to-German language pairs.
Each language pair includes about 10,000 utterances annotated with emotion labels from the MELD dataset.
Baseline experiments using the SeamlessM4T model on the dataset indicate that fine-tuning with emotion labels can enhance translation performance in some settings.
arXiv Detail & Related papers (2024-05-21T22:40:38Z)
- English Prompts are Better for NLI-based Zero-Shot Emotion Classification than Target-Language Prompts
Our experiments with natural language inference-based language models show that it is consistently better to use English prompts even if the data is in a different language.
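For context, here is a minimal sketch of the NLI-based zero-shot setup the paper studies, applying an English hypothesis template to non-English input. The multilingual NLI model and the template are illustrative assumptions, not the paper's exact configuration:

```python
from transformers import pipeline

# A multilingual NLI model fine-tuned on XNLI (illustrative choice)
clf = pipeline("zero-shot-classification",
               model="joeddav/xlm-roberta-large-xnli")

text = "Ich bin heute unglaublich glücklich!"  # German input
emotions = ["joy", "sadness", "anger", "fear"]

# English hypothesis template, even though the input text is German
result = clf(text, candidate_labels=emotions,
             hypothesis_template="This text expresses {}.")
print(result["labels"][0], result["scores"][0])  # top predicted emotion
```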
arXiv Detail & Related papers (2024-02-05T17:36:19Z)
- Towards a Deep Understanding of Multilingual End-to-End Speech Translation
We analyze representations learnt in a multilingual end-to-end speech translation model trained over 22 languages.
We derive three major findings from our analysis.
arXiv Detail & Related papers (2023-10-31T13:50:55Z)
- CLARA: Multilingual Contrastive Learning for Audio Representation Acquisition
CLARA minimizes reliance on labelled data, enhancing generalization across languages.
Our approach adeptly captures emotional nuances in speech, overcoming subjective assessment issues.
It adapts to low-resource languages, marking progress in multilingual speech representation learning.
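For intuition, here is a minimal sketch of a symmetric contrastive (InfoNCE) objective of the kind commonly used to align paired audio and text embeddings. This is a generic illustration, not CLARA's actual implementation:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired audio/text embeddings."""
    audio_emb = F.normalize(audio_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = audio_emb @ text_emb.t() / temperature  # (B, B) similarities
    targets = torch.arange(logits.size(0))           # matching pairs on diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy batch: 4 pairs of 256-dim embeddings
loss = contrastive_loss(torch.randn(4, 256), torch.randn(4, 256))
print(loss.item())
```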
arXiv Detail & Related papers (2023-10-18T09:31:56Z)
- Learning Cross-lingual Visual Speech Representations
Cross-lingual self-supervised visual representation learning has been a growing research topic in the last few years.
We use the recently proposed Raw Audio-Visual Speech Encoders (RAVEn) framework to pre-train an audio-visual model with unlabelled data.
Our experiments show that multilingual models trained on more data outperform monolingual ones, but, when the amount of data is kept fixed, monolingual models tend to reach better performance.
arXiv Detail & Related papers (2023-03-14T17:05:08Z)
- Speech-to-Speech Translation For A Real-world Unwritten Language
We study speech-to-speech translation (S2ST), which translates speech from one language into another.
We present an end-to-end solution from training data collection, modeling choices to benchmark dataset release.
arXiv Detail & Related papers (2022-11-11T20:21:38Z)
- Taking an Emotional Look at Video Paragraph Captioning
This work addresses video paragraph captioning, with the goal of generating paragraph-level descriptions for a given video.
To solve this problem, we propose to construct a large-scale emotion and logic driven multilingual dataset for this task.
This dataset is named EMVPC and contains 53 widely-used emotions in daily life, 376 common scenes corresponding to these emotions, 10,291 high-quality videos and 20,582 elaborated paragraph captions with English and Chinese versions.
arXiv Detail & Related papers (2022-03-12T06:19:48Z)
- Leveraging Pre-trained Language Model for Speech Sentiment Analysis
We explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis.
We propose a pseudo-label-based semi-supervised training strategy that uses a language model within an end-to-end speech sentiment approach.
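A minimal sketch of the pseudo-labeling idea, assuming an off-the-shelf text sentiment model stands in for the pre-trained language model and ASR transcripts serve as the unlabeled text; this is not the paper's exact recipe:

```python
from transformers import pipeline

text_clf = pipeline("sentiment-analysis")  # stand-in pre-trained text model

# Unlabeled ASR transcripts (toy examples)
unlabeled_transcripts = [
    "I really enjoyed the support I got today.",
    "This is the worst service imaginable.",
]

CONFIDENCE_THRESHOLD = 0.9  # keep only confident predictions as pseudo labels
pseudo_labeled = []
for text in unlabeled_transcripts:
    pred = text_clf(text)[0]  # e.g. {'label': 'POSITIVE', 'score': 0.99}
    if pred["score"] >= CONFIDENCE_THRESHOLD:
        pseudo_labeled.append((text, pred["label"]))

# pseudo_labeled would then augment the labeled set for training the
# end-to-end speech sentiment model on (audio, pseudo label) pairs.
print(pseudo_labeled)
```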
arXiv Detail & Related papers (2021-06-11T20:15:21Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Explicit Alignment Objectives for Multilingual Bidirectional Encoders
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR).
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model.
arXiv Detail & Related papers (2020-10-15T18:34:13Z)