BANSpEmo: A Bangla Emotional Speech Recognition Dataset
- URL: http://arxiv.org/abs/2312.14020v1
- Date: Thu, 21 Dec 2023 16:52:41 GMT
- Title: BANSpEmo: A Bangla Emotional Speech Recognition Dataset
- Authors: Md Gulzar Hussain, Mahmuda Rahman, Babe Sultana, Ye Shiren
- Abstract summary: This corpus contains 792 audio recordings over a duration of more than 1 hour and 23 minutes.
The data set consists of 12 Bangla sentences uttered in six emotions: Disgust, Happy, Sad, Surprised, Anger, and Fear.
BanSpEmo can be considered a useful resource for promoting emotion and speech recognition research and related applications in the Bangla language.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the field of audio and speech analysis, the ability to identify emotions
from acoustic signals is essential. Human-computer interaction (HCI) and
behavioural analysis are only a few of the many areas where the capacity to
distinguish emotions from speech signals has an extensive range of
applications. Here, we introduce BanSpEmo, a corpus of emotional speech that
consists only of audio recordings and has been created specifically for the
Bangla language. The corpus contains 792 audio recordings with a total duration
of more than 1 hour and 23 minutes. Twenty-two native speakers took part in
recording two sets of sentences that represent the six target emotions. The
data set consists of 12 Bangla sentences uttered in six emotions: Disgust,
Happy, Sad, Surprised, Anger, and Fear. The corpus is, however, not gender
balanced. Ten individuals who have experience either in a related field or in
acting took part in the assessment of this corpus. It has a balanced number of
audio recordings in each emotion class. BanSpEmo can be considered a useful
resource for promoting emotion and speech recognition research and related
applications in the Bangla language. The dataset can be found here:
https://data.mendeley.com/datasets/rdwn4bs5ky and may be used for academic
research.
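Since the corpus is balanced across its six emotion classes, the 792 clips work out to 132 recordings per emotion (792 / 6). As a quick sanity check on a downloaded copy, one can count files per class; the sketch below is a minimal Python illustration, and the root path and the <speaker>_<sentence>_<emotion>.wav filename scheme are assumptions, not the dataset's documented layout.

```python
# Minimal sketch: inventory BanSpEmo-style audio clips per emotion class.
# Assumes (hypothetically) that the emotion is the last underscore-separated
# token of each file name, e.g. "s01_03_anger.wav"; check the Mendeley
# record for the actual naming scheme.
from collections import Counter
from pathlib import Path

EMOTIONS = {"disgust", "happy", "sad", "surprised", "anger", "fear"}

def count_clips_per_emotion(root: str) -> Counter:
    counts = Counter()
    for wav in Path(root).rglob("*.wav"):
        label = wav.stem.split("_")[-1].lower()  # assumed filename convention
        if label in EMOTIONS:
            counts[label] += 1
    return counts

print(count_clips_per_emotion("BanSpEmo/"))
# A balanced corpus of 792 clips would show 132 per emotion.
```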
Related papers
- nEMO: Dataset of Emotional Speech in Polish [0.0]
nEMO is a novel corpus of emotional speech in Polish.
The dataset comprises over 3 hours of samples recorded with the participation of nine actors portraying six emotional states.
The text material used was carefully selected to represent the phonetics of the Polish language adequately.
arXiv Detail & Related papers (2024-04-09T13:18:52Z)
- Speech and Text-Based Emotion Recognizer [0.9168634432094885]
We build a balanced corpus from publicly available datasets for speech emotion recognition.
Our best system, a multi-modal speech- and text-based model, achieves a combined UA (Unweighted Accuracy) + WA (Weighted Accuracy) score of 157.57, compared to 119.66 for the baseline algorithm.
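For context, WA as commonly used in speech emotion recognition is plain overall accuracy, while UA is the unweighted mean of per-class recalls, so the combined UA + WA score has a ceiling of 200. A minimal sketch with hypothetical labels, following the common convention rather than this paper's exact implementation:

```python
# Unweighted vs. weighted accuracy on toy labels (in percent).
import numpy as np

def weighted_accuracy(y_true, y_pred):
    # Overall accuracy: every sample counts equally.
    return 100.0 * float((y_true == y_pred).mean())

def unweighted_accuracy(y_true, y_pred):
    # Mean of per-class recalls: every class counts equally.
    recalls = [(y_pred[y_true == c] == c).mean() for c in np.unique(y_true)]
    return 100.0 * float(np.mean(recalls))

y_true = np.array([0, 0, 0, 1, 1, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2])
print(unweighted_accuracy(y_true, y_pred) + weighted_accuracy(y_true, y_pred))
```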
arXiv Detail & Related papers (2023-12-10T05:17:39Z)
- EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels [6.2375553155844266]
The Emotive Narrative Storytelling (EMNS) corpus is a unique speech dataset created to enhance conversations' emotive quality.
It consists of a 2.3-hour recording featuring a female speaker delivering labelled utterances.
It encompasses eight acted emotional states, evenly distributed with a variance of 0.68%, along with expressiveness levels and natural language descriptions with word emphasis labels.
arXiv Detail & Related papers (2023-05-22T15:32:32Z)
- Describing emotions with acoustic property prompts for speech emotion recognition [30.990720176317463]
We devise a method to automatically create a description for a given audio by computing acoustic properties, such as pitch, loudness, speech rate, and articulation rate.
We train a neural network model using these audio-text pairs and evaluate it on an additional dataset.
We investigate how the model learns to associate audio with the descriptions, resulting in performance improvements on Speech Emotion Recognition and Speech Audio Retrieval.
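As a rough illustration of the idea, basic acoustic properties can be computed with an audio library and templated into a description. The sketch below uses librosa (assumed installed); the thresholds and wording are invented for illustration and are not the paper's actual recipe, which also covers speech and articulation rate:

```python
# Turn simple acoustic measurements into a hypothetical text description.
import librosa
import numpy as np

def describe(path: str) -> str:
    y, sr = librosa.load(path, sr=16000)
    f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr)  # frame-wise pitch (Hz)
    rms = librosa.feature.rms(y=y)[0]              # frame-wise energy
    pitch = "high-pitched" if np.median(f0) > 180 else "low-pitched"
    loudness = "loud" if rms.mean() > 0.05 else "soft"
    return f"A {pitch}, {loudness} voice."  # paired with the audio for training

print(describe("clip.wav"))
```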
arXiv Detail & Related papers (2022-11-14T20:29:37Z)
- Speech Synthesis with Mixed Emotions [77.05097999561298]
We propose a novel formulation that measures the relative difference between the speech samples of different emotions.
We then incorporate our formulation into a sequence-to-sequence emotional text-to-speech framework.
At run-time, we control the model to produce the desired emotion mixture by manually defining an emotion attribute vector.
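A minimal sketch of what run-time control with an emotion attribute vector can look like, using random placeholder embeddings rather than the paper's learned representations:

```python
# Blend per-emotion embeddings with a normalised attribute vector.
import numpy as np

rng = np.random.default_rng(0)
emotions = ["happy", "sad", "angry"]
embeddings = {e: rng.normal(size=64) for e in emotions}  # placeholders

attr = np.array([0.7, 0.3, 0.0])  # e.g. mostly happy with a little sadness
attr = attr / attr.sum()          # weights sum to 1
mixed = sum(w * embeddings[e] for w, e in zip(attr, emotions))
print(mixed.shape)  # (64,) conditioning vector for the TTS decoder
```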
arXiv Detail & Related papers (2022-08-11T15:45:58Z)
- Toward a realistic model of speech processing in the brain with self-supervised learning [67.7130239674153]
Self-supervised algorithms trained on the raw waveform constitute a promising candidate.
We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
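For reference, frame-level Wav2Vec 2.0 representations of the kind analysed here can be extracted with the Hugging Face transformers library; the checkpoint below is a public example, not necessarily the model variant used in the paper:

```python
# Extract Wav2Vec 2.0 hidden states for one second of (placeholder) audio.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

name = "facebook/wav2vec2-base"
extractor = Wav2Vec2FeatureExtractor.from_pretrained(name)
model = Wav2Vec2Model.from_pretrained(name)

waveform = torch.zeros(16000)  # stand-in for real 16 kHz audio
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (1, frames, 768)
print(hidden.shape)
```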
arXiv Detail & Related papers (2022-06-03T17:01:46Z)
- Is Speech Emotion Recognition Language-Independent? Analysis of English and Bangla Languages using Language-Independent Vocal Features [4.446085353384894]
We used Bangla and English languages to assess whether distinguishing emotions from speech is independent of language.
The following emotions were categorized for this study: happiness, anger, neutral, sadness, disgust, and fear.
Although this study reveals that Speech Emotion Recognition (SER) is mostly language-independent, there is some disparity while recognizing emotional states like disgust and fear in these two languages.
arXiv Detail & Related papers (2021-11-21T09:28:49Z)
- Textless Speech Emotion Conversion using Decomposed and Discrete Representations [49.55101900501656]
We decompose speech into discrete and disentangled learned representations, consisting of content units, F0, speaker, and emotion.
First, we modify the speech content by translating the content units to a target emotion, and then predict the prosodic features based on these units.
Finally, the speech waveform is generated by feeding the predicted representations into a neural vocoder.
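A structural sketch of such a decomposed representation, with every component a hypothetical placeholder rather than the paper's actual interface:

```python
# Decomposed speech: content units, F0, speaker, and emotion kept separate.
from dataclasses import dataclass, replace
from typing import List

@dataclass(frozen=True)
class DecomposedSpeech:
    content_units: List[int]  # discrete units from a self-supervised model
    f0: List[float]           # frame-wise pitch contour (0.0 = unvoiced)
    speaker: int              # speaker identity index
    emotion: str              # categorical emotion label

def convert_emotion(x: DecomposedSpeech, target: str) -> DecomposedSpeech:
    # Swap only the emotion; a full system would also re-predict the
    # content units and F0 conditioned on the target emotion, then
    # resynthesise the waveform with a neural vocoder.
    return replace(x, emotion=target)

x = DecomposedSpeech([12, 5, 5, 87], [110.0, 112.5, 0.0, 108.2], 3, "neutral")
print(convert_emotion(x, "happy"))
```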
arXiv Detail & Related papers (2021-11-14T18:16:42Z)
- EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model [56.75775793011719]
We introduce and publicly release a Mandarin emotion speech dataset comprising 9,724 samples with audio files and human-labeled emotion annotations.
Unlike models that need additional reference audio as input, our model can predict emotion labels directly from the input text and generate more expressive speech conditioned on the emotion embedding.
In the experiments, we first validate the effectiveness of our dataset with an emotion classification task, then train our model on the proposed dataset and conduct a series of subjective evaluations.
arXiv Detail & Related papers (2021-06-17T08:34:21Z)
- Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training [91.95855310211176]
Emotional voice conversion aims to change the emotional state of an utterance while preserving the linguistic content and speaker identity.
We propose a novel 2-stage training strategy for sequence-to-sequence emotional voice conversion with a limited amount of emotional speech data.
The proposed framework can perform both spectrum and prosody conversion and achieves significant improvement over the state-of-the-art baselines in both objective and subjective evaluation.
arXiv Detail & Related papers (2021-03-31T04:56:14Z)
- Annotation of Emotion Carriers in Personal Narratives [69.07034604580214]
We are interested in the problem of understanding personal narratives (PN): spoken or written recollections of facts, events, and thoughts.
In PN, emotion carriers are the speech or text segments that best explain the emotional state of the user.
This work proposes and evaluates an annotation model for identifying emotion carriers in spoken personal narratives.
arXiv Detail & Related papers (2020-02-27T15:42:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.