Design, construction and evaluation of emotional multimodal pathological
speech database
- URL: http://arxiv.org/abs/2312.08998v1
- Date: Thu, 14 Dec 2023 14:43:31 GMT
- Title: Design, construction and evaluation of emotional multimodal pathological
speech database
- Authors: Ting Zhu, Shufei Duan, Huizhi Liang, Wei Zhang
- Abstract summary: The first Chinese multimodal emotional pathological speech database containing multi-perspective information is constructed.
All emotional speech was labeled for intelligibility, emotion type, and discrete and dimensional emotions via a WeChat mini-program developed for this purpose.
Automatic recognition was tested on speech and glottal data, with average accuracies of 78% for controls and 60% for patients on audio, versus 51% for controls and 38% for patients on glottal data, indicating an influence of the disease on emotional expression.
- Score: 8.774681418339155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The lack of an available emotion pathology database is one of the key
obstacles in studying the emotion expression status of patients with
dysarthria. The first Chinese multimodal emotional pathological speech database
containing multi-perspective information is constructed in this paper. It
includes 29 controls and 39 patients with different degrees of motor
dysarthria, expressing happy, sad, angry, and neutral emotions. All emotional
speech was labeled for intelligibility, emotion type, and discrete and
dimensional emotions via a WeChat mini-program developed for this purpose. The
subjective analysis covers emotion discrimination accuracy, speech
intelligibility, valence-arousal spatial distribution, and the correlation
between SCL-90 scores and disease severity. Automatic recognition was tested
on speech and glottal data, with average accuracies of 78% for controls and
60% for patients on audio, versus 51% for controls and 38% for patients on
glottal data, indicating an influence of the disease on emotional expression.
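The abstract does not specify the recognition pipeline. As a rough, hedged illustration of the kind of audio-based experiment it describes, the Python sketch below extracts MFCC statistics with librosa and scores a four-class SVM with cross-validation; the file paths, feature choice, and classifier are assumptions, not the authors' method.

```python
# Hedged sketch, not the authors' pipeline: MFCC statistics + SVM over the
# database's four emotion classes, scored by cross-validation per group.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

EMOTIONS = ["happy", "sad", "angry", "neutral"]  # the four emotions in the database

def mfcc_stats(path, sr=16000, n_mfcc=13):
    """Mean/std of MFCCs over one utterance -> fixed-length feature vector."""
    y, sr = librosa.load(path, sr=sr)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([m.mean(axis=1), m.std(axis=1)])

def average_accuracy(wav_paths, labels):
    """Run once on control utterances and once on patient utterances to
    compare group-level averages, as the abstract reports."""
    X = np.stack([mfcc_stats(p) for p in wav_paths])
    y = np.array([EMOTIONS.index(lab) for lab in labels])
    return cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()
```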
Related papers
- Empowering Dysarthric Speech: Leveraging Advanced LLMs for Accurate Speech Correction and Multimodal Emotion Analysis [0.0]
This paper introduces a novel approach to recognizing and translating dysarthric speech.
We leverage advanced large language models for accurate speech correction and multimodal emotion analysis.
Our framework identifies emotions such as happiness, sadness, neutrality, surprise, anger, and fear, while reconstructing intended sentences from distorted speech with high accuracy.
arXiv Detail & Related papers (2024-10-13T20:54:44Z)
- MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders [59.515827458631975]
Mental health disorders are among the most serious diseases worldwide.
Privacy concerns limit the accessibility of personalized treatment data.
MentalArena is a self-play framework to train language models.
arXiv Detail & Related papers (2024-10-09T13:06:40Z)
- Exploring Speech Pattern Disorders in Autism using Machine Learning [12.469348589699766]
This study presents a comprehensive approach to identify distinctive speech patterns through the analysis of examiner-patient dialogues.
We extracted 40 speech-related features, categorized into frequency, zero-crossing rate, energy, spectral characteristics, Mel Frequency Cepstral Coefficients (MFCCs) and balance.
The classification model aimed to differentiate between ASD and non-ASD cases, achieving an accuracy of 87.75%.
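The exact 40-dimensional feature vector is not enumerated here. A minimal sketch of the named feature families (frequency, zero-crossing rate, energy, spectral characteristics, MFCCs) using librosa might look like this; the specific statistics are assumptions:

```python
# Hedged sketch of the named feature families; the paper's exact 40 features
# are not reproduced here.
import numpy as np
import librosa

def speech_features(path, sr=16000):
    y, _ = librosa.load(path, sr=sr)
    f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)         # frequency (pitch) track
    zcr = librosa.feature.zero_crossing_rate(y)           # zero-crossing rate
    rms = librosa.feature.rms(y=y)                        # energy
    cent = librosa.feature.spectral_centroid(y=y, sr=sr)  # a spectral characteristic
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # MFCCs
    scalars = np.array([np.nanmean(f0), zcr.mean(), rms.mean(), cent.mean()])
    return np.concatenate([scalars, mfcc.mean(axis=1)])   # 4 + 13 = 17 dims here
```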
arXiv Detail & Related papers (2024-05-03T02:59:15Z)
- Construction and Evaluation of Mandarin Multimodal Emotional Speech Database [0.0]
The validity of the dimensional annotations is verified by statistical analysis of the annotation data.
The recognition rate of seven emotions is about 82% when using acoustic data alone.
The database is of high quality and can be used as an important source for speech analysis research.
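The statistics used to verify the dimensional annotations are not given in this summary. One common consistency check, assumed here purely for illustration, is the mean pairwise inter-rater correlation:

```python
# Hedged sketch: mean pairwise inter-rater correlation as one possible
# validity statistic for dimensional (e.g., valence) annotations.
import numpy as np

def pairwise_rater_correlation(ratings):
    """ratings: (n_raters, n_utterances) array of valence or arousal scores."""
    r = np.corrcoef(ratings)             # rater-by-rater correlation matrix
    iu = np.triu_indices_from(r, k=1)    # upper triangle, excluding the diagonal
    return float(r[iu].mean())

# Example: 3 hypothetical raters scoring 5 utterances on a 1-5 valence scale.
ratings = np.array([[3, 4, 2, 5, 1],
                    [3, 5, 2, 4, 1],
                    [2, 4, 3, 5, 1]], dtype=float)
print(pairwise_rater_correlation(ratings))  # closer to 1.0 = more consistent
```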
arXiv Detail & Related papers (2024-01-14T17:56:36Z)
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- Language and Mental Health: Measures of Emotion Dynamics from Text as Linguistic Biosocial Markers [30.656554495536618]
We study the relationship between tweet emotion dynamics and mental health disorders.
We find that each of the utterance emotion dynamics (UED) metrics studied varies with the user's self-disclosed diagnosis.
This work provides important early evidence for how linguistic cues pertaining to emotion dynamics can play a crucial role as biosocial markers for mental illnesses.
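The paper's UED metric definitions are not reproduced in this summary; a hedged sketch of two simple dynamics measures over a per-tweet valence timeline could look like this:

```python
# Hedged sketch of simple emotion-dynamics measures over a tweet timeline;
# the paper's actual UED metric definitions are not reproduced here.
import numpy as np

def emotion_dynamics(valence_scores):
    """valence_scores: per-tweet valence in [0, 1], in posting order."""
    v = np.asarray(valence_scores, dtype=float)
    return {
        "mean_valence": v.mean(),                      # a rough emotional 'home base'
        "variability": v.std(),                        # spread around that base
        "mean_abs_change": np.abs(np.diff(v)).mean(),  # tweet-to-tweet movement
    }

print(emotion_dynamics([0.6, 0.7, 0.2, 0.4, 0.8]))
```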
arXiv Detail & Related papers (2023-10-26T13:00:26Z)
- Dynamic Causal Disentanglement Model for Dialogue Emotion Detection [77.96255121683011]
We propose a Dynamic Causal Disentanglement Model based on hidden variable separation.
This model effectively decomposes the content of dialogues and investigates the temporal accumulation of emotions.
Specifically, we propose a dynamic temporal disentanglement model to infer the propagation of utterances and hidden variables.
arXiv Detail & Related papers (2023-09-13T12:58:09Z)
- Emotion Intensity and its Control for Emotional Voice Conversion [77.05097999561298]
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preserving the linguistic content and speaker identity.
In this paper, we aim to explicitly characterize and control the intensity of emotion.
We propose to disentangle the speaker style from linguistic content and encode the speaker style into a style embedding in a continuous space that forms the prototype of emotion embedding.
arXiv Detail & Related papers (2022-01-10T02:11:25Z)
- Model-based analysis of brain activity reveals the hierarchy of language in 305 subjects [82.81964713263483]
A popular approach to decompose the neural bases of language consists in correlating, across individuals, the brain responses to different stimuli.
Here, we show that a model-based approach can reach equivalent results within subjects exposed to natural stimuli.
arXiv Detail & Related papers (2021-10-12T15:30:21Z)
- EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model [56.75775793011719]
We introduce and publicly release a Mandarin emotion speech dataset comprising 9,724 samples with audio files and human-labeled emotion annotations.
Unlike models that need additional reference audio as input, our model can predict emotion labels directly from the input text and generate more expressive speech conditioned on the emotion embedding.
In the experiment phase, we first validate the effectiveness of our dataset by an emotion classification task. Then we train our model on the proposed dataset and conduct a series of subjective evaluations.
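The classifier used for that validation step is not specified here. As an assumed, minimal stand-in, label learnability can be probed with a TF-IDF plus logistic-regression pipeline (character n-grams sidestep Mandarin word segmentation in this sketch):

```python
# Hedged stand-in for the dataset-validation step: if a simple classifier can
# recover the emotion labels from text, the labels carry signal. Not the
# authors' classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

def label_learnability(texts, labels, cv=5):
    """Cross-validated accuracy over (text, emotion-label) pairs."""
    clf = make_pipeline(
        TfidfVectorizer(analyzer="char", ngram_range=(1, 2)),  # no segmenter needed
        LogisticRegression(max_iter=1000),
    )
    return cross_val_score(clf, texts, labels, cv=cv).mean()
```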
arXiv Detail & Related papers (2021-06-17T08:34:21Z)
- Detecting Emotion Primitives from Speech and their use in discerning Categorical Emotions [16.886826928295203]
Emotion plays an essential role in human-to-human communication, enabling us to convey feelings such as happiness, frustration, and sincerity.
This work investigated how emotion primitives can be used to detect categorical emotions such as happiness, disgust, contempt, anger, and surprise from neutral speech.
Results indicated that arousal, followed by dominance, was a better detector of such emotions.
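As a hedged sketch of the two-stage idea (primitives first, categories second), one could regress valence/arousal/dominance from acoustic features and then classify emotion from the predicted primitives alone; the models below are assumptions, not the paper's:

```python
# Hedged sketch of the two-stage idea: regress continuous primitives
# (valence, arousal, dominance) from acoustic features, then classify the
# categorical emotion from the predicted primitives alone.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestClassifier

def fit_two_stage(X, primitives, categories):
    """X: (n, d) acoustic features; primitives: (n, 3) V/A/D annotations;
    categories: (n,) categorical emotion labels."""
    regs = [Ridge().fit(X, primitives[:, j]) for j in range(3)]
    P = np.column_stack([r.predict(X) for r in regs])  # predicted V/A/D
    clf = RandomForestClassifier().fit(P, categories)  # categories from primitives
    return regs, clf  # in practice, fit the second stage on held-out data
```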
arXiv Detail & Related papers (2020-01-31T03:11:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.