Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning
- URL: http://arxiv.org/abs/2206.07229v1
- Date: Wed, 15 Jun 2022 01:25:32 GMT
- Title: Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning
- Authors: Rui Liu, Berrak Sisman, Björn Schuller, Guanglai Gao and Haizhou Li
- Abstract summary: We propose a data-driven deep learning model, i.e. StrengthNet, to improve the generalization of emotion strength assessment for seen and unseen speech.
Experiments show that the predicted emotion strength of the proposed StrengthNet is highly correlated with ground truth scores for both seen and unseen speech.
- Score: 70.30713251031052
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emotion classification of speech and assessment of the emotion strength are
required in applications such as emotional text-to-speech and voice conversion.
An emotion attribute ranking function based on a Support Vector Machine (SVM)
was previously proposed to predict emotion strength for an emotional speech corpus. However,
the trained ranking function does not generalize to new domains, which limits
the scope of applications, especially for out-of-domain or unseen speech. In
this paper, we propose a data-driven deep learning model, i.e. StrengthNet, to
improve the generalization of emotion strength assessment for seen and unseen
speech. This is achieved by the fusion of emotional data from various domains.
We follow a multi-task learning network architecture that includes an acoustic
encoder, a strength predictor, and an auxiliary emotion predictor. Experiments
show that the predicted emotion strength of the proposed StrengthNet is highly
correlated with ground truth scores for both seen and unseen speech. We release
the source code at: https://github.com/ttslr/StrengthNet.
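For illustration, the following is a minimal sketch of such a multi-task network (acoustic encoder, strength predictor, and auxiliary emotion predictor) in PyTorch. The layer choices, sizes, mel-spectrogram input format, and the combined regression-plus-classification loss are assumptions made for this sketch only; they are not taken from the released StrengthNet implementation.

```python
# Hypothetical sketch of a multi-task emotion strength / emotion class model.
# All architectural details here are illustrative assumptions, not the
# official StrengthNet code.
import torch
import torch.nn as nn


class StrengthNetSketch(nn.Module):
    def __init__(self, n_mels: int = 80, hidden: int = 128, n_emotions: int = 5):
        super().__init__()
        # Acoustic encoder: small CNN over the mel-spectrogram, then a BiLSTM.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.blstm = nn.LSTM(16 * n_mels, hidden, batch_first=True, bidirectional=True)
        # Strength predictor: frame-level scores averaged to an utterance score.
        self.strength_head = nn.Linear(2 * hidden, 1)
        # Auxiliary emotion predictor: utterance-level emotion class logits.
        self.emotion_head = nn.Linear(2 * hidden, n_emotions)

    def forward(self, mel: torch.Tensor):
        # mel: (batch, frames, n_mels)
        x = self.cnn(mel.unsqueeze(1))                      # (B, C, T, n_mels)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)      # (B, T, C*n_mels)
        h, _ = self.blstm(x)                                # (B, T, 2*hidden)
        frame_strength = self.strength_head(h).squeeze(-1)  # (B, T)
        utt_strength = frame_strength.mean(dim=1)           # (B,)
        emotion_logits = self.emotion_head(h.mean(dim=1))   # (B, n_emotions)
        return utt_strength, emotion_logits


# Multi-task objective: regression on strength plus classification on emotion.
model = StrengthNetSketch()
mel = torch.randn(4, 200, 80)               # dummy batch of mel-spectrograms
strength_target = torch.rand(4)             # hypothetical strength scores in [0, 1]
emotion_target = torch.randint(0, 5, (4,))  # hypothetical emotion labels
pred_strength, pred_emotion = model(mel)
loss = nn.functional.mse_loss(pred_strength, strength_target) \
       + nn.functional.cross_entropy(pred_emotion, emotion_target)
loss.backward()
```

In this sketch, the auxiliary emotion classification head shares the acoustic encoder with the strength regressor, which is the usual motivation for a multi-task setup: the shared representation is encouraged to capture emotion-relevant acoustic cues that also help strength prediction generalize across domains.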
Related papers
- Speech Emotion Recognition Using CNN and Its Use Case in Digital Healthcare [0.0]
The process of identifying human emotion and affective states from speech is known as speech emotion recognition (SER).
My research uses a Convolutional Neural Network (CNN) to distinguish emotions from audio recordings and label them according to a range of different emotions.
I have developed a machine learning model that identifies emotions from supplied audio files.
arXiv Detail & Related papers (2024-06-15T21:33:03Z) - Attention-based Interactive Disentangling Network for Instance-level
Emotional Voice Conversion [81.1492897350032]
Emotional Voice Conversion aims to manipulate speech according to a given emotion while preserving non-emotion components.
We propose an Attention-based Interactive diseNtangling Network (AINN) that leverages instance-wise emotional knowledge for voice conversion.
arXiv Detail & Related papers (2023-12-29T08:06:45Z) - AffectEcho: Speaker Independent and Language-Agnostic Emotion and Affect
Transfer for Speech Synthesis [13.918119853846838]
Affect is an emotional characteristic encompassing valence, arousal, and intensity, and is a crucial attribute for enabling authentic conversations.
We propose AffectEcho, an emotion translation model that uses a Vector Quantized codebook to model emotions within a quantized space.
We demonstrate the effectiveness of our approach in controlling the emotions of generated speech while preserving identity, style, and emotional cadence unique to each speaker.
arXiv Detail & Related papers (2023-08-16T06:28:29Z) - Emotion Intensity and its Control for Emotional Voice Conversion [77.05097999561298]
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preserving the linguistic content and speaker identity.
In this paper, we aim to explicitly characterize and control the intensity of emotion.
We propose to disentangle the speaker style from linguistic content and encode the speaker style into a style embedding in a continuous space that forms the prototype of emotion embedding.
arXiv Detail & Related papers (2022-01-10T02:11:25Z) - Textless Speech Emotion Conversion using Decomposed and Discrete
Representations [49.55101900501656]
We decompose speech into discrete and disentangled learned representations, consisting of content units, F0, speaker, and emotion.
First, we modify the speech content by translating the content units to a target emotion, and then predict the prosodic features based on these units.
Finally, the speech waveform is generated by feeding the predicted representations into a neural vocoder.
arXiv Detail & Related papers (2021-11-14T18:16:42Z) - StrengthNet: Deep Learning-based Emotion Strength Assessment for
Emotional Speech Synthesis [82.39099867188547]
We propose a deep learning based emotion strength assessment network for strength prediction that is referred to as StrengthNet.
Our model conforms to a multi-task learning framework with a structure that includes an acoustic encoder, a strength predictor and an auxiliary emotion predictor.
Experiments show that the predicted emotion strength of the proposed StrengthNet is highly correlated with ground truth scores for seen and unseen speech.
arXiv Detail & Related papers (2021-10-07T03:16:15Z) - FSER: Deep Convolutional Neural Networks for Speech Emotion Recognition [0.015863809575305417]
We introduce FSER, a speech emotion recognition model trained on four valid speech databases.
On each benchmark dataset, FSER outperforms the best models introduced so far, achieving state-of-the-art performance.
FSER could potentially be used to improve mental and emotional health care.
arXiv Detail & Related papers (2021-09-15T05:03:24Z) - Seen and Unseen emotional style transfer for voice conversion with a new
emotional speech dataset [84.53659233967225]
Emotional voice conversion aims to transform emotional prosody in speech while preserving the linguistic content and speaker identity.
We propose a novel framework based on a variational auto-encoding Wasserstein generative adversarial network (VAW-GAN).
We show that the proposed framework achieves remarkable performance by consistently outperforming the baseline framework.
arXiv Detail & Related papers (2020-10-28T07:16:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.