Emotion Recognition of the Singing Voice: Toward a Real-Time Analysis Tool for Singers
- URL: http://arxiv.org/abs/2105.00173v1
- Date: Sat, 1 May 2021 05:47:15 GMT
- Title: Emotion Recognition of the Singing Voice: Toward a Real-Time Analysis Tool for Singers
- Authors: Daniel Szelogowski
- Abstract summary: Current computational-emotion research has focused on applying acoustic properties to analyze how emotions are perceived mathematically.
This paper seeks to reflect and expand upon the findings of related research and present a stepping-stone toward this end goal.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current computational-emotion research has focused on applying acoustic
properties to analyze how emotions are perceived mathematically, or on their use in
natural language processing machine learning models. With most recent interest directed
at emotions in the spoken voice, little experimentation has been performed on how
emotions are recognized in the singing voice, in both noiseless and noisy data (i.e.,
data that is inaccurate, difficult to interpret, corrupted or distorted, contains
nonsense information such as actual noise sounds, or has a low ratio of usable to
unusable information). This not only overlooks the challenges of training machine
learning models on more subjective data and testing them on much noisier data; there
is also a clear disconnect between progress in convolutional neural networks and the
goal of emotionally cognizant artificial intelligence. By training a new model on this
type of information with a rich comprehension of psycho-acoustic properties, models
can not only be trained to recognize information within extremely noisy data, but
advances can be made toward more complex biofeedback applications, including a model
that could recognize emotions given any human information (language, breath, voice,
body, posture) and be used in any performance medium (music, speech, acting) or in
psychological assistance for patients with disorders such as BPD, alexithymia, and
autism, among others. This paper seeks to reflect and expand upon the findings of
related research and to present a stepping-stone toward this end goal.
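One common way to produce the kind of controlled noisy training data the abstract describes is to mix clean recordings with noise at a chosen signal-to-noise ratio (SNR). The sketch below is purely illustrative (the paper does not specify an augmentation recipe): it uses plain Python, a synthetic sine wave standing in for a clean vocal, and uniform random noise.

```python
import math
import random

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the mixture clean + gain*noise has the requested SNR."""
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(x * x for x in noise) / len(noise)
    # Choose gain so that p_clean / (gain^2 * p_noise) == 10 ** (snr_db / 10).
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]

# Example: a 440 Hz sine "voice" mixed with uniform noise at 10 dB SNR.
sr = 16000
clean = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
random.seed(0)
noise = [random.uniform(-1.0, 1.0) for _ in range(sr)]
noisy = mix_at_snr(clean, noise, snr_db=10.0)
```

In an augmentation loop, the same clean sample would typically be mixed at several SNR levels so a model sees both mildly and severely degraded versions of each recording.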
Related papers
- Speech Emotion Recognition Using CNN and Its Use Case in Digital Healthcare
  The process of identifying human emotions and affective states from speech is known as speech emotion recognition (SER). This work uses a convolutional neural network (CNN) to distinguish emotions in audio recordings and label them according to a range of emotion categories, presenting a machine learning model that identifies emotions from supplied audio files.
  arXiv Detail & Related papers (2024-06-15T21:33:03Z)
- Modeling User Preferences via Brain-Computer Interfacing
  The authors use brain-computer interfacing technology to infer users' preferences, their attentional correlates toward visual content, and their associations with affective experience. They link these to relevant applications such as information retrieval, personalized steering of generative models, and crowdsourcing population estimates of affective experiences.
  arXiv Detail & Related papers (2024-05-15T20:41:46Z)
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling
  Conversational speech synthesis (CSS) aims to express an utterance with the appropriate prosody and emotional inflection within a conversational setting. To address data scarcity, the authors meticulously create emotional labels in terms of category and intensity. Their model outperforms baseline models in understanding and rendering emotions.
  arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- Analysing the Impact of Audio Quality on the Use of Naturalistic Long-Form Recordings for Infant-Directed Speech Research
  Modelling of early language acquisition aims to understand how infants bootstrap their language skills. Recent developments have enabled the use of more naturalistic training data for computational models, but it is currently unclear how sound quality affects analyses and modelling experiments conducted on such data.
  arXiv Detail & Related papers (2023-05-03T08:25:37Z)
- Describing emotions with acoustic property prompts for speech emotion recognition
  The authors devise a method to automatically create a description for a given audio clip by computing acoustic properties such as pitch, loudness, speech rate, and articulation rate. They train a neural network model on these audio-text pairs, evaluate it on an additional dataset, and investigate how associating audio with descriptions improves performance on speech emotion recognition and speech audio retrieval.
  arXiv Detail & Related papers (2022-11-14T20:29:37Z)
- Data-driven emotional body language generation for social robotics
  In social robotics, endowing humanoid robots with the ability to generate bodily expressions of affect can improve human-robot interaction and collaboration. The authors implement a deep-learning, data-driven framework that learns from a few hand-designed robotic bodily expressions; an evaluation study found that the anthropomorphism and animacy of the generated expressions are not perceived differently from the hand-designed ones.
  arXiv Detail & Related papers (2022-05-02T09:21:39Z)
- Hybrid Handcrafted and Learnable Audio Representation for Analysis of Speech Under Cognitive and Physical Load
  The authors introduce a set of five datasets for task-load detection in speech, with voice recordings collected while either cognitive or physical stress was induced in a cohort of volunteers. They use the datasets to design and evaluate a novel self-supervised audio representation.
  arXiv Detail & Related papers (2022-03-30T19:43:21Z)
- EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model
  The authors introduce and publicly release a Mandarin emotion speech dataset of 9,724 samples with audio files and human-labeled emotion annotations. Unlike models that need additional reference audio as input, their model predicts emotion labels from the input text alone and generates more expressive speech conditioned on the emotion embedding. They first validate the dataset with an emotion classification task, then train their model on it and conduct a series of subjective evaluations.
  arXiv Detail & Related papers (2021-06-17T08:34:21Z)
- Predicting Emotions Perceived from Sounds
  Sonification is the science of communicating data and events to users through sounds. This paper reports an experiment in which several mainstream machine learning algorithms are developed, showing that perceived emotions can be predicted with high accuracy.
  arXiv Detail & Related papers (2020-12-04T15:01:59Z)
- Speech-Based Emotion Recognition using Neural Networks and Information Visualization
  The authors propose a tool that lets users take speech samples and identify a range of emotions from audio elements, with a dashboard designed around local therapists' needs for intuitive representations of speech data.
  arXiv Detail & Related papers (2020-10-28T20:57:32Z)
- A Developmental Neuro-Robotics Approach for Boosting the Recognition of Handwritten Digits
  Recent evidence shows that simulating children's embodied strategies can improve machine intelligence as well. This article explores the application of embodied strategies to convolutional neural network models in the context of developmental neuro-robotics.
  arXiv Detail & Related papers (2020-03-23T14:55:00Z)
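The acoustic-property prompting idea above (describing audio in terms of pitch, loudness, and related properties) can be sketched with very simple estimators. Everything below is a hypothetical illustration, not the cited paper's method: pitch comes from a crude zero-crossing count, loudness from root-mean-square (RMS) energy, and the wording thresholds are invented for the example.

```python
import math

def describe_audio(samples, sr):
    """Compute simple acoustic properties and render them as a text description.

    Pitch is estimated by counting zero crossings (a rough proxy that only
    works for clean, roughly periodic signals); loudness is RMS energy.
    The 300 Hz and 0.1 RMS thresholds are illustrative, not standard values.
    """
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    # A sine wave crosses zero twice per cycle, so crossings / (2 * duration).
    pitch_hz = crossings * sr / (2 * len(samples))
    pitch_word = "high-pitched" if pitch_hz > 300 else "low-pitched"
    loud_word = "loud" if rms > 0.1 else "quiet"
    return f"A {loud_word}, {pitch_word} voice (~{pitch_hz:.0f} Hz, RMS {rms:.2f})"

# One second of a 440 Hz tone at half amplitude: roughly 440 Hz, audibly loud.
sr = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
print(describe_audio(tone, sr))
```

In the paper's setup such descriptions are paired with the audio to train a model; a real system would use more robust pitch tracking and perceptual loudness measures than this sketch.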
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.