Predicting Emotions Perceived from Sounds
- URL: http://arxiv.org/abs/2012.02643v1
- Date: Fri, 4 Dec 2020 15:01:59 GMT
- Title: Predicting Emotions Perceived from Sounds
- Authors: Faranak Abri, Luis Felipe Gutiérrez, Akbar Siami Namin, David R. W. Sears, Keith S. Jones
- Abstract summary: Sonification is the science of communicating data and events to users through sound.
This paper conducts an experiment in which several mainstream and conventional machine learning algorithms are developed to predict the emotions perceived from sounds.
The results show that perceived emotions can be predicted with high accuracy.
- Score: 2.9398911304923447
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sonification is the science of communicating data and events to users
through sound. Auditory icons, earcons, and speech are the common auditory
display schemes utilized in sonification, or more specifically in the use of
audio to convey information. Once the captured data are perceived, their
meanings, and more importantly their intentions, can be interpreted more easily,
and thus sonification can be employed as a complement to visualization
techniques. Through auditory perception it is possible to convey temporal,
spatial, or other context-oriented information. An important research question
is whether the emotions perceived from these auditory icons or earcons are
predictable, which would make an automated sonification platform possible. This
paper presents an experiment in which several mainstream and conventional
machine learning algorithms are developed to study the prediction of emotions
perceived from sounds. To do so, the key features of the sounds are extracted
and then, after applying feature reduction techniques, modeled with machine
learning algorithms. We observe that perceived emotions can be predicted with
high accuracy. In particular, regression based on Random Forest outperformed
the other machine learning algorithms.
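A minimal sketch of the pipeline described above, assuming Python with librosa and scikit-learn: clip-level audio features (MFCCs, spectral centroid, and zero-crossing rate are assumed here), PCA as the feature-reduction step, and a Random Forest regressor predicting a perceived-emotion rating such as valence. The feature set, the target, and the `extract_features`/`evaluate` helpers are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): predict a perceived-emotion rating
# from audio features with feature reduction and Random Forest regression.
import numpy as np
import librosa
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

def extract_features(wav_path: str) -> np.ndarray:
    """Summarize one sound clip as a fixed-length feature vector."""
    y, sr = librosa.load(wav_path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # timbre
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)  # brightness
    zcr = librosa.feature.zero_crossing_rate(y)               # noisiness
    frames = np.concatenate([mfcc, centroid, zcr], axis=0)
    # Mean and standard deviation over time give a clip-level summary.
    return np.concatenate([frames.mean(axis=1), frames.std(axis=1)])

def evaluate(clips):
    """clips: iterable of (wav_path, perceived_rating) pairs (hypothetical)."""
    X = np.stack([extract_features(path) for path, _ in clips])
    y = np.array([rating for _, rating in clips])
    model = Pipeline([
        ("scale", StandardScaler()),
        ("reduce", PCA(n_components=0.95)),  # feature reduction (PCA assumed)
        ("rf", RandomForestRegressor(n_estimators=300, random_state=0)),
    ])
    # Cross-validated R^2 as a rough measure of how predictable the ratings are.
    return cross_val_score(model, X, y, cv=5, scoring="r2")
```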
Related papers
- Speech Emotion Recognition Using CNN and Its Use Case in Digital Healthcare [0.0]
The process of identifying human emotion and affective states from speech is known as speech emotion recognition (SER).
My research uses a Convolutional Neural Network (CNN) to distinguish emotions in audio recordings and label them according to a range of different emotions.
I have developed a machine learning model that identifies emotions from supplied audio files.
arXiv Detail & Related papers (2024-06-15T21:33:03Z) - Speech and Text-Based Emotion Recognizer [0.9168634432094885]
We build a balanced corpus from publicly available datasets for speech emotion recognition.
Our best system, a multi-modal speech- and text-based model, achieves a combined UA (Unweighted Accuracy) + WA (Weighted Accuracy) score of 157.57, compared to 119.66 for the baseline algorithm.
arXiv Detail & Related papers (2023-12-10T05:17:39Z) - Toward a realistic model of speech processing in the brain with
self-supervised learning [67.7130239674153]
Self-supervised algorithms trained on the raw waveform constitute a promising candidate.
We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
arXiv Detail & Related papers (2022-06-03T17:01:46Z) - Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement
by Re-Synthesis [67.73554826428762]
We propose a novel audio-visual speech enhancement framework for high-fidelity telecommunications in AR/VR.
Our approach leverages audio-visual speech cues to generate the codes of a neural speech codec, enabling efficient synthesis of clean, realistic speech from noisy signals.
arXiv Detail & Related papers (2022-03-31T17:57:10Z) - Binaural SoundNet: Predicting Semantics, Depth and Motion with Binaural
Sounds [118.54908665440826]
Humans can robustly recognize and localize objects by using visual and/or auditory cues.
This work develops an approach for scene understanding purely based on sounds.
The co-existence of visual and audio cues is leveraged for supervision transfer.
arXiv Detail & Related papers (2021-09-06T22:24:00Z) - Learning Audio-Visual Dereverberation [87.52880019747435]
Reverberation from audio reflecting off surfaces and objects in the environment not only degrades the quality of speech for human perception, but also severely impacts the accuracy of automatic speech recognition.
Our idea is to learn to dereverberate speech from audio-visual observations.
We introduce Visually-Informed Dereverberation of Audio (VIDA), an end-to-end approach that learns to remove reverberation based on both the observed sounds and visual scene.
arXiv Detail & Related papers (2021-06-14T20:01:24Z) - Emotion Recognition of the Singing Voice: Toward a Real-Time Analysis
Tool for Singers [0.0]
Current computational-emotion research has focused on applying acoustic properties to mathematically analyze how emotions are perceived.
This paper seeks to reflect and expand upon the findings of related research and present a stepping-stone toward this end goal.
arXiv Detail & Related papers (2021-05-01T05:47:15Z) - An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and
Separation [57.68765353264689]
Speech enhancement and speech separation are two related tasks.
Traditionally, these tasks have been tackled using signal processing and machine learning techniques.
Deep learning has been exploited to achieve strong performance.
arXiv Detail & Related papers (2020-08-21T17:24:09Z) - Semantic Object Prediction and Spatial Sound Super-Resolution with
Binaural Sounds [106.87299276189458]
Humans can robustly recognize and localize objects by integrating visual and auditory cues.
This work develops an approach for dense semantic labelling of sound-making objects, purely based on sounds.
arXiv Detail & Related papers (2020-03-09T15:49:01Z) - Emotion Recognition System from Speech and Visual Information based on
Convolutional Neural Networks [6.676572642463495]
We propose a system that is able to recognize emotions with a high accuracy rate and in real time.
In order to increase the accuracy of the recognition system, we also analyze the speech data and fuse the information coming from both sources.
arXiv Detail & Related papers (2020-02-29T22:09:46Z) - Unsupervised Learning of Audio Perception for Robotics Applications:
Learning to Project Data to T-SNE/UMAP space [2.8935588665357077]
This paper builds on key ideas to develop a perception of touch sounds without access to any ground-truth data.
We show how ideas from classical signal processing can be leveraged to obtain large amounts of data for any sound of interest with high precision.
arXiv Detail & Related papers (2020-02-10T20:33:25Z)