Hybrid Handcrafted and Learnable Audio Representation for Analysis of
Speech Under Cognitive and Physical Load
- URL: http://arxiv.org/abs/2203.16637v1
- Date: Wed, 30 Mar 2022 19:43:21 GMT
- Title: Hybrid Handcrafted and Learnable Audio Representation for Analysis of
Speech Under Cognitive and Physical Load
- Authors: Gasser Elbanna, Alice Biryukov, Neil Scheidwasser-Clow, Lara Orlandic,
Pablo Mainar, Mikolaj Kegler, Pierre Beckmann, Milos Cernak
- Abstract summary: We introduce a set of five datasets for task load detection in speech.
The voice recordings were collected as either cognitive or physical stress was induced in the cohort of volunteers.
We used the datasets to design and evaluate a novel self-supervised audio representation.
- Score: 17.394964035035866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a neurophysiological response to threat or adverse conditions, stress can
affect cognition, emotion and behaviour with potentially detrimental effects on
health in the case of sustained exposure. Since the affective content of speech
is inherently modulated by an individual's physical and mental state, a
substantial body of research has been devoted to the study of paralinguistic
correlates of stress-inducing task load. Historically, voice stress analysis
(VSA) has been conducted using conventional digital signal processing (DSP)
techniques. Despite the development of modern methods based on deep neural
networks (DNNs), accurately detecting stress in speech remains difficult due to
the wide variety of stressors and considerable variability in the individual
stress perception. To that end, we introduce a set of five datasets for task
load detection in speech. The voice recordings were collected as either
cognitive or physical stress was induced in the cohort of volunteers, with a
cumulative number of more than a hundred speakers. We used the datasets to
design and evaluate a novel self-supervised audio representation that leverages
the effectiveness of handcrafted features (DSP-based) and the complexity of
data-driven DNN representations. Notably, the proposed approach outperformed
both extensive handcrafted feature sets and novel DNN-based audio
representation learning approaches.
Related papers
- Predicting Heart Activity from Speech using Data-driven and Knowledge-based features [19.14666002797423]
We show that self-supervised speech models outperform acoustic features in predicting heart activity parameters.
These findings underscore the value of data-driven representations in such tasks.
arXiv Detail & Related papers (2024-06-10T15:01:46Z)
- Exploring Speech Pattern Disorders in Autism using Machine Learning [12.469348589699766]
This study presents a comprehensive approach to identify distinctive speech patterns through the analysis of examiner-patient dialogues.
We extracted 40 speech-related features, categorized into frequency, zero-crossing rate, energy, spectral characteristics, Mel Frequency Cepstral Coefficients (MFCCs) and balance.
The classification model aimed to differentiate between ASD and non-ASD cases, achieving an accuracy of 87.75%.
arXiv Detail & Related papers (2024-05-03T02:59:15Z)
- Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks [59.38765771221084]
We present a physiologically inspired speech recognition architecture compatible and scalable with deep learning frameworks.
We show end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network.
Our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance.
arXiv Detail & Related papers (2024-04-22T09:40:07Z)
- Personalization of Stress Mobile Sensing using Self-Supervised Learning [1.7598252755538808]
Stress is widely recognized as a major contributor to a variety of health issues.
Real-time stress prediction can enable digital interventions to immediately react at the onset of stress, helping to avoid many psychological and physiological symptoms such as heart rhythm irregularities.
However, major challenges with the prediction of stress using machine learning include the subjectivity and sparseness of the labels, a large feature space, relatively few labels, and a complex nonlinear and subjective relationship between the features and outcomes.
arXiv Detail & Related papers (2023-08-04T22:26:33Z)
- Decoding speech perception from non-invasive brain recordings [48.46819575538446]
We introduce a model trained with contrastive-learning to decode self-supervised representations of perceived speech from non-invasive recordings.
Our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities.
arXiv Detail & Related papers (2022-08-25T10:01:43Z)
- Neural Language Models are not Born Equal to Fit Brain Data, but
Training Helps [75.84770193489639]
We examine the impact of test loss, training corpus and model architecture on the prediction of functional Magnetic Resonance Imaging timecourses of participants listening to an audiobook.
We find that untrained versions of each model already explain a significant amount of signal in the brain by capturing similarity in brain responses across identical words.
We suggest good practices for future studies aiming at explaining the human language system using neural language models.
arXiv Detail & Related papers (2022-07-07T15:37:17Z)
- Insights on Modelling Physiological, Appraisal, and Affective Indicators
of Stress using Audio Features [10.093374748790037]
Utilising speech samples collected while the subject is undergoing an induced stress episode has recently shown promising results for the automatic characterisation of individual stress responses.
We introduce new findings that shed light onto whether speech signals are suited to model physiological biomarkers.
arXiv Detail & Related papers (2022-05-09T14:32:38Z)
- The world seems different in a social context: a neural network analysis
of human experimental data [57.729312306803955]
We show that it is possible to replicate human behavioral data in both individual and social task settings by modifying the precision of prior and sensory signals.
An analysis of the neural activation traces of the trained networks provides evidence that information is coded in fundamentally different ways in the network in the individual and in the social conditions.
arXiv Detail & Related papers (2022-03-03T17:19:12Z)
- Preliminary study on using vector quantization latent spaces for TTS/VC
systems with consistent performance [55.10864476206503]
We investigate the use of quantized vectors to model the latent linguistic embedding.
By enforcing different policies over the latent spaces in the training, we are able to obtain a latent linguistic embedding.
Our experiments show that the voice cloning system built with vector quantization has only a small degradation in terms of perceptive evaluations.
arXiv Detail & Related papers (2021-06-25T07:51:35Z)
- Emotion Recognition of the Singing Voice: Toward a Real-Time Analysis
Tool for Singers [0.0]
Current computational-emotion research has focused on applying acoustic properties to analyze how emotions are perceived mathematically.
This paper seeks to reflect and expand upon the findings of related research and present a stepping-stone toward this end goal.
arXiv Detail & Related papers (2021-05-01T05:47:15Z)
- Deep Recurrent Encoder: A scalable end-to-end network to model brain
signals [122.1055193683784]
We propose an end-to-end deep learning architecture trained to predict the brain responses of multiple subjects at once.
We successfully test this approach on a large cohort of magnetoencephalography (MEG) recordings acquired during a one-hour reading task.
arXiv Detail & Related papers (2021-03-03T11:39:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.