Predicting Heart Activity from Speech using Data-driven and Knowledge-based features
- URL: http://arxiv.org/abs/2406.06341v1
- Date: Mon, 10 Jun 2024 15:01:46 GMT
- Title: Predicting Heart Activity from Speech using Data-driven and Knowledge-based features
- Authors: Gasser Elbanna, Zohreh Mostaani, Mathew Magimai.-Doss
- Abstract summary: We show that self-supervised speech models outperform acoustic features in predicting heart activity parameters.
These findings underscore the value of data-driven representations in such tasks.
- Score: 19.14666002797423
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Accurately predicting heart activity and other biological signals is crucial for diagnosis and monitoring. Given that speech is an outcome of multiple physiological systems, a significant body of work has studied the acoustic correlates of heart activity. Recently, self-supervised models have excelled in speech-related tasks compared to traditional acoustic methods. However, the robustness of data-driven representations in predicting heart activity has remained unexplored. In this study, we demonstrate that self-supervised speech models outperform acoustic features in predicting heart activity parameters. We also highlight the impact of individual variability on model generalizability. These findings underscore the value of data-driven representations in such tasks and the need for more speech-based physiological data to mitigate speaker-related challenges.
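The abstract's core setup, regressing heart-activity parameters onto utterance-level speech representations, can be sketched as follows. This is illustrative only: the paper's actual features, models, and data are not specified in this listing, so the "embeddings" and heart-rate targets below are random placeholders, and ridge regression stands in for whatever predictor the authors used.

```python
import numpy as np

# Random placeholders: utterance-level speech embeddings and a
# heart-activity target (e.g. mean heart rate per utterance).
rng = np.random.default_rng(0)
n_utterances, emb_dim = 200, 64
X = rng.normal(size=(n_utterances, emb_dim))           # stand-in embeddings
w_true = rng.normal(size=emb_dim)
y = X @ w_true + 0.1 * rng.normal(size=n_utterances)   # stand-in targets

# Closed-form ridge regression: w = (X^T X + lam * I)^{-1} X^T y
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(emb_dim), X.T @ y)
y_hat = X @ w

# Coefficient of determination (R^2) on the training split, for illustration.
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(round(r2, 3))
```

The paper's finding would correspond to self-supervised embeddings yielding a better fit than hand-crafted acoustic features under the same predictor; here both the features and the targets are synthetic, so only the regression mechanics are shown.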
Related papers
- Exploring Differences between Human Perception and Model Inference in Audio Event Recognition [26.60579496336448]
This paper introduces the concept of semantic importance in Audio Event Recognition (AER)
It focuses on exploring the differences between human perception and model inference.
By comparing human annotations with the predictions of ensemble pre-trained models, this paper uncovers a significant gap between human perception and model inference.
arXiv Detail & Related papers (2024-09-10T15:19:50Z)
- Deep State-Space Generative Model For Correlated Time-to-Event Predictions [54.3637600983898]
We propose a deep latent state-space generative model to capture the interactions among different types of correlated clinical events.
Our method also uncovers meaningful insights about the latent correlations among mortality and different types of organ failures.
arXiv Detail & Related papers (2024-07-28T02:42:36Z)
- Interpreting Pretrained Speech Models for Automatic Speech Assessment of Voice Disorders [0.8796261172196743]
We train and compare two configurations of Audio Spectrogram Transformer in the context of Voice Disorder Detection.
We apply the attention rollout method to produce model relevance maps, which quantify how much each spectrogram region contributes to the model's predictions.
We use these maps to analyse how models make predictions in different conditions and to show that the spread of attention is reduced as a model is finetuned.
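Attention rollout (Abnar & Zuidema, 2020) composes per-layer attention maps, with the residual connection folded in, by matrix multiplication. A minimal sketch with random placeholder attention matrices rather than a real Audio Spectrogram Transformer:

```python
import numpy as np

def attention_rollout(attentions):
    """Compose per-layer (tokens x tokens) attention matrices into a
    single relevance map, following the attention rollout recipe."""
    n = attentions[0].shape[0]
    rollout = np.eye(n)
    for att in attentions:
        # Fold in the residual connection, then renormalize rows to sum to 1.
        att = 0.5 * att + 0.5 * np.eye(n)
        att = att / att.sum(axis=-1, keepdims=True)
        # Compose the attention flow across layers by matrix multiplication.
        rollout = att @ rollout
    return rollout

# Placeholder: 4 layers of row-stochastic attention over 8 tokens.
rng = np.random.default_rng(0)
layers = [rng.dirichlet(np.ones(8), size=8) for _ in range(4)]
relevance = attention_rollout(layers)
print(relevance.shape)  # each row is a distribution over input tokens
```

For a spectrogram transformer, each row of the result would be reshaped back onto the time-frequency grid to visualise which spectrogram regions the prediction attends to.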
arXiv Detail & Related papers (2024-06-29T21:14:48Z)
- Evaluating Speaker Identity Coding in Self-supervised Models and Humans [0.42303492200814446]
Speaker identity plays a significant role in human communication and is being increasingly used in societal applications.
We show that self-supervised representations from different families are significantly better for speaker identification over acoustic representations.
We also show that such a speaker identification task can be used to better understand the nature of acoustic information representation in different layers of these powerful networks.
arXiv Detail & Related papers (2024-06-14T20:07:21Z)
- Show from Tell: Audio-Visual Modelling in Clinical Settings [58.88175583465277]
We consider audio-visual modelling in a clinical setting, providing a solution to learn medical representations without human expert annotation.
A simple yet effective multi-modal self-supervised learning framework is proposed for this purpose.
The proposed approach is able to localise anatomical regions of interest during ultrasound imaging, with only speech audio as a reference.
arXiv Detail & Related papers (2023-10-25T08:55:48Z)
- Analysing the Impact of Audio Quality on the Use of Naturalistic Long-Form Recordings for Infant-Directed Speech Research [62.997667081978825]
Modelling of early language acquisition aims to understand how infants bootstrap their language skills.
Recent developments have enabled the use of more naturalistic training data for computational models.
It is currently unclear how the sound quality could affect analyses and modelling experiments conducted on such data.
arXiv Detail & Related papers (2023-05-03T08:25:37Z)
- Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z)
- Dataset Bias in Human Activity Recognition [57.91018542715725]
This contribution statistically curates the training data to assess to what degree the physical characteristics of humans influence HAR performance.
We evaluate the performance of a state-of-the-art convolutional neural network on two time-series HAR datasets that differ in their sensors, activities, and recording setups.
arXiv Detail & Related papers (2023-01-19T12:33:50Z)
- Insights on Modelling Physiological, Appraisal, and Affective Indicators of Stress using Audio Features [10.093374748790037]
Utilising speech samples collected while the subject is undergoing an induced stress episode has recently shown promising results for the automatic characterisation of individual stress responses.
We introduce new findings that shed light onto whether speech signals are suited to model physiological biomarkers.
arXiv Detail & Related papers (2022-05-09T14:32:38Z)
- Hybrid Handcrafted and Learnable Audio Representation for Analysis of Speech Under Cognitive and Physical Load [17.394964035035866]
We introduce a set of five datasets for task load detection in speech.
The voice recordings were collected as either cognitive or physical stress was induced in the cohort of volunteers.
We used the datasets to design and evaluate a novel self-supervised audio representation.
arXiv Detail & Related papers (2022-03-30T19:43:21Z)
- Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training [60.825471653739555]
We show that phonetic domain factors play an important role during pre-training while grammatical and syntactic factors are far less important.
To our knowledge, this is the first study to characterise the domain properties of pre-training sets in self-supervised pre-training for speech.
arXiv Detail & Related papers (2022-03-01T17:40:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.