Respiratory Distress Detection from Telephone Speech using Acoustic and
Prosodic Features
- URL: http://arxiv.org/abs/2011.09270v1
- Date: Sun, 15 Nov 2020 13:32:45 GMT
- Title: Respiratory Distress Detection from Telephone Speech using Acoustic and
Prosodic Features
- Authors: Meemnur Rashid, Kaisar Ahmed Alman, Khaled Hasan, John H.L. Hansen and
Taufiq Hasan
- Abstract summary: This work summarizes our preliminary findings on automatic detection of respiratory distress using well-known acoustic and prosodic features.
Speech samples are collected from de-identified telemedicine phone calls from a healthcare provider in Bangladesh.
We hypothesize that respiratory distress may alter speech features such as voice quality, speaking pattern, loudness, and speech-pause duration.
- Score: 27.77184655808592
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the widespread use of telemedicine services, automatic assessment of
health conditions via telephone speech can significantly impact public health.
This work summarizes our preliminary findings on automatic detection of
respiratory distress using well-known acoustic and prosodic features. Speech
samples are collected from de-identified telemedicine phone calls from a
healthcare provider in Bangladesh. The recordings include conversational speech
from patients with mild or severe respiratory distress or asthma symptoms
talking to doctors. We hypothesize that respiratory distress may alter
speech features such as voice quality, speaking pattern, loudness, and
speech-pause duration. To capture these variations, we utilize a set of
well-known acoustic and prosodic features with a Support Vector Machine (SVM)
classifier for detecting the presence of respiratory distress. Experimental
evaluations are performed using a 3-fold cross-validation scheme, ensuring
patient-independent data splits. We obtained an overall accuracy of 86.4% in
detecting respiratory distress from the speech recordings using the acoustic
feature set. Correlation analysis reveals that the top-performing features
include loudness, voice rate, voice duration, and pause duration.
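
The paper does not ship code, so the two sketches below are illustrative only. The first approximates the kind of prosodic measurements the abstract names (loudness, voice/pause durations, voice rate) from a raw waveform; librosa, the 30 dB silence threshold, and the 8 kHz telephone sample rate are assumptions, not details from the paper.

```python
# Hedged sketch of prosodic measurements of the kind the abstract names:
# loudness (RMS energy) and speech/pause durations from a telephone-band
# waveform. The silence threshold and sample rate are illustrative choices.
import numpy as np
import librosa

def prosodic_features(y: np.ndarray, sr: int) -> dict:
    rms = librosa.feature.rms(y=y)[0]                 # frame-level loudness proxy
    voiced = librosa.effects.split(y, top_db=30)      # non-silent (voiced) spans
    voice_dur = sum((e - s) for s, e in voiced) / sr  # total speech time (s)
    total_dur = len(y) / sr
    return {
        "mean_loudness": float(rms.mean()),
        "voice_duration": voice_dur,
        "pause_duration": total_dur - voice_dur,
        "voice_rate": len(voiced) / total_dur,        # voiced segments per second
    }

# Stand-in waveform: one second of tone followed by one second of near-silence.
sr = 8000                                             # telephone sample rate
y = np.concatenate([librosa.tone(180.0, sr=sr, duration=1.0),
                    1e-4 * np.random.default_rng(0).normal(size=sr)])
print(prosodic_features(y, sr))
```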
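The second sketch shows the evaluation protocol the abstract describes: an SVM over per-recording feature vectors, scored with 3-fold patient-independent cross-validation. scikit-learn's GroupKFold is one way to realise patient-independent splits; the random arrays below are stand-ins for real features and labels, not data from the paper.

```python
# Minimal sketch (not the authors' code): an SVM over acoustic/prosodic
# features, evaluated with 3-fold patient-independent cross-validation.
import numpy as np
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_recordings, n_features = 120, 8                  # hypothetical corpus size
X = rng.normal(size=(n_recordings, n_features))    # per-recording feature vectors
y = rng.integers(0, 2, size=n_recordings)          # 1 = respiratory distress
patients = rng.integers(0, 40, size=n_recordings)  # several calls per patient

# GroupKFold keeps every recording of a patient inside one fold, so the
# classifier is never tested on a patient it has seen during training.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, groups=patients,
                         cv=GroupKFold(n_splits=3), scoring="accuracy")
print(f"fold accuracies: {scores.round(3)}, mean: {scores.mean():.3f}")
```

With real features and labels in place, the mean fold accuracy is the figure comparable to the 86.4% reported above.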
Related papers
- Self-supervised Speech Models for Word-Level Stuttered Speech Detection [66.46810024006712]
We introduce a word-level stuttering speech detection model leveraging self-supervised speech models.
Our evaluation demonstrates that our model surpasses previous approaches in word-level stuttering speech detection.
arXiv Detail & Related papers (2024-09-16T20:18:20Z)
- Pre-Trained Foundation Model representations to uncover Breathing patterns in Speech [2.935056044470713]
Respiratory rate (RR) is a vital metric that is used to assess the overall health, fitness, and general well-being of an individual.
Existing approaches to measuring RR require specialized equipment or training.
Studies have demonstrated that machine learning algorithms can be used to estimate RR using bio-sensor signals as input.
arXiv Detail & Related papers (2024-07-17T21:57:18Z)
- Sustained Vowels for Pre- vs Post-Treatment COPD Classification [11.153412281447029]
Chronic obstructive pulmonary disease (COPD) is a serious inflammatory lung disease affecting millions of people around the world.
Previous work has shown that it is possible to distinguish between a pre- and a post-treatment state using automatic analysis of read speech.
We show that including sustained vowels can improve performance to 79% unweighted average recall (UAR; illustrated in a sketch after this list), up from a 71% baseline using read speech.
arXiv Detail & Related papers (2024-06-10T15:17:17Z)
- Show from Tell: Audio-Visual Modelling in Clinical Settings [58.88175583465277]
We consider audio-visual modelling in a clinical setting, providing a solution to learn medical representations without human expert annotation.
A simple yet effective multi-modal self-supervised learning framework is proposed for this purpose.
The proposed approach is able to localise anatomical regions of interest during ultrasound imaging, with only speech audio as a reference.
arXiv Detail & Related papers (2023-10-25T08:55:48Z)
- Fused Audio Instance and Representation for Respiratory Disease Detection [0.6827423171182154]
We propose Fused Audio Instance and Representation (FAIR) as a method for respiratory disease detection.
We conducted experiments on the use case of COVID-19 detection by combining waveform and spectrogram representation of body sounds.
arXiv Detail & Related papers (2022-04-22T09:01:29Z)
- Investigation of Data Augmentation Techniques for Disordered Speech Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system, constructed on the UASpeech corpus with the best augmentation approach based on speed perturbation (sketched after this list), produced up to 2.92% absolute word error rate (WER) reduction.
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
- A Machine Learning Approach for Delineating Similar Sound Symptoms of Respiratory Conditions on a Smartphone [0.0]
We leverage the improved computational and storage capabilities of modern smartphones to distinguish respiratory sound symptoms using machine learning algorithms.
The appreciable performance of these algorithms on a mobile phone shows that smartphones are a viable tool for recognition and discrimination of respiratory symptoms in real-time scenarios.
arXiv Detail & Related papers (2021-10-15T07:24:30Z)
- A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion [50.040466658605524]
We propose a new paradigm for maintaining speaker identity in dysarthric voice conversion (DVC).
The poor quality of dysarthric speech can be greatly improved by statistical VC.
However, since the normal speech utterances of a dysarthric patient are nearly impossible to collect, previous work failed to recover the individuality of the patient.
arXiv Detail & Related papers (2021-06-02T18:41:03Z)
- Silent Speech Interfaces for Speech Restoration: A Review [59.68902463890532]
Silent speech interface (SSI) research aims to provide alternative and augmentative communication methods for persons with severe speech disorders.
SSIs rely on non-acoustic biosignals generated by the human body during speech production to enable communication.
Most present-day SSIs have only been validated in laboratory settings for healthy users.
arXiv Detail & Related papers (2020-09-04T11:05:50Z)
- Respiratory Sound Classification Using Long-Short Term Memory [62.997667081978825]
This paper examines the difficulties that arise when applying sound classification to respiratory disease.
An examination of the use of deep learning and long short-term memory networks is performed in order to identify how such a task can be implemented.
arXiv Detail & Related papers (2020-08-06T23:11:57Z)
- Speaker and Posture Classification using Instantaneous Intraspeech Breathing Features [2.578242050187029]
We propose a method for speaker and posture classification using intraspeech breathing sounds.
Using intraspeech breathing sounds, 87% speaker classification and 98% posture classification accuracies were obtained.
arXiv Detail & Related papers (2020-05-25T17:00:26Z)
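
Two quantities cited in the list above can be made concrete. First, the unweighted average recall (UAR) from the COPD entry is simply the mean of per-class recalls, which is insensitive to class imbalance; a minimal sketch follows (scikit-learn's balanced_accuracy_score computes the same quantity):

```python
# UAR = mean of per-class recalls; equals scikit-learn's balanced accuracy.
import numpy as np
from sklearn.metrics import balanced_accuracy_score, recall_score

y_true = np.array([0, 0, 0, 0, 1, 1])   # e.g. pre- (0) vs post-treatment (1)
y_pred = np.array([0, 0, 1, 0, 1, 0])

per_class = recall_score(y_true, y_pred, average=None)   # [0.75, 0.5]
uar = per_class.mean()                                   # 0.625
assert np.isclose(uar, balanced_accuracy_score(y_true, y_pred))
print(f"per-class recall: {per_class}, UAR: {uar:.3f}")
```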
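Second, the disordered-speech entry names speed perturbation as its best augmentation. A common Kaldi-style realisation, sketched below under that assumption (the cited paper's exact implementation may differ), resamples the waveform so that playback at the original rate shifts tempo and pitch together:

```python
# Kaldi-style speed perturbation: resampling by 1/factor while keeping the
# original playback rate makes the audio `factor` times faster (and higher
# pitched). Factors of 0.9 and 1.1 are the conventional augmentation choices.
import numpy as np
import librosa

def speed_perturb(y: np.ndarray, sr: int, factor: float) -> np.ndarray:
    return librosa.resample(y, orig_sr=sr, target_sr=int(sr / factor))

sr = 16000
y = librosa.tone(200.0, sr=sr, duration=1.0)   # stand-in signal
fast, slow = speed_perturb(y, sr, 1.1), speed_perturb(y, sr, 0.9)
print(len(y), len(fast), len(slow))            # original, shorter, longer
```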