Pre-Trained Foundation Model representations to uncover Breathing patterns in Speech
- URL: http://arxiv.org/abs/2407.13035v1
- Date: Wed, 17 Jul 2024 21:57:18 GMT
- Title: Pre-Trained Foundation Model representations to uncover Breathing patterns in Speech
- Authors: Vikramjit Mitra, Anirban Chatterjee, Ke Zhai, Helen Weng, Ayuko Hill, Nicole Hay, Christopher Webb, Jamie Cheng, Erdrin Azemi
- Abstract summary: Respiratory rate (RR) is a vital metric that is used to assess the overall health, fitness, and general well-being of an individual.
Existing approaches to measure RR are performed using specialized equipment or training.
Studies have demonstrated that machine learning algorithms can be used to estimate RR using bio-sensor signals as input.
- Score: 2.935056044470713
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The process of human speech production involves coordinated respiratory action to elicit acoustic speech signals. Typically, speech is produced when air is forced from the lungs and is modulated by the vocal tract, where such actions are interspersed by moments of breathing in air (inhalation) to refill the lungs again. Respiratory rate (RR) is a vital metric that is used to assess the overall health, fitness, and general well-being of an individual. Existing approaches to measure RR (the number of breaths one takes in a minute) require specialized equipment or training. Studies have demonstrated that machine learning algorithms can be used to estimate RR using bio-sensor signals as input. Speech-based estimation of RR can offer an effective approach to measure the vital metric without requiring any specialized equipment or sensors. This work investigates a machine-learning-based approach to estimate RR from speech segments obtained from subjects speaking to a close-talking microphone device. Data were collected from N=26 individuals, where the ground-truth RR was obtained through commercial-grade chest belts and then manually corrected for any errors. A convolutional long short-term memory network (Conv-LSTM) is proposed to estimate respiration time-series data from the speech signal. We demonstrate that pre-trained representations obtained from a foundation model, such as Wav2Vec2, can be used to estimate the respiration time series with low root-mean-squared error and high correlation coefficient when compared with the baseline. The model-driven time series can be used to estimate RR with a low mean absolute error (MAE) of approximately 1.6 breaths/min.
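The abstract names three evaluation quantities (root-mean-squared error and correlation for the respiration time series, MAE for RR) but does not say how RR is derived from the predicted time series. The following is a minimal sketch of one plausible way to compute them, assuming RR is read off the dominant spectral peak of the respiration signal. The function names, the sampling rate `fs`, and the 5 to 40 breaths/min search band are illustrative assumptions, not details from the paper.

```python
import numpy as np


def rmse(pred, ref):
    """Root-mean-squared error between two equal-length time series."""
    pred, ref = np.asarray(pred, dtype=float), np.asarray(ref, dtype=float)
    return float(np.sqrt(np.mean((pred - ref) ** 2)))


def pearson_r(pred, ref):
    """Pearson correlation coefficient between two time series."""
    return float(np.corrcoef(pred, ref)[0, 1])


def estimate_rr(resp, fs, lo_bpm=5.0, hi_bpm=40.0):
    """Estimate respiratory rate (breaths/min) from a respiration signal.

    Assumption: RR is taken as the frequency of the largest spectral
    peak inside a plausible breathing band. The paper may use a
    different method (for example, peak counting).
    """
    resp = np.asarray(resp, dtype=float)
    resp = resp - resp.mean()                     # remove DC offset
    windowed = resp * np.hanning(resp.size)       # reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs_bpm = np.fft.rfftfreq(resp.size, d=1.0 / fs) * 60.0
    band = (freqs_bpm >= lo_bpm) & (freqs_bpm <= hi_bpm)
    return float(freqs_bpm[band][np.argmax(spectrum[band])])


def mae(pred, ref):
    """Mean absolute error, e.g. between predicted and reference RR values."""
    pred, ref = np.asarray(pred, dtype=float), np.asarray(ref, dtype=float)
    return float(np.mean(np.abs(pred - ref)))


# Synthetic check: a 60 s signal sampled at a hypothetical 25 Hz,
# breathing at 0.25 Hz (15 breaths/min).
fs = 25.0
t = np.arange(0, 60.0, 1.0 / fs)
reference = np.sin(2 * np.pi * 0.25 * t)
rng = np.random.default_rng(0)
predicted = reference + 0.1 * rng.standard_normal(t.size)

print("RMSE:", rmse(predicted, reference))
print("Correlation:", pearson_r(predicted, reference))
print("Estimated RR (breaths/min):", estimate_rr(predicted, fs))
```

With a 60-second window the spectral resolution is 1 breath/min, so shorter speech segments would give a coarser RR estimate under this approach; a time-domain peak counter is a common alternative.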
Related papers
- Rene: A Pre-trained Multi-modal Architecture for Auscultation of Respiratory Diseases [5.810320353233697]
We introduce Rene, a pioneering large-scale model tailored for respiratory sound recognition.
Our innovative approach applies a pre-trained speech recognition model to process respiratory sounds.
We have developed a real-time respiratory sound discrimination system utilizing the Rene architecture.
arXiv Detail & Related papers (2024-05-13T03:00:28Z) - SMRD: SURE-based Robust MRI Reconstruction with Diffusion Models [76.43625653814911]
Diffusion models have gained popularity for accelerated MRI reconstruction due to their high sample quality.
They can effectively serve as rich data priors while incorporating the forward model flexibly at inference time.
We introduce SURE-based MRI Reconstruction with Diffusion models (SMRD) to enhance robustness during testing.
arXiv Detail & Related papers (2023-10-03T05:05:35Z) - Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study [68.88536866933038]
Speech signals, typically sampled at rates in the tens of thousands per second, contain redundancies.
Recent investigations proposed the use of discrete speech units derived from self-supervised learning representations.
Applying various methods, such as de-duplication and subword modeling, can further compress the speech sequence length.
arXiv Detail & Related papers (2023-09-27T17:21:13Z) - Automatically measuring speech fluency in people with aphasia: first achievements using read-speech data [55.84746218227712]
This study aims at assessing the relevance of a signal-processing algorithm, initially developed in the field of language acquisition, for the automatic measurement of speech fluency.
arXiv Detail & Related papers (2023-08-09T07:51:40Z) - Using BOLD-fMRI to Compute the Respiration Volume per Time (RTV) and Respiration Variation (RV) with Convolutional Neural Networks (CNN) in the Human Connectome Development Cohort [55.41644538483948]
This study proposes a one-dimensional CNN model for reconstruction of two respiratory measures, RV and RVT.
Results show that a CNN can capture informative features from resting BOLD signals and reconstruct realistic RV and RVT timeseries.
arXiv Detail & Related papers (2023-07-03T18:06:36Z) - Ontology-aware Learning and Evaluation for Audio Tagging [56.59107110017436]
Mean average precision (mAP) metric treats different kinds of sound as independent classes without considering their relations.
Ontology-aware mean average precision (OmAP) addresses the weaknesses of mAP by utilizing the AudioSet ontology information during the evaluation.
We conduct human evaluations and demonstrate that OmAP is more consistent with human perception than mAP.
arXiv Detail & Related papers (2022-11-22T11:35:14Z) - A Deep Learning Based Multitask Network for Respiration Rate Estimation -- A Practical Perspective [1.290382979353427]
This paper presents a multitasking architecture based on Deep Learning (DL) for estimating instantaneous and average respiration rate from ECG and accelerometer signals.
The proposed model showed better overall accuracy and gave better results than individual modalities during different activities.
arXiv Detail & Related papers (2021-12-13T11:33:42Z) - Estimating Respiratory Rate From Breath Audio Obtained Through Wearable Microphones [6.293929325572208]
Respiratory rate (RR) is a clinical metric used to assess overall health and physical fitness.
This work investigates a model-driven approach to estimate RR from short audio segments obtained after physical exertion in healthy adults.
arXiv Detail & Related papers (2021-07-28T17:24:44Z) - A Novel Non-Invasive Estimation of Respiration Rate from Photoplethysmograph Signal Using Machine Learning Model [0.0]
Respiration rate (RR) is a vital indicator of the wellness of a patient.
Real-time continuous RR monitoring is available only in the intensive care unit (ICU).
Recent research has proposed photoplethysmogram (PPG) and/or electrocardiogram (ECG) signals for RR estimation.
This paper describes a novel approach to RR estimation using machine learning (ML) models with the PPG signal features.
arXiv Detail & Related papers (2021-02-18T17:08:50Z) - Multispectral Video Fusion for Non-contact Monitoring of Respiratory Rate and Apnea [7.300192965401497]
Non-contact monitoring of respiration can be achieved with near- and far-infrared spectrum cameras.
We present a novel algorithm based on multispectral data fusion that aims at estimating respiratory rate (RR) during apnea.
Our findings may represent a step towards the use of cameras for vital sign monitoring in medical applications.
arXiv Detail & Related papers (2020-04-21T09:07:09Z) - Simultaneous Denoising and Dereverberation Using Deep Embedding Features [64.58693911070228]
We propose a joint training method for simultaneous speech denoising and dereverberation using deep embedding features.
At the denoising stage, the DC network is leveraged to extract noise-free deep embedding features.
At the dereverberation stage, instead of using the unsupervised K-means clustering algorithm, another neural network is utilized to estimate the anechoic speech.
arXiv Detail & Related papers (2020-04-06T06:34:01Z)