Audio MFCC-gram Transformers for respiratory insufficiency detection in COVID-19
- URL: http://arxiv.org/abs/2210.14085v1
- Date: Tue, 25 Oct 2022 15:11:40 GMT
- Title: Audio MFCC-gram Transformers for respiratory insufficiency detection in COVID-19
- Authors: Marcelo Matheus Gauy and Marcelo Finger
- Abstract summary: This work explores speech as a biomarker and investigates the detection of respiratory insufficiency (RI) by analyzing speech samples.
Previous work constructed a dataset of respiratory insufficiency COVID-19 patient utterances and analyzed it by means of a convolutional neural network.
Here, we study how Transformer neural network architectures can improve the performance on RI detection.
- Score: 3.6042575355093907
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work explores speech as a biomarker and investigates the detection of
respiratory insufficiency (RI) by analyzing speech samples. Previous work
\cite{spira2021} constructed a dataset of respiratory insufficiency COVID-19
patient utterances and analyzed it by means of a convolutional neural network
achieving an accuracy of $87.04\%$, validating the hypothesis that one can
detect RI through speech. Here, we study how Transformer neural network
architectures can improve the performance on RI detection. This approach
enables construction of an acoustic model. By choosing the correct pretraining
technique, we generate a self-supervised acoustic model, leading to improved
performance ($96.53\%$) of Transformers for RI detection.
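The abstract gives no implementation details, so as a rough illustration of the MFCC-gram Transformer idea, the sketch below extracts MFCC frames with librosa and feeds them to a small PyTorch Transformer encoder classifier. The file name and all hyperparameters (n_mfcc, d_model, nhead, num_layers) are placeholder assumptions, not the paper's values.

```python
# Minimal sketch of an MFCC-gram Transformer classifier for RI detection.
# Hyperparameters and the input file are illustrative assumptions.
import librosa
import torch
import torch.nn as nn

class MFCCGramTransformer(nn.Module):
    def __init__(self, n_mfcc=40, d_model=128, nhead=4, num_layers=4):
        super().__init__()
        self.proj = nn.Linear(n_mfcc, d_model)        # frame-wise embedding
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, 2)             # RI vs. control

    def forward(self, mfcc):                          # mfcc: (batch, time, n_mfcc)
        h = self.encoder(self.proj(mfcc))
        return self.head(h.mean(dim=1))               # mean-pool over time

# Turn a waveform into an MFCC-gram of shape (time, n_mfcc)
wav, sr = librosa.load("patient_utterance.wav", sr=16000)  # hypothetical file
mfcc = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=40).T
logits = MFCCGramTransformer()(torch.tensor(mfcc[None], dtype=torch.float32))
```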
Related papers
- Rene: A Pre-trained Multi-modal Architecture for Auscultation of Respiratory Diseases [5.810320353233697]
We introduce Rene, a pioneering large-scale model tailored for respiratory sound recognition.
Our innovative approach applies a pre-trained speech recognition model to process respiratory sounds.
We have developed a real-time respiratory sound discrimination system utilizing the Rene architecture.
arXiv Detail & Related papers (2024-05-13T03:00:28Z)
- CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset, consisting of unseen synthetic data and images collected from silicon aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z)
- Fully Automated End-to-End Fake Audio Detection [57.78459588263812]
This paper proposes a fully automated end-to-end fake audio detection method.
We first use a pre-trained wav2vec model to obtain a high-level representation of the speech.
For the network structure, we use a modified version of the differentiable architecture search (DARTS) named light-DARTS.
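As a hedged illustration of the wav2vec feature-extraction step (the exact checkpoint used in the paper is not specified here; torchaudio's WAV2VEC2_BASE bundle and the file name are stand-in assumptions):

```python
# Extract high-level speech representations with a pretrained wav2vec 2.0 model.
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE        # stand-in pretrained bundle
model = bundle.get_model().eval()

wav, sr = torchaudio.load("utterance.wav")         # hypothetical file
wav = torchaudio.functional.resample(wav, sr, bundle.sample_rate)
with torch.inference_mode():
    features, _ = model.extract_features(wav)      # list of per-layer representations
```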
arXiv Detail & Related papers (2022-08-20T06:46:55Z)
- Image Synthesis with Disentangled Attributes for Chest X-Ray Nodule Augmentation and Detection [52.93342510469636]
Lung nodule detection in chest X-ray (CXR) images is common to early screening of lung cancers.
Deep-learning-based Computer-Assisted Diagnosis (CAD) systems can support radiologists for nodule screening in CXR.
To alleviate the limited availability of such datasets, lung nodule synthesis methods are proposed for the sake of data augmentation.
arXiv Detail & Related papers (2022-07-19T16:38:48Z)
- Interpretable Acoustic Representation Learning on Breathing and Speech Signals for COVID-19 Detection [37.01066509527848]
We describe an approach for representation learning of audio signals for the task of COVID-19 detection.
The raw audio samples are processed with a bank of 1-D convolutional filters that are parameterized as cosine modulated Gaussian functions.
The filtered outputs are pooled, log-compressed and used in a self-attention based relevance weighting mechanism.
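A minimal NumPy sketch of the front end described above, assuming fixed (rather than learned) center frequencies and bandwidths; all values are illustrative assumptions:

```python
# Cosine-modulated Gaussian kernel bank applied to raw audio, followed by
# average pooling and log compression, as the abstract describes.
import numpy as np

def cos_gauss_kernel(fc, sigma, sr=16000, width=401):
    t = (np.arange(width) - width // 2) / sr          # time axis in seconds
    return np.cos(2 * np.pi * fc * t) * np.exp(-t**2 / (2 * sigma**2))

sr = 16000
fcs = np.linspace(100, 7000, 40)                      # 40 assumed center frequencies
bank = np.stack([cos_gauss_kernel(fc, sigma=0.002, sr=sr) for fc in fcs])

x = np.random.randn(sr)                               # 1 s of dummy audio
filtered = np.stack([np.convolve(x, k, mode="same") for k in bank])
pooled = np.log1p(np.abs(filtered).reshape(40, -1, 160).mean(axis=2))  # pool + log
```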
arXiv Detail & Related papers (2022-06-27T15:20:51Z)
- Preservation of High Frequency Content for Deep Learning-Based Medical Image Classification [74.84221280249876]
An efficient analysis of large amounts of chest radiographs can aid physicians and radiologists.
We propose a novel Discrete Wavelet Transform (DWT)-based method for the efficient identification and encoding of visual information.
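A minimal PyWavelets sketch of a single-level 2-D DWT; the paper's wavelet choice, decomposition depth, and encoding scheme are not specified here, so the values below are assumptions:

```python
# Single-level 2-D DWT of a (dummy) radiograph with PyWavelets.
import numpy as np
import pywt

image = np.random.rand(256, 256)                   # stand-in for a chest radiograph
cA, (cH, cV, cD) = pywt.dwt2(image, "haar")        # approximation + detail bands
# cH/cV/cD carry the high-frequency content the method aims to preserve
```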
arXiv Detail & Related papers (2022-05-08T15:29:54Z)
- EIHW-MTG DiCOVA 2021 Challenge System Report [2.3544007354006706]
This paper aims to automatically detect COVID-19 patients by analysing the acoustic information embedded in coughs.
We focus on analysing the spectrogram representations of coughing samples with the aim to investigate whether COVID-19 alters the frequency content of these signals.
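A generic log-spectrogram computation with librosa, illustrating the kind of representation analysed; the STFT settings and file name are assumptions, not the challenge system's configuration:

```python
# Log-magnitude spectrogram of a cough recording.
import librosa
import numpy as np

wav, sr = librosa.load("cough.wav", sr=None)       # hypothetical file
S = np.abs(librosa.stft(wav, n_fft=1024, hop_length=256))
log_S = librosa.amplitude_to_db(S, ref=np.max)     # frequency content over time
```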
arXiv Detail & Related papers (2021-10-13T07:38:54Z)
- An Approach Towards Physics Informed Lung Ultrasound Image Scoring Neural Network for Diagnostic Assistance in COVID-19 [0.0]
A novel approach is presented to extract acoustic propagation-based features that highlight the region below the pleura in lung ultrasound (LUS) images.
A neural network, referred to as LUSNet, is trained to classify the LUS images into five classes of varying severity of lung infection to track the progression of COVID-19.
A detailed analysis of the proposed approach on LUS images, spanning the period from infection to full recovery for ten confirmed COVID-19 subjects, shows average five-fold cross-validation accuracy, sensitivity, and specificity of 97%, 93%, and 98%, respectively, over 5000 frames of COVID-19 videos.
arXiv Detail & Related papers (2021-06-13T13:01:53Z)
- Detecting COVID-19 from Breathing and Coughing Sounds using Deep Neural Networks [68.8204255655161]
We adapt an ensemble of Convolutional Neural Networks to classify whether a speaker is infected with COVID-19.
The ensemble achieves an Unweighted Average Recall (UAR) of 74.9% and an Area Under the ROC Curve (AUC) of 80.7%.
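For reference, UAR is macro-averaged recall; a small scikit-learn sketch on dummy labels shows how both reported metrics are computed:

```python
# UAR (macro-averaged recall) and ROC AUC on dummy predictions.
from sklearn.metrics import recall_score, roc_auc_score

y_true = [0, 0, 1, 1, 1]
y_prob = [0.2, 0.6, 0.4, 0.8, 0.9]                   # model scores for class 1
y_pred = [int(p >= 0.5) for p in y_prob]

uar = recall_score(y_true, y_pred, average="macro")  # Unweighted Average Recall
auc = roc_auc_score(y_true, y_prob)                  # Area Under the ROC Curve
```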
arXiv Detail & Related papers (2020-12-29T01:14:17Z)
- Cross-domain Adaptation with Discrepancy Minimization for Text-independent Forensic Speaker Verification [61.54074498090374]
This study introduces a CRSS-Forensics audio dataset collected in multiple acoustic environments.
We pre-train a CNN-based network using the VoxCeleb data, followed by an approach which fine-tunes part of the high-level network layers with clean speech from CRSS-Forensics.
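A generic PyTorch pattern for the freeze-then-fine-tune step described above; the backbone, layer names, and head size are illustrative stand-ins, not the paper's VoxCeleb-pretrained architecture:

```python
# Freeze a pretrained backbone, then fine-tune only the high-level layers.
import torch.nn as nn
import torchvision.models as models

net = models.resnet18(weights="IMAGENET1K_V1")   # stand-in pretrained backbone
for p in net.parameters():
    p.requires_grad = False                      # freeze all layers...
for p in net.layer4.parameters():
    p.requires_grad = True                       # ...then unfreeze the top block
net.fc = nn.Linear(net.fc.in_features, 2)        # new trainable task head
```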
arXiv Detail & Related papers (2020-09-05T02:54:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.