COVID-19 Patient Detection from Telephone Quality Speech Data
- URL: http://arxiv.org/abs/2011.04299v1
- Date: Mon, 9 Nov 2020 10:16:08 GMT
- Title: COVID-19 Patient Detection from Telephone Quality Speech Data
- Authors: Kotra Venkata Sai Ritwik, Shareef Babu Kalluri, Deepu Vijayasenan
- Abstract summary: We investigate the presence of cues about COVID-19 disease in speech data.
An SVM classifier on this dataset is able to achieve an accuracy of 88.6% and an F1-Score of 92.7%.
Some phone classes, such as nasals, stops, and mid vowels, can distinguish the two classes better than others.
- Score: 4.726777092009554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we investigate the presence of cues about COVID-19
disease in speech data. We use an approach similar to speaker
recognition. Each sentence is represented as super vectors of short term Mel
filter bank features for each phoneme. These features are used to learn a
two-class classifier to separate the COVID-19 speech from normal. Experiments
on a small dataset collected from YouTube videos show that an SVM classifier on
this dataset is able to achieve an accuracy of 88.6% and an F1-Score of 92.7%.
Further investigation reveals that some phone classes, such as nasals, stops,
and mid vowels, can distinguish the two classes better than others.
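The pipeline described in the abstract can be illustrated with a short sketch. The code below is not the authors' implementation; the sampling rate, filter bank configuration, phone inventory, and the source of phoneme alignments (e.g. a forced aligner) are assumptions made for illustration. The idea is that per-phoneme averages of short-term Mel filter bank features are stacked into a fixed-length supervector, and an SVM separates COVID-19 speech from normal speech.

```python
import numpy as np
import librosa
from sklearn.svm import SVC

PHONE_SET = ["aa", "iy", "uw", "m", "n", "p", "t", "k", "s"]  # assumed phone inventory

def mel_supervector(wav_path, alignments, sr=8000, n_mels=24, hop=160, n_fft=400):
    """Per-phoneme mean log-Mel filter bank features, stacked into one vector.

    alignments: list of (phone, start_sec, end_sec) tuples, e.g. from a forced aligner.
    """
    y, sr = librosa.load(wav_path, sr=sr)              # telephone-quality rate assumed
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop, n_mels=n_mels)
    logmel = librosa.power_to_db(mel)                   # shape: (n_mels, n_frames)
    fps = sr / float(hop)                               # frames per second
    parts = []
    for phone in PHONE_SET:
        frames = [logmel[:, int(s * fps):max(int(e * fps), int(s * fps) + 1)]
                  for p, s, e in alignments if p == phone]
        if frames:
            parts.append(np.hstack(frames).mean(axis=1))   # phone-level mean vector
        else:
            parts.append(np.zeros(n_mels))                  # phone absent in utterance
    return np.concatenate(parts)                            # (len(PHONE_SET) * n_mels,)

# Two-class SVM on the supervectors (1 = COVID-19, 0 = normal), e.g.:
# X = np.stack([mel_supervector(w, a) for w, a in train_items])
# clf = SVC(kernel="linear").fit(X, y_labels)
# print(clf.predict(X_test))
```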
Related papers
- SLICER: Learning universal audio representations using low-resource
self-supervised pre-training [53.06337011259031]
We present a new Self-Supervised Learning approach to pre-train encoders on unlabeled audio data.
Our primary aim is to learn audio representations that can generalize across a large variety of speech and non-speech tasks.
arXiv Detail & Related papers (2022-11-02T23:45:33Z)
- COVYT: Introducing the Coronavirus YouTube and TikTok speech dataset featuring the same speakers with and without infection [4.894353840908006]
We introduce the COVYT dataset -- a novel COVID-19 dataset collected from public sources containing more than 8 hours of speech from 65 speakers.
Compared with other existing COVID-19 sound datasets, the COVYT dataset is unique in that it comprises both COVID-19 positive and negative samples from all 65 speakers.
arXiv Detail & Related papers (2022-06-20T16:26:51Z)
- Speaker Identification using Speech Recognition [0.0]
This research provides a mechanism for identifying a speaker in an audio file, based on human voice biometric features such as pitch, amplitude, and frequency.
We propose an unsupervised learning model that can learn speech representations from a limited dataset.
arXiv Detail & Related papers (2022-05-29T13:03:42Z)
- On the pragmatism of using binary classifiers over data intensive neural network classifiers for detection of COVID-19 from voice [34.553128768223615]
We show that detecting COVID-19 from voice does not require custom-made non-standard features or complicated neural network classifiers.
We demonstrate this on a human-curated dataset collected and calibrated in clinical settings.
arXiv Detail & Related papers (2022-04-11T00:19:14Z)
- Sub-word Level Lip Reading With Visual Attention [88.89348882036512]
We focus on the unique challenges encountered in lip reading and propose tailored solutions.
We obtain state-of-the-art results on the challenging LRS2 and LRS3 benchmarks when training on public datasets.
Our best model achieves 22.6% word error rate on the LRS2 dataset, a performance unprecedented for lip reading models.
arXiv Detail & Related papers (2021-10-14T17:59:57Z)
- Machine Learning based COVID-19 Detection from Smartphone Recordings: Cough, Breath and Speech [7.908757488948712]
We present an experimental investigation into the automatic detection of COVID-19 from smartphone recordings of coughs, breaths and speech.
We base our experiments on two datasets, Coswara and ComParE, containing recordings of coughing, breathing and speech.
We conclude that among all vocal audio, coughs carry the strongest COVID-19 signature, followed by breath and speech.
arXiv Detail & Related papers (2021-04-02T23:21:24Z)
- UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data [54.733889961024445]
We propose a unified pre-training approach called UniSpeech to learn speech representations with both unlabeled and labeled data.
We evaluate the effectiveness of UniSpeech for cross-lingual representation learning on the public CommonVoice corpus.
arXiv Detail & Related papers (2021-01-19T12:53:43Z)
- Detecting Parkinson's Disease From an Online Speech-task [4.968576908394359]
In this paper, we envision a web-based framework that can help anyone, anywhere around the world, record a short speech task and analyze the recorded data to screen for Parkinson's disease (PD).
We collected data from 726 unique participants (262 PD, 38% female; 464 non-PD, 65% female; average age: 61) from all over the US and beyond.
We extracted standard acoustic features (MFCCs), jitter and shimmer variants, and deep learning based features from the speech data.
Our model performed equally well on data collected in a controlled lab environment and 'in the wild'.
arXiv Detail & Related papers (2020-09-02T21:16:24Z)
- Deep F-measure Maximization for End-to-End Speech Understanding [52.36496114728355]
We propose a differentiable approximation to the F-measure and train the network with this objective using standard backpropagation.
We perform experiments on two standard fairness datasets, Adult and Communities and Crime, and also on speech-to-intent detection on the ATIS dataset and speech-to-image concept classification on the Speech-COCO dataset.
In all four of these tasks, the F-measure objective yields improved micro-F1 scores, with absolute improvements of up to 8% compared to models trained with the cross-entropy loss function. A minimal sketch of such a differentiable F-measure loss appears after this list.
arXiv Detail & Related papers (2020-08-08T03:02:27Z)
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations [51.25118580050847]
We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods.
wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations, which are jointly learned. A rough sketch of this masked contrastive objective appears after this list.
arXiv Detail & Related papers (2020-06-20T02:35:02Z)
- Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation [51.37980448183019]
We propose Audio ALBERT, a lite version of the self-supervised speech representation model.
We show that Audio ALBERT is capable of achieving competitive performance with those huge models in the downstream tasks.
In probing experiments, we find that the latent representations encode richer information about both phonemes and speakers than those of the last layer.
arXiv Detail & Related papers (2020-05-18T10:42:44Z)
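As referenced in the "Deep F-measure Maximization for End-to-End Speech Understanding" entry above, a differentiable approximation to the F-measure replaces hard true/false-positive counts with sums of predicted probabilities, so the objective can be trained with standard backpropagation. The sketch below is a generic soft-F1 for binary labels and may differ from the paper's exact formulation; soft_f1_loss and the toy tensors are illustrative assumptions.

```python
import torch

def soft_f1_loss(probs, targets, eps=1e-7):
    """probs: predicted probabilities in [0, 1]; targets: 0/1 labels of the same shape."""
    tp = (probs * targets).sum()               # soft true positives
    fp = (probs * (1 - targets)).sum()         # soft false positives
    fn = ((1 - probs) * targets).sum()         # soft false negatives
    f1 = 2 * tp / (2 * tp + fp + fn + eps)     # differentiable F1
    return 1.0 - f1                            # minimise 1 - F1

# Toy usage: logits from any classifier head
logits = torch.randn(8, requires_grad=True)
targets = torch.tensor([1., 0., 1., 1., 0., 0., 1., 0.])
loss = soft_f1_loss(torch.sigmoid(logits), targets)
loss.backward()                                # gradients flow through the soft counts
```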
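The wav2vec 2.0 entry above describes masking the latent speech representation and solving a contrastive task over quantized latents. The following is a very rough, self-contained sketch of such a masked contrastive objective, not the released implementation; masked_contrastive_loss, the distractor sampling scheme, and the temperature value are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def masked_contrastive_loss(context, quantized, mask, temperature=0.1, n_distractors=5):
    """context, quantized: (T, D) tensors; mask: bool tensor (T,) marking masked frames."""
    idx = mask.nonzero(as_tuple=True)[0]
    losses = []
    for t in idx:
        others = idx[idx != t]                                    # other masked positions
        pick = others[torch.randint(len(others), (n_distractors,))]
        cands = torch.stack([quantized[t]] + [quantized[j] for j in pick])  # positive first
        sims = F.cosine_similarity(context[t].unsqueeze(0), cands) / temperature
        # cross-entropy with target index 0 = "pick the true quantized latent"
        losses.append(F.cross_entropy(sims.unsqueeze(0), torch.zeros(1, dtype=torch.long)))
    return torch.stack(losses).mean()

# Toy usage: 50 latent frames of dimension 16, roughly half of them masked
context = torch.randn(50, 16, requires_grad=True)
quantized = torch.randn(50, 16)
mask = torch.rand(50) > 0.5
loss = masked_contrastive_loss(context, quantized, mask)
loss.backward()    # gradients flow back through the contrastive objective
```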
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.