Coswara: A respiratory sounds and symptoms dataset for remote screening
of SARS-CoV-2 infection
- URL: http://arxiv.org/abs/2305.12741v1
- Date: Mon, 22 May 2023 06:09:10 GMT
- Title: Coswara: A respiratory sounds and symptoms dataset for remote screening
of SARS-CoV-2 infection
- Authors: Debarpan Bhattacharya, Neeraj Kumar Sharma, Debottam Dutta, Srikanth
Raj Chetupalli, Pravin Mote, Sriram Ganapathy, Chandrakiran C, Sahiti Nori,
Suhail K K, Sadhana Gonuguntla, Murali Alagesan
- Abstract summary: This paper presents the Coswara dataset, a dataset containing a diverse set of respiratory sounds and rich metadata.
The respiratory sounds comprise nine sound categories associated with variants of breathing, cough and speech.
The paper summarizes the data collection procedure and the demographic, symptom and audio data information.
- Score: 23.789227109218118
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents the Coswara dataset, a dataset containing a diverse
set of respiratory sounds and rich metadata, recorded between April 2020 and
February 2022 from 2635 individuals (1819 SARS-CoV-2 negative, 674 positive,
and 142 recovered subjects). The respiratory sounds contained nine sound
categories associated with variants of breathing, cough and speech. The rich
metadata contained demographic information associated with age, gender and
geographic location, as well as the health information relating to the
symptoms, pre-existing respiratory ailments, comorbidity and SARS-CoV-2 test
status. Our study is the first of its kind to annotate the audio quality of the
entire dataset (amounting to 65 hours) through manual listening.
The paper summarizes the data collection procedure and the demographic, symptom and
audio data information. A COVID-19 classifier based on a bidirectional long
short-term memory (BLSTM) architecture is trained and evaluated on the different
population sub-groups contained in the dataset to understand the bias/fairness
of the model. This enabled the analysis of the impact of gender, geographic
location, date of recording, and language proficiency on the COVID-19 detection
performance.
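The per-subgroup bias/fairness evaluation described in the abstract can be sketched in plain Python. This is an illustrative sketch, not the paper's code: the records, subgroup names, and scores below are hypothetical, and AUC is computed directly from the Mann-Whitney rank statistic rather than from the BLSTM model itself.

```python
from collections import defaultdict

def auc_score(labels, scores):
    """AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen positive example is scored above a randomly chosen negative one
    (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def subgroup_auc(records):
    """Group (subgroup, label, score) records and report AUC per subgroup,
    mirroring an evaluation on different population sub-groups."""
    groups = defaultdict(lambda: ([], []))
    for group, label, score in records:
        groups[group][0].append(label)
        groups[group][1].append(score)
    return {g: auc_score(ys, ss) for g, (ys, ss) in groups.items()}

# Toy records: (subgroup, SARS-CoV-2 status, classifier score).
records = [
    ("female", 1, 0.9), ("female", 0, 0.2), ("female", 1, 0.8), ("female", 0, 0.4),
    ("male",   1, 0.6), ("male",   0, 0.5), ("male",   1, 0.4), ("male",   0, 0.7),
]
print(subgroup_auc(records))  # {'female': 1.0, 'male': 0.25}
```

A large gap between subgroup AUCs, as in this toy output, is the kind of signal such an analysis is designed to surface; the same grouping could be applied to geographic location, recording date, or language proficiency.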
Related papers
- BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification [0.0]
We fine-tune a pretrained text-audio multimodal model using free-text descriptions derived from the sound samples' metadata.
Our method achieves state-of-the-art performance on the ICBHI dataset, surpassing the previous best result by a notable margin of 1.17%.
arXiv Detail & Related papers (2024-06-10T20:49:54Z)
- Audio-based AI classifiers show no evidence of improved COVID-19 screening over simple symptoms checkers [37.085063562292845]
We collect and analyse a dataset of audio recordings from 67,842 individuals with linked metadata.
Subjects were recruited via the UK government's National Health Service Test-and-Trace programme and the REal-time Assessment of Community Transmission survey.
In an unadjusted analysis of our dataset, AI classifiers predict SARS-CoV-2 infection status with high accuracy.
However, after matching on measured confounders, such as age, gender, and self-reported symptoms, our classifiers' performance is much weaker.
arXiv Detail & Related papers (2022-12-15T15:44:02Z)
- Statistical Design and Analysis for Robust Machine Learning: A Case Study from COVID-19 [45.216628450147034]
This paper rigorously assesses state-of-the-art machine learning techniques used to predict COVID-19 infection status based on vocal audio signals.
We provide guidelines on testing the performance of methods to classify COVID-19 infection status based on acoustic features.
arXiv Detail & Related papers (2022-12-15T13:50:13Z)
- A large-scale and PCR-referenced vocal audio dataset for COVID-19 [29.40538927182366]
The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022.
Audio recordings of volitional coughs, exhalations, and speech were collected in the 'Speak up to help beat coronavirus' digital survey.
This dataset has additional potential uses for bioacoustics research, with 11.30% of participants reporting asthma.
arXiv Detail & Related papers (2022-12-15T11:40:40Z)
- COVYT: Introducing the Coronavirus YouTube and TikTok speech dataset featuring the same speakers with and without infection [4.894353840908006]
We introduce the COVYT dataset -- a novel COVID-19 dataset collected from public sources containing more than 8 hours of speech from 65 speakers.
As compared to other existing COVID-19 sound datasets, the unique feature of the COVYT dataset is that it comprises both COVID-19 positive and negative samples from all 65 speakers.
arXiv Detail & Related papers (2022-06-20T16:26:51Z)
- Evaluating the COVID-19 Identification ResNet (CIdeR) on the INTERSPEECH COVID-19 from Audio Challenges [59.78485839636553]
CIdeR is an end-to-end deep learning neural network originally designed to classify whether an individual is COVID-positive or COVID-negative.
We demonstrate the potential of CIdeR at binary COVID-19 diagnosis from both the COVID-19 Cough and Speech Sub-Challenges of INTERSPEECH 2021, ComParE and DiCOVA.
arXiv Detail & Related papers (2021-07-30T10:59:08Z)
- Sounds of COVID-19: exploring realistic performance of audio-based digital testing [17.59710651224251]
In this paper, we explore the realistic performance of audio-based digital testing of COVID-19.
We collected a large crowdsourced respiratory audio dataset through a mobile app, alongside recent COVID-19 test results and symptoms intended as ground truth.
The unbiased model takes features extracted from breathing, cough, and voice signals as predictors and yields an AUC-ROC of 0.71 (95% CI: 0.65-0.77).
arXiv Detail & Related papers (2021-06-29T15:50:36Z)
- CoRSAI: A System for Robust Interpretation of CT Scans of COVID-19 Patients Using Deep Learning [133.87426554801252]
We adopted an approach based on using an ensemble of deep convolutional neural networks for segmentation of lung CT scans.
Using our models, we are able to segment the lesions, evaluate patient dynamics, estimate the relative volume of lungs affected by lesions and evaluate the lung damage stage.
arXiv Detail & Related papers (2021-05-25T12:06:55Z)
- Quantification of pulmonary involvement in COVID-19 pneumonia by means of a cascade of two U-nets: training and assessment on multiple datasets using different annotation criteria [83.83783947027392]
This study aims at exploiting Artificial intelligence (AI) for the identification, segmentation and quantification of COVID-19 pulmonary lesions.
We developed an automated analysis pipeline, the LungQuant system, based on a cascade of two U-nets.
The accuracy of the LungQuant system in predicting the CT Severity Score (CT-SS) has also been evaluated.
arXiv Detail & Related papers (2021-05-06T10:21:28Z)
- COVIDx-US -- An open-access benchmark dataset of ultrasound imaging data for AI-driven COVID-19 analytics [116.6248556979572]
COVIDx-US is an open-access benchmark dataset of COVID-19 related ultrasound imaging data.
It consists of 93 lung ultrasound videos and 10,774 processed images of patients with SARS-CoV-2 pneumonia, non-SARS-CoV-2 pneumonia, and healthy control cases.
arXiv Detail & Related papers (2021-03-18T03:31:33Z)
- Classification supporting COVID-19 diagnostics based on patient survey data [82.41449972618423]
Logistic regression and XGBoost classifiers that allow for effective screening of patients for COVID-19 were generated.
The obtained classification models provided the basis for the DECODE service (decode.polsl.pl), which can serve as support in screening patients for COVID-19.
The dataset consists of more than 3,000 examples and is based on questionnaires collected at a hospital in Poland.
arXiv Detail & Related papers (2020-11-24T17:44:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.