Related papers: Fine-tuning Pre-trained Audio Models for COVID-19 Detection: A Technical Report

URL: http://arxiv.org/abs/2511.14939v1
Date: Tue, 18 Nov 2025 21:54:20 GMT
Title: Fine-tuning Pre-trained Audio Models for COVID-19 Detection: A Technical Report
Authors: Daniel Oliveira de Brito, Letícia Gabriella de Souza, Marcelo Matheus Gauy, Marcelo Finger, Arnaldo Candido Junior
Abstract summary: This report investigates the performance of pre-trained audio models on COVID-19 detection tasks using established benchmark datasets. We implement a strict demographic stratification by age and gender to prevent models from exploiting spurious correlations between demographic characteristics and COVID-19 status.
Score: 0.9431368999053936
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This technical report investigates the performance of pre-trained audio models on COVID-19 detection tasks using established benchmark datasets. We fine-tuned Audio-MAE and three PANN architectures (CNN6, CNN10, CNN14) on the Coswara and COUGHVID datasets, evaluating both intra-dataset and cross-dataset generalization. We implemented a strict demographic stratification by age and gender to prevent models from exploiting spurious correlations between demographic characteristics and COVID-19 status. Intra-dataset results showed moderate performance, with Audio-MAE achieving the strongest result on Coswara (0.82 AUC, 0.76 F1-score), while all models demonstrated limited performance on COUGHVID (AUC 0.58-0.63). Cross-dataset evaluation revealed severe generalization failure across all models (AUC 0.43-0.68), with Audio-MAE showing strong performance degradation (F1-score 0.00-0.08). Our experiments demonstrate that demographic balancing, while reducing apparent model performance, provides a more realistic assessment of COVID-19 detection capabilities by eliminating demographic leakage, a confounding factor that inflates performance metrics. Additionally, the limited dataset sizes after balancing (1,219-2,160 samples) proved insufficient for deep learning models that typically require substantially larger training sets. These findings highlight fundamental challenges in developing generalizable audio-based COVID-19 detection systems and underscore the importance of rigorous demographic controls for clinically robust model evaluation.
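
Illustrative sketch (not taken from the report): the abstract does not spell out how the demographic stratification was carried out, so the snippet below shows only one plausible way to balance a dataset by age and gender. It assumes each record is a dict with hypothetical fields `age`, `gender`, and `label` (1 = COVID-19 positive, 0 = negative), groups records into 10-year age bins (an assumed bin width), and downsamples each (age bin, gender) stratum so that it holds equally many positives and negatives.

```python
import random
from collections import defaultdict


def age_bin(age, width=10):
    """Map an age to the lower edge of its bin (assumed 10-year bins)."""
    return (age // width) * width


def balance_by_demographics(samples, seed=0):
    """Downsample so every (age bin, gender) stratum has equal class counts.

    Field names ("age", "gender", "label") are hypothetical; the report
    does not describe its record format or exact balancing procedure.
    """
    rng = random.Random(seed)
    # stratum key -> {label: [records]}
    strata = defaultdict(lambda: {0: [], 1: []})
    for s in samples:
        strata[(age_bin(s["age"]), s["gender"])][s["label"]].append(s)

    balanced = []
    for key in sorted(strata):
        neg, pos = strata[key][0], strata[key][1]
        n = min(len(neg), len(pos))  # strata with one class contribute nothing
        balanced.extend(rng.sample(neg, n))
        balanced.extend(rng.sample(pos, n))
    return balanced


samples = [
    {"id": 1, "age": 23, "gender": "f", "label": 1},
    {"id": 2, "age": 27, "gender": "f", "label": 0},
    {"id": 3, "age": 25, "gender": "f", "label": 0},
    {"id": 4, "age": 41, "gender": "m", "label": 0},
    {"id": 5, "age": 45, "gender": "m", "label": 1},
    {"id": 6, "age": 62, "gender": "m", "label": 0},
]
balanced = balance_by_demographics(samples)
```

After balancing, age bin and gender carry no information about the label within the retained data, which is the kind of demographic leakage the abstract says the stratification is meant to eliminate. The cost is a smaller dataset, consistent with the reduced sample counts the abstract reports.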

Related papers

Sustaining model performance for covid-19 detection from dynamic audio data: Development and evaluation of a comprehensive drift-adaptive framework [0.5679775668038152]
The COVID-19 pandemic has highlighted the need for robust diagnostic tools capable of detecting the disease from diverse and evolving data sources. The dynamic nature of real-world data can lead to model drift, where performance degrades over time as the underlying data distribution changes. This study aims to develop a framework that monitors model drift and employs adaptation mechanisms to mitigate performance fluctuations.
arXiv Detail & Related papers (2024-09-28T10:06:30Z)
Impact of Noisy Supervision in Foundation Model Learning [91.56591923244943]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets. We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
Symptom-based Machine Learning Models for the Early Detection of COVID-19: A Narrative Review [0.0]
Machine learning models can analyze large datasets, incorporating patient-reported symptoms, clinical data, and medical imaging. In this paper, we provide an overview of the landscape of symptoms-only machine learning models for predicting COVID-19, including their performance and limitations. The review will also examine the performance of symptom-based models when compared to image-based models.
arXiv Detail & Related papers (2023-12-08T01:41:42Z)
Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks. We propose a light-weight black-box tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
Improving the Robustness of Summarization Models by Detecting and Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes. We propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
arXiv Detail & Related papers (2022-12-20T00:33:11Z)
Developing a multi-variate prediction model for the detection of COVID-19 from Crowd-sourced Respiratory Voice Data [0.0]
The novelty of this work is in the development of a deep learning model for the identification of COVID-19 patients from voice recordings. We used the Cambridge University dataset consisting of 893 audio samples, crowd-sourced from 4352 participants that used a COVID-19 Sounds app. Based on the voice data, we developed deep learning classification models to detect positive COVID-19 cases.
arXiv Detail & Related papers (2022-09-08T11:46:37Z)
Sounds of COVID-19: exploring realistic performance of audio-based digital testing [17.59710651224251]
In this paper, we explore the realistic performance of audio-based digital testing of COVID-19. We collected a large crowdsourced respiratory audio dataset through a mobile app, alongside recent COVID-19 test result and symptoms intended as a ground truth. The unbiased model takes features extracted from breathing, coughs, and voice signals as predictors and yields an AUC-ROC of 0.71 (95% CI: 0.65-0.77).
arXiv Detail & Related papers (2021-06-29T15:50:36Z)
Systematic investigation into generalization of COVID-19 CT deep learning models with Gabor ensemble for lung involvement scoring [9.94980188821453]
This study investigates the generalizability of key published models using the publicly available COVID-19 Computed Tomography data. We then assess the predictive ability of these models for COVID-19 severity using an independent new dataset.
arXiv Detail & Related papers (2021-04-20T03:49:48Z)
Deep learning-based COVID-19 pneumonia classification using chest CT images: model generalizability [54.86482395312936]
Deep learning (DL) classification models were trained to identify COVID-19-positive patients on 3D computed tomography (CT) datasets from different countries. We trained nine identical DL-based classification models by using combinations of the datasets with a 72% train, 8% validation, and 20% test data split. The models trained on multiple datasets and evaluated on a test set from one of the datasets used for training performed better.
arXiv Detail & Related papers (2021-02-18T21:14:52Z)
End-2-End COVID-19 Detection from Breath & Cough Audio [68.41471917650571]
We demonstrate the first attempt to diagnose COVID-19 using end-to-end deep learning from a crowd-sourced dataset of audio samples. We introduce a novel modelling strategy using a custom deep neural network to diagnose COVID-19 from a joint breath and cough representation.
arXiv Detail & Related papers (2021-01-07T01:13:00Z)
From Sound Representation to Model Robustness [82.21746840893658]
We investigate the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network. Averaged over various experiments on three environmental sound datasets, we found the ResNet-18 model outperforms other deep learning architectures.
arXiv Detail & Related papers (2020-07-27T17:30:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.