Tuning In: Analysis of Audio Classifier Performance in Clinical Settings with Limited Data
- URL: http://arxiv.org/abs/2402.10100v3
- Date: Fri, 5 Apr 2024 21:40:33 GMT
- Title: Tuning In: Analysis of Audio Classifier Performance in Clinical Settings with Limited Data
- Authors: Hamza Mahdi, Eptehal Nashnoush, Rami Saab, Arjun Balachandar, Rishit Dagli, Lucas X. Perri, Houman Khosravani
- Abstract summary: This study assesses deep learning models for audio classification in a clinical setting with the constraint of small datasets.
We analyze CNNs, including DenseNet and ConvNeXt, alongside transformer models like ViT, SWIN, and AST.
Our method highlights the benefits of pre-training on large datasets before fine-tuning on specific clinical data.
- Score: 3.0113849517062303
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study assesses deep learning models for audio classification in a clinical setting with the constraint of small datasets reflecting real-world prospective data collection. We analyze CNNs, including DenseNet and ConvNeXt, alongside transformer models like ViT, SWIN, and AST, and compare them against pre-trained audio models such as YAMNet and VGGish. Our method highlights the benefits of pre-training on large datasets before fine-tuning on specific clinical data. We prospectively collected two first-of-their-kind patient audio datasets from stroke patients. We investigated various preprocessing techniques, finding that RGB and grayscale spectrogram transformations affect model performance differently based on the priors they learn from pre-training. Our findings indicate CNNs can match or exceed transformer models in small dataset contexts, with DenseNet-Contrastive and AST models showing notable performance. This study highlights the significance of incremental marginal gains through model selection, pre-training, and preprocessing in sound classification; this offers valuable insights for clinical diagnostics that rely on audio classification.
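The difference between grayscale and RGB spectrogram inputs mentioned in the abstract can be illustrated with a small sketch. The abstract does not describe the authors' preprocessing, so everything here is an assumption made for illustration: the function names, the min-max scaling, and the simple blue-to-red colormap are hypothetical, and only the Python standard library is used.

```python
# Illustrative sketch only; this is not the paper's pipeline.
# Input: a spectrogram given as a list of rows of magnitudes (e.g. dB values).

def normalize(spec):
    """Min-max scale every value into [0.0, 1.0]."""
    flat = [v for row in spec for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant input
    return [[(v - lo) / span for v in row] for row in spec]

def to_grayscale(spec):
    """One channel per pixel: an intensity in 0..255."""
    return [[round(v * 255) for v in row] for row in normalize(spec)]

def to_rgb(spec):
    """Three channels per pixel, using a hypothetical blue-to-red colormap."""
    return [[(round(v * 255), 0, round((1.0 - v) * 255)) for v in row]
            for row in normalize(spec)]

spec = [[-80.0, -40.0], [-20.0, 0.0]]  # toy 2x2 "spectrogram"
gray = to_grayscale(spec)              # single-channel image
rgb = to_rgb(spec)                     # three-channel image
```

A model pre-trained on three-channel natural images expects an input shaped like `rgb`, while a single-channel input like `gray` usually has to be replicated across channels or paired with an adapted first layer. That mismatch is one plausible reason the two transformations could interact differently with pre-trained priors.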
Related papers
- BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification [0.0]
We fine-tune a pretrained text-audio multimodal model using free-text descriptions derived from the sound samples' metadata.
Our method achieves state-of-the-art performance on the ICBHI dataset, surpassing the previous best result by a notable margin of 1.17%.
arXiv Detail & Related papers (2024-06-10T20:49:54Z) - Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z) - Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z) - Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z) - Improved Techniques for the Conditional Generative Augmentation of Clinical Audio Data [36.45569352490318]
We propose a conditional generative adversarial neural network-based augmentation method which is able to synthesize mel spectrograms from a learned data distribution.
We show that our method outperforms all classical audio augmentation techniques and previously published generative methods in terms of generated sample quality.
The proposed model advances the state-of-the-art in the augmentation of clinical audio data and improves the data bottleneck for the design of clinical acoustic sensing systems.
arXiv Detail & Related papers (2022-11-05T10:58:04Z) - Side-aware Meta-Learning for Cross-Dataset Listener Diagnosis with Subjective Tinnitus [38.66127142638335]
This paper proposes a side-aware meta-learning approach for cross-dataset tinnitus diagnosis.
Our method achieves a high accuracy of 73.8% in the cross-dataset classification.
arXiv Detail & Related papers (2022-05-03T03:17:44Z) - Conditional Generative Data Augmentation for Clinical Audio Datasets [36.45569352490318]
We propose a novel data augmentation method for clinical audio datasets based on a conditional Wasserstein Generative Adversarial Network with Gradient Penalty.
To validate our method, we created a clinical audio dataset which was recorded in a real-world operating room during Total Hip Arthroplasty (THA) procedures.
We show that training with the generated augmented samples outperforms classical audio augmentation methods in terms of classification accuracy.
arXiv Detail & Related papers (2022-03-22T09:47:31Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta-learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, which is a simple yet effective meta-learning method for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z) - Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z) - Robustly Pre-trained Neural Model for Direct Temporal Relation Extraction [10.832917897850361]
We studied several variants of BERT (Bidirectional Encoder Representations from Transformers).
We evaluated these methods using a direct temporal relations dataset which is a semantically focused subset of the 2012 i2b2 temporal relations challenge dataset.
Results: RoBERTa, which employs better pre-training strategies including a 10x larger corpus, improved the overall F measure by 0.0864 absolute (on the 1.00 scale), reducing the error rate by 24% relative to the previous state-of-the-art performance achieved with an SVM (support vector machine) model.
arXiv Detail & Related papers (2020-04-13T22:01:38Z)
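The two figures in the RoBERTa result above are linked by simple arithmetic if "error rate" is read as 1 minus the F measure. That reading is an assumption; the summary does not define the term. Under it, the earlier error rate and F measure can be back-calculated. The values computed below are derived for illustration, are not reported in the summary, and are only approximate because the 24% figure is presumably rounded.

```python
# Figures quoted in the summary above.
abs_gain = 0.0864      # absolute improvement in F measure
rel_reduction = 0.24   # relative reduction in error rate

# Assumption: error = 1 - F, so abs_gain = rel_reduction * prev_error.
prev_error = abs_gain / rel_reduction  # implied error of the earlier SVM model
prev_f = 1.0 - prev_error              # implied F measure of the earlier model
new_f = prev_f + abs_gain              # implied F measure after the improvement

print(f"implied previous F: {prev_f:.4f}, implied new F: {new_f:.4f}")
```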
This list is automatically generated from the titles and abstracts of the papers on this site.