RepAugment: Input-Agnostic Representation-Level Augmentation for Respiratory Sound Classification
- URL: http://arxiv.org/abs/2405.02996v1
- Date: Sun, 5 May 2024 16:45:46 GMT
- Title: RepAugment: Input-Agnostic Representation-Level Augmentation for Respiratory Sound Classification
- Authors: June-Woo Kim, Miika Toikkanen, Sangmin Bae, Minseok Kim, Ho-Young Jung
- Abstract summary: This paper explores the efficacy of pretrained speech models for respiratory sound classification.
We find that there is a characterization gap between speech and lung sound samples, and to bridge this gap, data augmentation is essential.
We propose RepAugment, an input-agnostic representation-level augmentation technique that outperforms SpecAugment.
- Score: 2.812716452984433
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in AI have democratized its deployment as a healthcare assistant. While pretrained models from large-scale visual and audio datasets have demonstrably generalized to this task, surprisingly, no studies have explored pretrained speech models, which, as human-originated sounds, intuitively would share closer resemblance to lung sounds. This paper explores the efficacy of pretrained speech models for respiratory sound classification. We find that there is a characterization gap between speech and lung sound samples, and to bridge this gap, data augmentation is essential. However, the most widely used augmentation technique for audio and speech, SpecAugment, requires a 2-dimensional spectrogram format and cannot be applied to models pretrained on speech waveforms. To address this, we propose RepAugment, an input-agnostic representation-level augmentation technique that not only outperforms SpecAugment but is also suitable for respiratory sound classification with waveform-pretrained models. Experimental results show that our approach outperforms SpecAugment, with a substantial improvement in the accuracy of minority disease classes of up to 7.14%.
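The abstract's core distinction can be made concrete with a minimal sketch. The exact RepAugment operations are not given here, so the representation-level ops below (random dimension dropout plus Gaussian noise) are illustrative assumptions; the contrast that matters is that SpecAugment needs a 2-D spectrogram, while representation-level augmentation acts on encoder outputs and so works equally for waveform-pretrained models:

```python
import numpy as np

def spec_augment(spec, num_masks=2, max_width=8, rng=None):
    """SpecAugment-style masking: zeroes random frequency and time
    bands of a 2-D spectrogram (freq_bins x time_frames). Requires
    spectrogram input, so it cannot follow a waveform encoder."""
    rng = rng or np.random.default_rng()
    out = spec.copy()
    freq_bins, time_frames = out.shape
    for _ in range(num_masks):
        f = int(rng.integers(0, max_width + 1))
        f0 = int(rng.integers(0, freq_bins - f + 1))
        out[f0:f0 + f, :] = 0.0          # frequency mask
        t = int(rng.integers(0, max_width + 1))
        t0 = int(rng.integers(0, time_frames - t + 1))
        out[:, t0:t0 + t] = 0.0          # time mask
    return out

def rep_augment(reps, mask_prob=0.2, noise_std=0.05, rng=None):
    """Representation-level augmentation: operates on a (batch, dim)
    feature matrix from any pretrained encoder, spectrogram- or
    waveform-based, so it is input-agnostic. The specific operations
    here (random dimension dropout plus Gaussian noise) are
    hypothetical, not the paper's exact recipe."""
    rng = rng or np.random.default_rng()
    keep = (rng.random(reps.shape) >= mask_prob).astype(reps.dtype)
    return reps * keep + rng.normal(0.0, noise_std, reps.shape)
```

Because `rep_augment` only sees the encoder's output features, the same augmentation code can sit after a spectrogram model or a raw-waveform model such as a pretrained speech encoder.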
Related papers
- Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking [27.708473070563013]
Respiratory audio has predictive power for a wide range of healthcare applications, yet is currently under-explored.
We introduce OPERA, an OPEn Respiratory Acoustic foundation model pretraining and benchmarking system.
arXiv Detail & Related papers (2024-06-23T16:04:26Z)
- Rene: A Pre-trained Multi-modal Architecture for Auscultation of Respiratory Diseases [5.810320353233697]
We introduce Rene, a pioneering large-scale model tailored for respiratory sound recognition.
Our innovative approach applies a pre-trained speech recognition model to process respiratory sounds.
We have developed a real-time respiratory sound discrimination system utilizing the Rene architecture.
arXiv Detail & Related papers (2024-05-13T03:00:28Z)
- Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z)
- Adversarial Fine-tuning using Generated Respiratory Sound to Address Class Imbalance [1.3686993145787067]
We propose a straightforward approach to augment imbalanced respiratory sound data using an audio diffusion model as a conditional neural vocoder.
We also demonstrate a simple yet effective adversarial fine-tuning method to align features between the synthetic and real respiratory sound samples to improve respiratory sound classification performance.
arXiv Detail & Related papers (2023-11-11T05:02:54Z)
- Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification [19.180927437627282]
We introduce a novel and effective Patch-Mix Contrastive Learning to distinguish the mixed representations in the latent space.
Our method achieves state-of-the-art performance on the ICBHI dataset, outperforming the prior leading score by 4.08%.
arXiv Detail & Related papers (2023-05-23T13:04:07Z)
- Textual Data Augmentation for Patient Outcomes Prediction [67.72545656557858]
We propose a novel data augmentation method to generate artificial clinical notes in patients' Electronic Health Records.
We fine-tune the generative language model GPT-2 to synthesize labeled text with the original training data.
We evaluate our method on the most common patient outcome, i.e., the 30-day readmission rate.
arXiv Detail & Related papers (2022-11-13T01:07:23Z)
- Deep Feature Learning for Medical Acoustics [78.56998585396421]
The purpose of this paper is to compare different learnables in medical acoustics tasks.
A framework has been implemented to classify human respiratory sounds and heartbeats in two categories, i.e. healthy or affected by pathologies.
arXiv Detail & Related papers (2022-08-05T10:39:37Z)
- Self-supervised models of audio effectively explain human cortical responses to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system.
These results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in human cortex.
arXiv Detail & Related papers (2022-05-27T22:04:02Z)
- A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion [50.040466658605524]
We propose a new paradigm for maintaining speaker identity in dysarthric voice conversion (DVC).
The poor quality of dysarthric speech can be greatly improved by statistical VC.
But as the normal speech utterances of a dysarthria patient are nearly impossible to collect, previous work failed to recover the individuality of the patient.
arXiv Detail & Related papers (2021-06-02T18:41:03Z)
- Utilizing Self-supervised Representations for MOS Prediction [51.09985767946843]
Existing evaluations usually require clean references or parallel ground truth data.
Subjective tests, on the other hand, do not need any additional clean or parallel data and correlate better with human perception.
We develop an automatic evaluation approach that correlates well with human perception while not requiring ground truth data.
arXiv Detail & Related papers (2021-04-07T09:44:36Z)
- Robust Deep Learning Framework For Predicting Respiratory Anomalies and Diseases [26.786743524562322]
This paper presents a robust deep learning framework developed to detect respiratory diseases from recordings of respiratory sounds.
A back-end deep learning model classifies the features into classes of respiratory disease or anomaly.
Experiments, conducted over the ICBHI benchmark dataset of respiratory sounds, evaluate the ability of the framework to classify sounds.
arXiv Detail & Related papers (2020-01-21T15:26:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.