Deep Feature Learning for Medical Acoustics
- URL: http://arxiv.org/abs/2208.03084v1
- Date: Fri, 5 Aug 2022 10:39:37 GMT
- Title: Deep Feature Learning for Medical Acoustics
- Authors: Alessandro Maria Poirè, Federico Simonetta, Stavros Ntalampiras
- Abstract summary: The purpose of this paper is to compare different learnable frontends in medical acoustics tasks.
A framework has been implemented to classify human respiratory sounds and heartbeats into two categories, i.e. healthy or affected by pathologies.
- Score: 78.56998585396421
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The purpose of this paper is to compare different learnable frontends in
medical acoustics tasks. A framework has been implemented to classify human
respiratory sounds and heartbeats into two categories, i.e. healthy or affected
by pathologies. After obtaining two suitable datasets, we proceeded to classify
the sounds using two learnable state-of-the-art frontends -- LEAF and nnAudio --
plus a non-learnable baseline frontend, i.e. Mel-filterbanks. The computed
features are then fed into two different CNN models, namely VGG16 and
EfficientNet. The frontends are carefully benchmarked in terms of the number of
parameters, computational resources, and effectiveness.
This work demonstrates how the integration of learnable frontends in neural
audio classification systems may improve performance, especially in the field
of medical acoustics. However, such frontends further increase the amount of
training data required. Consequently, they are useful only when the available
training data is large enough to support the feature learning process.
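As a rough illustration of the pipeline benchmarked here, the sketch below wires a learnable Mel-style frontend into a VGG16 binary classifier. This is a minimal sketch, not the authors' code: it assumes nnAudio's MelSpectrogram layer (nnAudio.features) with its trainable_mel/trainable_STFT flags and torchvision's VGG16, and the sampling rate, filterbank size, and input resizing are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from nnAudio.features import MelSpectrogram  # learnable frontend (API assumed from nnAudio docs)
from torchvision.models import vgg16

class FrontendVGG(nn.Module):
    """Learnable Mel-style frontend followed by a VGG16 binary classifier."""
    def __init__(self, sr=4000, n_mels=64):
        super().__init__()
        # trainable_mel / trainable_STFT turn the fixed Mel filterbank into
        # learnable parameters -- the core idea benchmarked in the paper.
        self.frontend = MelSpectrogram(sr=sr, n_mels=n_mels,
                                       trainable_mel=True, trainable_STFT=True)
        self.backbone = vgg16(weights=None)
        self.backbone.classifier[-1] = nn.Linear(4096, 2)  # healthy vs. pathological

    def forward(self, wave):                    # wave: (batch, samples)
        spec = self.frontend(wave)              # (batch, n_mels, time)
        spec = spec.unsqueeze(1)                # add a channel axis
        spec = F.interpolate(spec, size=(224, 224), mode="bilinear",
                             align_corners=False)
        return self.backbone(spec.repeat(1, 3, 1, 1))  # VGG16 expects 3 channels

model = FrontendVGG()
logits = model(torch.randn(2, 16000))           # two 4-second clips at 4 kHz
```

Swapping the frontend for a fixed Mel-filterbank layer (or LEAF) while keeping the backbone unchanged reproduces the kind of comparison the paper carries out.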
Related papers
- Towards Open-Vocabulary Audio-Visual Event Localization [59.23161248808759]
We introduce the Open-Vocabulary Audio-Visual Event localization problem.
This problem requires localizing audio-visual events and predicting explicit categories for both seen and unseen data at inference.
We propose the OV-AVEBench dataset, comprising 24,800 videos across 67 real-life audio-visual scenes.
arXiv Detail & Related papers (2024-11-18T04:35:20Z)
- Reproducible Machine Learning-based Voice Pathology Detection: Introducing the Pitch Difference Feature [1.1455937444848385]
We propose a robust set of features derived from a thorough review of contemporary practices in voice pathology detection.
We combine this feature set, containing data from the publicly available Saarbrücken Voice Database (SVD), with preprocessing using the K-Means Synthetic Minority Over-Sampling Technique (K-Means SMOTE) algorithm; a minimal sketch of this step follows the entry.
Our approach achieves state-of-the-art performance in voice pathology detection, measured by unweighted average recall.
arXiv Detail & Related papers (2024-10-14T14:17:52Z)
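A hedged sketch of the K-Means SMOTE step named above, assuming the imbalanced-learn implementation; the synthetic data and balance threshold are illustrative, not the paper's setup.

```python
from sklearn.datasets import make_classification
from imblearn.over_sampling import KMeansSMOTE  # pip install imbalanced-learn

# Illustrative imbalanced data standing in for the SVD feature set.
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
                           class_sep=2.0, random_state=0)

# K-Means SMOTE clusters the feature space first, then applies SMOTE inside
# clusters with enough minority samples; the threshold may need per-dataset tuning.
sampler = KMeansSMOTE(cluster_balance_threshold=0.1, random_state=0)
X_res, y_res = sampler.fit_resample(X, y)
print(y.sum(), "->", y_res.sum())  # minority count before and after
```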
- AFEN: Respiratory Disease Classification using Ensemble Learning [2.524195881002773]
We present AFEN (Audio Feature Ensemble Learning), a model that leverages Convolutional Neural Networks (CNN) and XGBoost; a sketch of this CNN-plus-XGBoost pattern follows the entry.
We use a meticulously selected mix of audio features which provide the salient attributes of the data and allow for accurate classification.
We empirically verify that AFEN sets a new state-of-the-art using Precision and Recall as metrics, while decreasing training time by 60%.
arXiv Detail & Related papers (2024-05-08T23:50:54Z)
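A hedged sketch of the CNN-plus-XGBoost pattern described above: a small CNN embeds spectrogram patches, and an XGBoost classifier is fit on the embeddings. The architecture and hyperparameters are illustrative guesses, not AFEN's actual configuration.

```python
import torch
import torch.nn as nn
from xgboost import XGBClassifier  # pip install xgboost

# Tiny stand-in CNN encoder; AFEN's real architecture is not reproduced here.
encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),        # -> (batch, 32) embeddings
)

specs = torch.randn(200, 1, 64, 64)               # fake log-Mel patches
labels = torch.randint(0, 2, (200,))              # fake disease labels
with torch.no_grad():
    feats = encoder(specs).numpy()

# Gradient-boosted trees on top of the CNN embeddings.
clf = XGBClassifier(n_estimators=100, max_depth=4, eval_metric="logloss")
clf.fit(feats, labels.numpy())
print(clf.predict(feats[:5]))
```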
- Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z)
- Audio-Visual Efficient Conformer for Robust Speech Recognition [91.3755431537592]
We propose to improve the noise robustness of the recently proposed Efficient Conformer Connectionist Temporal Classification architecture by processing both audio and visual modalities.
Our experiments show that using audio and visual modalities allows the model to better recognize speech in the presence of environmental noise and significantly accelerates training, reaching a lower WER with 4 times fewer training steps.
arXiv Detail & Related papers (2023-01-04T05:36:56Z)
- Segment-level Metric Learning for Few-shot Bioacoustic Event Detection [56.59107110017436]
We propose a segment-level few-shot learning framework that utilizes both the positive and negative events during model optimization.
Our system achieves an F-measure of 62.73 on the DCASE 2022 challenge task 5 (DCASE2022-T5) validation set, outperforming the baseline prototypical network (F-measure 34.02) by a large margin; a minimal prototype-scoring sketch follows the entry.
arXiv Detail & Related papers (2022-07-15T22:41:30Z)
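For context on the prototypical-network baseline mentioned above, here is a minimal sketch of prototype-based scoring: class prototypes are mean embeddings of support examples, and queries are scored by negative squared distance. This is generic few-shot machinery, not the paper's segment-level system.

```python
import torch

def prototype_logits(support, support_labels, queries, n_classes=2):
    """Score queries by negative squared distance to class-mean prototypes."""
    protos = torch.stack([support[support_labels == c].mean(dim=0)
                          for c in range(n_classes)])   # (C, D) prototypes
    d2 = torch.cdist(queries, protos).pow(2)            # (Q, C) distances
    return -d2                                          # higher = closer

# Toy 2-way task: 5 support embeddings per class, 3 queries.
emb = torch.randn(10, 16)
lab = torch.tensor([0] * 5 + [1] * 5)
print(prototype_logits(emb, lab, torch.randn(3, 16)).argmax(dim=1))
```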
- Transferring Voice Knowledge for Acoustic Event Detection: An Empirical Study [11.825240267691209]
This paper investigates the potential of transferring high-level voice representations extracted from a public speaker dataset to enrich an acoustic event detection pipeline.
We develop a dual-branch neural network architecture for the joint learning of voice and acoustic features during an AED process; a minimal dual-branch sketch follows the entry.
arXiv Detail & Related papers (2021-10-07T04:03:21Z)
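A hedged sketch of the dual-branch idea named above: one branch consumes a precomputed voice embedding, the other consumes acoustic features, and their representations are fused for event classification. Layer sizes and fusion by concatenation are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class DualBranchAED(nn.Module):
    """Fuse a voice-embedding branch with an acoustic-feature branch."""
    def __init__(self, voice_dim=256, acoustic_dim=64, n_events=10):
        super().__init__()
        self.voice_branch = nn.Sequential(nn.Linear(voice_dim, 128), nn.ReLU())
        self.acoustic_branch = nn.Sequential(nn.Linear(acoustic_dim, 128), nn.ReLU())
        self.head = nn.Linear(256, n_events)   # concatenated features -> event logits

    def forward(self, voice_emb, acoustic_feat):
        fused = torch.cat([self.voice_branch(voice_emb),
                           self.acoustic_branch(acoustic_feat)], dim=-1)
        return self.head(fused)

model = DualBranchAED()
logits = model(torch.randn(4, 256), torch.randn(4, 64))  # batch of 4
```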
- Automatic Analysis of the Emotional Content of Speech in Daylong Child-Centered Recordings from a Neonatal Intensive Care Unit [3.7373314439051106]
Hundreds of hours of daylong recordings from preterm infants' audio environments were collected from two hospitals in Finland and Estonia.
We introduce this initially unannotated large-scale real-world audio dataset and describe the development of a functional SER system for the Finnish subset of the data.
We show that the best-performing models achieve a classification performance of 73.4% unweighted average recall; a short sketch of this metric follows the entry.
arXiv Detail & Related papers (2021-06-14T11:17:52Z)
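Unweighted average recall (UAR), the metric reported above and in the main paper's field, is simply recall averaged over classes with equal weight regardless of class frequency. A small sketch using scikit-learn's macro-averaged recall, with toy labels:

```python
from sklearn.metrics import recall_score

y_true = [0, 0, 0, 0, 1, 1]          # imbalanced toy labels
y_pred = [0, 0, 0, 1, 1, 0]

# UAR = mean of per-class recalls, ignoring class frequencies.
uar = recall_score(y_true, y_pred, average="macro")
print(f"UAR = {uar:.3f}")            # (3/4 + 1/2) / 2 = 0.625
```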
- Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning [49.41766997393417]
The system we used for Task 6 (Automated Audio Captioning) of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge combines three elements, namely data augmentation, multi-task learning, and post-processing, for audio captioning.
The system received the highest evaluation scores, but which of the individual elements contributed most to its performance has not yet been clarified.
arXiv Detail & Related papers (2020-09-24T01:07:33Z)
- CURE Dataset: Ladder Networks for Audio Event Classification [15.850545634216484]
There are approximately 3 million people with hearing loss who cannot perceive events happening around them.
This paper establishes the CURE dataset, which contains a curated set of specific audio events most relevant for people with hearing loss.
arXiv Detail & Related papers (2020-01-12T09:35:30Z)