Making deep neural networks work for medical audio: representation, compression and domain adaptation
- URL: http://arxiv.org/abs/2506.13970v1
- Date: Sat, 24 May 2025 20:32:31 GMT
- Title: Making deep neural networks work for medical audio: representation, compression and domain adaptation
- Authors: Charles C Onu
- Abstract summary: This thesis addresses the technical challenges of applying machine learning to understand and interpret medical audio signals. We focus on the analysis of infant cry sounds to predict medical conditions. To advance research in this domain, we release a unique, open-source dataset of infant cry sounds.
- Score: 1.1059341532498634
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This thesis addresses the technical challenges of applying machine learning to understand and interpret medical audio signals. The sounds of our lungs, heart, and voice convey vital information about our health. Yet, in contemporary medicine, these sounds are primarily analyzed through auditory interpretation by experts using devices like stethoscopes. Automated analysis offers the potential to standardize the processing of medical sounds, enable screening in low-resource settings where physicians are scarce, and detect subtle patterns that may elude human perception, thereby facilitating early diagnosis and treatment. Focusing on the analysis of infant cry sounds to predict medical conditions, this thesis contributes on four key fronts. First, in low-data settings, we demonstrate that large databases of adult speech can be harnessed through neural transfer learning to develop more accurate and robust models for infant cry analysis. Second, in cost-effective modeling, we introduce an end-to-end model compression approach for recurrent networks using tensor decomposition. Our method requires no post-hoc processing, achieves compression rates of several hundred-fold, and delivers accurate, portable models suitable for resource-constrained devices. Third, we propose novel domain adaptation techniques tailored for audio models and adapt existing methods from computer vision. These approaches address dataset bias and enhance generalization across domains while maintaining strong performance on the original data. Finally, to advance research in this domain, we release a unique, open-source dataset of infant cry sounds, developed in collaboration with clinicians worldwide. This work lays the foundation for recognizing the infant cry as a vital sign and highlights the transformative potential of AI-driven audio monitoring in shaping the future of accessible and affordable healthcare.
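The first contribution follows a standard transfer-learning recipe: pretrain an encoder on abundant adult speech, then reuse it as the feature extractor for the much smaller infant-cry dataset. The sketch below illustrates that recipe only; the `SpeechEncoder` architecture, checkpoint filename, label space, and learning rates are hypothetical stand-ins, not the thesis's actual model.

```python
import torch
import torch.nn as nn

class SpeechEncoder(nn.Module):
    """Toy convolutional encoder over log-mel spectrogram frames (illustrative only)."""
    def __init__(self, n_mels: int = 40, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mels, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over time -> fixed-size embedding
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_mels, time) -> (batch, hidden)
        return self.net(x).squeeze(-1)

encoder = SpeechEncoder()
# Step 1: load weights pretrained on a large adult-speech task (placeholder path).
encoder.load_state_dict(torch.load("adult_speech_encoder.pt"))

# Step 2: attach a fresh classification head and fine-tune on the cry data,
# with a smaller learning rate on the encoder to preserve the speech features.
model = nn.Sequential(encoder, nn.Linear(128, 2))  # e.g. healthy vs. at-risk cry
optimizer = torch.optim.Adam([
    {"params": model[0].parameters(), "lr": 1e-5},
    {"params": model[1].parameters(), "lr": 1e-3},
])
```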
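The second contribution trains the factorized network directly, so compression falls out of training rather than a post-hoc step. As a minimal sketch of the idea, here is a tensor-train (TT) linear layer of the kind that could replace the dense gate projections inside an LSTM cell; the class name, mode shapes, ranks, and initialization scale are illustrative assumptions rather than the thesis's exact formulation.

```python
import torch
import torch.nn as nn

class TTLinear(nn.Module):
    """Linear layer whose weight matrix is stored in tensor-train (TT) form.

    A (prod(out_modes) x prod(in_modes)) weight is factorized into small cores,
    shrinking the parameter count from prod(m) * prod(n) down to
    sum_k r_{k-1} * m_k * n_k * r_k, and is trained end-to-end in this form.
    """
    def __init__(self, in_modes, out_modes, ranks):
        super().__init__()
        assert len(in_modes) == len(out_modes) == len(ranks) - 1
        self.cores = nn.ParameterList([
            nn.Parameter(0.1 * torch.randn(ranks[k], out_modes[k], in_modes[k], ranks[k + 1]))
            for k in range(len(in_modes))
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b = x.shape[0]
        res = x.reshape(b, 1, -1)  # (batch, r_0 = 1, n_1 * n_2 * ... * n_d)
        for core in self.cores:
            r, m, n, s = core.shape
            res = res.reshape(b, r, n, -1)                    # expose input mode n_k
            res = torch.einsum("brnt,rmns->bstm", res, core)  # contract over r_k and n_k
            res = res.reshape(b, s, -1)                       # fold output mode m_k to the back
        return res.reshape(b, -1)  # (batch, prod(out_modes))

# A 1024 x 256 dense matrix (262,144 params) becomes 2,432 params here (~108x);
# smaller ranks or more modes push this into the several-hundred-fold regime.
layer = TTLinear(in_modes=(4, 8, 8), out_modes=(8, 16, 8), ranks=(1, 4, 4, 1))
y = layer(torch.randn(32, 256))  # (32, 1024), e.g. stacked LSTM gate pre-activations
```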
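The third contribution mixes audio-specific techniques with methods adapted from computer vision. As one representative of the computer-vision family, the sketch below implements the Deep CORAL loss (Sun & Saenko, 2016), which aligns the second-order statistics of source and target features; whether the thesis uses CORAL specifically is an assumption here, and its audio-tailored methods are not reproduced.

```python
import torch

def coral_loss(source_feats: torch.Tensor, target_feats: torch.Tensor) -> torch.Tensor:
    """Deep CORAL loss: squared Frobenius distance between the feature covariance
    matrices of a source batch and a target batch, each of shape (batch, d)."""
    def cov(f: torch.Tensor) -> torch.Tensor:
        f = f - f.mean(dim=0, keepdim=True)    # center each feature dimension
        return (f.t() @ f) / (f.shape[0] - 1)  # (d, d) sample covariance
    d = source_feats.shape[1]
    return ((cov(source_feats) - cov(target_feats)) ** 2).sum() / (4 * d * d)

# Added to the task loss, this pulls target-domain features toward the source
# distribution while the task term maintains performance on the original data:
#   loss = task_loss(source_logits, source_labels) + lam * coral_loss(f_src, f_tgt)
```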
Related papers
- Determining Fetal Orientations From Blind Sweep Ultrasound Video [1.3456699275044242]
The work distinguishes itself by introducing automated fetal lie prediction and by proposing an assistive paradigm that augments sonographer expertise rather than replacing it. Future research will focus on enhancing acquisition efficiency and exploring real-time clinical integration to improve workflow and support for obstetric clinicians.
arXiv Detail & Related papers (2025-04-09T12:51:15Z) - Detecting abnormal heart sound using mobile phones and on-device IConNet [0.0]
We present a user-friendly solution for abnormal heart sound detection, utilizing mobile phones and a lightweight neural network optimized for on-device inference. IConNet, an Interpretable Convolutional Neural Network, harnesses insights from audio signal processing, enhancing efficiency and providing transparency in neural pattern extraction from raw waveform signals.
arXiv Detail & Related papers (2024-12-04T12:18:21Z) - Where are we in audio deepfake detection? A systematic analysis over generative and detection models [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark. It provides a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content. It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z) - Uncertainty-aware Medical Diagnostic Phrase Identification and Grounding [72.18719355481052]
We introduce a novel task called Medical Report Grounding (MRG). MRG aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner. We propose uMedGround, a robust and reliable framework that leverages a multimodal large language model to predict diagnostic phrases.
arXiv Detail & Related papers (2024-04-10T07:41:35Z) - Voice EHR: Introducing Multimodal Audio Data for Health [3.8090294667599927]
Existing technologies depend on limited datasets collected with expensive recording equipment in high-income countries.
This report introduces a novel data type and a corresponding collection system that captures health data through guided questions using only a mobile/web application.
arXiv Detail & Related papers (2024-04-02T04:07:22Z) - CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset consisting of unseen synthetic data and images collected from silicone aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z) - Show from Tell: Audio-Visual Modelling in Clinical Settings [58.88175583465277]
We consider audio-visual modelling in a clinical setting, providing a solution to learn medical representations without human expert annotation.
A simple yet effective multi-modal self-supervised learning framework is proposed for this purpose.
The proposed approach is able to localise anatomical regions of interest during ultrasound imaging, with only speech audio as a reference.
arXiv Detail & Related papers (2023-10-25T08:55:48Z) - HEAR4Health: A blueprint for making computer audition a staple of modern healthcare [89.8799665638295]
Recent years have seen a rapid increase in digital medicine research in an attempt to transform traditional healthcare systems.
Computer audition, however, has lagged behind, at least in terms of commercial interest.
We categorise the advances needed in four key pillars: Hear, corresponding to the cornerstone technologies needed to analyse auditory signals in real-life conditions; Earlier, for the advances needed in computational and data efficiency; Attentively, for accounting for individual differences and handling the longitudinal nature of medical data; and Responsibly, for ensuring compliance with the ethical standards imposed on medical technologies.
arXiv Detail & Related papers (2023-01-25T09:25:08Z) - Ultrasound Signal Processing: From Models to Deep Learning [64.56774869055826]
Medical ultrasound imaging relies heavily on high-quality signal processing to provide reliable and interpretable image reconstructions.
Deep learning based methods, which are optimized in a data-driven fashion, have gained popularity.
A relatively new paradigm combines the power of the two: leveraging data-driven deep learning, as well as exploiting domain knowledge.
arXiv Detail & Related papers (2022-04-09T13:04:36Z) - Convolutional Neural Network-Based Age Estimation Using B-Mode Ultrasound Tongue Image [10.100437437151621]
We explore the feasibility of estimating a speaker's age from ultrasound tongue images.
Motivated by its success in related tasks, this paper applies deep learning to this problem.
The developed method can be used as a tool to evaluate the performance of speech therapy sessions.
arXiv Detail & Related papers (2021-01-27T08:00:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.