A Multi-Stage Hybrid CNN-Transformer Network for Automated Pediatric Lung Sound Classification
- URL: http://arxiv.org/abs/2507.20408v1
- Date: Sun, 27 Jul 2025 20:36:46 GMT
- Title: A Multi-Stage Hybrid CNN-Transformer Network for Automated Pediatric Lung Sound Classification
- Authors: Samiul Based Shuvo, Taufiq Hasan
- Abstract summary: We propose a hybrid CNN-Transformer framework to classify pediatric respiratory diseases using scalogram images. Our model achieved an overall score of 0.9039 in binary event classification and 0.8448 in multiclass event classification.
- Score: 2.034969618183334
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Automated analysis of lung sound auscultation is essential for monitoring respiratory health, especially in regions facing a shortage of skilled healthcare workers. While respiratory sound classification has been widely studied in adults, its application in pediatric populations, particularly in children aged <6 years, remains an underexplored area. The developmental changes in pediatric lungs considerably alter the acoustic properties of respiratory sounds, necessitating specialized classification approaches tailored to this age group. To address this, we propose a multistage hybrid CNN-Transformer framework that combines CNN-extracted features with an attention-based architecture to classify pediatric respiratory diseases using scalogram images from both full recordings and individual breath events. Our model achieved an overall score of 0.9039 in binary event classification and 0.8448 in multiclass event classification by employing class-wise focal loss to address data imbalance. At the recording level, the model attained scores of 0.720 for ternary and 0.571 for multiclass classification. These scores outperform the previous best models by 3.81% and 5.94%, respectively. This approach offers a promising solution for scalable pediatric respiratory disease diagnosis, especially in resource-limited settings.
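The abstract credits a class-wise focal loss with handling data imbalance but does not give its form. As a rough illustration only, the sketch below uses the standard focal loss, -alpha_c * (1 - p_t)^gamma * log(p_t), with one weight alpha_c per class. The function names, the per-class `alpha` list, and the default `gamma=2.0` are assumptions made for this example and are not taken from the paper.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list:
    """Numerically stable softmax over one sample's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_wise_focal_loss(
    logits: Sequence[float],
    target: int,
    alpha: Sequence[float],
    gamma: float = 2.0,
) -> float:
    """Focal loss for a single sample with a per-class weight.

    alpha[c] is a hypothetical weight for class c (larger for rare
    classes); gamma down-weights samples the model already gets right.
    """
    p_t = softmax(logits)[target]
    p_t = max(p_t, 1e-12)  # guard against log(0)
    return -alpha[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# Usage: a rare class (index 1) gets a larger weight than a common one.
loss = class_wise_focal_loss([2.0, 0.5, -1.0], target=1, alpha=[0.25, 1.0, 1.0])
```

With `gamma=0` and all weights equal to 1, this reduces to ordinary cross-entropy, which is a quick way to sanity-check an implementation.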
Related papers
- Pediatric Asthma Detection with Google's HeAR Model: An AI-Driven Respiratory Sound Classifier [0.8463972278020965]
This work presents an AI-powered diagnostic pipeline to detect early signs of asthma from pediatric respiratory sounds. The SPRSound dataset is used to extract 2-second audio segments labeled as wheeze, crackle, rhonchi, stridor, or normal. The system achieves over 91% accuracy, with strong performance on precision-recall metrics for positive cases.
arXiv Detail & Related papers (2025-04-28T12:52:17Z)
- Respiratory Inhaler Sound Event Classification Using Self-Supervised Learning [43.83039192442981]
Asthma is a chronic respiratory condition that affects millions of people worldwide. We adapted the wav2vec 2.0 self-supervised learning model for inhaler sound classification by pre-training and fine-tuning this model on inhaler sounds. The proposed model shows a balanced accuracy of 98% on a dataset collected using a dry powder inhaler and smartwatch device.
arXiv Detail & Related papers (2025-04-15T14:44:47Z)
- Machine learning-based algorithms for at-home respiratory disease monitoring and respiratory assessment [45.104212062055424]
This work aims to develop machine learning-based algorithms to facilitate at-home respiratory disease monitoring and assessment.
Data were collected from 30 healthy adults, encompassing respiratory pressure, flow, and dynamic thoraco-abdominal circumferential measurements.
Various machine learning models, including the random forest classifier, logistic regression, and support vector machine (SVM), were trained to predict breathing types.
arXiv Detail & Related papers (2024-09-05T02:14:31Z)
- Contrasting Deep Learning Models for Direct Respiratory Insufficiency Detection Versus Blood Oxygen Saturation Estimation [1.4149417323913716]
We study pretrained audio neural networks (CNN6, CNN10 and CNN14) and the Masked Autoencoder (Audio-MAE) for RI detection.
For the regression task of estimating SpO2 levels, the models achieve root mean square error values exceeding the accepted clinical range of 3.5% for finger oximeters.
We transform SpO2-regression into a SpO2-threshold binary classification problem, with a threshold of 92%.
arXiv Detail & Related papers (2024-07-30T17:26:16Z)
- Attention-based Saliency Maps Improve Interpretability of Pneumothorax Classification [52.77024349608834]
To investigate chest radiograph (CXR) classification performance of vision transformers (ViT) and interpretability of attention-based saliency.
ViTs were fine-tuned for lung disease classification using four public data sets: CheXpert, Chest X-Ray 14, MIMIC CXR, and VinBigData.
ViTs achieved CXR classification AUCs comparable to those of state-of-the-art CNNs.
arXiv Detail & Related papers (2023-03-03T12:05:41Z)
- Validated respiratory drug deposition predictions from 2D and 3D medical images with statistical shape models and convolutional neural networks [47.187609203210705]
We aim to develop and validate an automated computational framework for patient-specific deposition modelling.
An image processing approach is proposed that could produce 3D patient respiratory geometries from 2D chest X-rays and 3D CT images.
arXiv Detail & Related papers (2023-03-02T07:47:07Z)
- Significantly improving zero-shot X-ray pathology classification via fine-tuning pre-trained image-text encoders [50.689585476660554]
We propose a new fine-tuning strategy that includes positive-pair loss relaxation and random sentence sampling.
Our approach consistently improves overall zero-shot pathology classification across four chest X-ray datasets and three pre-trained models.
arXiv Detail & Related papers (2022-12-14T06:04:18Z)
- TotalSegmentator: robust segmentation of 104 anatomical structures in CT images [48.50994220135258]
We present a deep learning segmentation model for body CT images.
The model can segment 104 anatomical structures relevant for use cases such as organ volumetry, disease characterization, and surgical or radiotherapy planning.
arXiv Detail & Related papers (2022-08-11T15:16:40Z)
- Preservation of High Frequency Content for Deep Learning-Based Medical Image Classification [74.84221280249876]
An efficient analysis of large amounts of chest radiographs can aid physicians and radiologists.
We propose a novel Discrete Wavelet Transform (DWT)-based method for the efficient identification and encoding of visual information.
arXiv Detail & Related papers (2022-05-08T15:29:54Z)
- Osteoporosis Prescreening using Panoramic Radiographs through a Deep Convolutional Neural Network with Attention Mechanism [65.70943212672023]
A deep convolutional neural network (CNN) with an attention module can detect osteoporosis on panoramic radiographs.
A dataset of 70 panoramic radiographs (PRs) from 70 different subjects aged between 49 and 60 was used.
arXiv Detail & Related papers (2021-10-19T00:03:57Z)
- Medulloblastoma Tumor Classification using Deep Transfer Learning with Multi-Scale EfficientNets [63.62764375279861]
We propose an end-to-end MB tumor classification and explore transfer learning with various input sizes and matching network dimensions.
Using a data set with 161 cases, we demonstrate that pre-trained EfficientNets with larger input resolutions lead to significant performance improvements.
arXiv Detail & Related papers (2021-09-10T13:07:11Z)
- Learning to Automatically Diagnose Multiple Diseases in Pediatric Chest Radiographs Using Deep Convolutional Neural Networks [0.4697611383288171]
Deep convolutional neural networks (D-CNNs) have shown remarkable performance in interpreting chest radiograph (CXR) scans in adults.
In this paper, we retrospectively collect a large dataset of 5,017 pediatric CXR scans, for which each is manually labeled by an experienced radiologist.
A D-CNN model is then trained on 3,550 annotated scans to classify multiple pediatric lung pathologies automatically.
arXiv Detail & Related papers (2021-08-14T08:14:52Z)
- Deep Neural Network Based Respiratory Pathology Classification Using Cough Sounds [6.376404422444008]
We propose a deep learning-based cough sound classification model that can distinguish between children with healthy versus pathological coughs.
We collected a new dataset of cough sounds, labelled with a clinician's diagnosis.
arXiv Detail & Related papers (2021-06-23T05:49:20Z)
- The Diagnosis of Asthma using Hilbert-Huang Transform and Deep Learning on Lung Sounds [2.294014185517203]
The statistical features are calculated from intrinsic mode functions that are extracted by applying the Hilbert Transform to the lung sounds.
The classification of the lung sounds from asthma and healthy subjects is performed using Deep Belief Networks (DBN).
arXiv Detail & Related papers (2021-01-20T19:04:33Z)
- Detecting COVID-19 from Breathing and Coughing Sounds using Deep Neural Networks [68.8204255655161]
We adapt an ensemble of Convolutional Neural Networks to classify whether a speaker is infected with COVID-19.
Ultimately, it achieves an Unweighted Average Recall (UAR) of 74.9%, or an Area Under ROC Curve (AUC) of 80.7% by ensembling neural networks.
arXiv Detail & Related papers (2020-12-29T01:14:17Z)
- 1-D Convlutional Neural Networks for the Analysis of Pupil Size Variations in Scotopic Conditions [79.71065005161566]
1-D convolutional neural network models are trained for classification of short-range sequences.
The model provides predictions with high average accuracy on a held-out test set.
arXiv Detail & Related papers (2020-02-06T17:25:37Z)