Related papers: Explainable Multi-Modal Deep Learning for Automatic Detection of Lung Diseases from Respiratory Audio Signals

Explainable Multi-Modal Deep Learning for Automatic Detection of Lung Diseases from Respiratory Audio Signals

URL: http://arxiv.org/abs/2512.00563v1
Date: Sat, 29 Nov 2025 17:15:58 GMT
Title: Explainable Multi-Modal Deep Learning for Automatic Detection of Lung Diseases from Respiratory Audio Signals
Authors: S M Asiful Islam Saky, Md Rashidul Islam, Md Saiful Arefin, Shahaba Alam,
Abstract summary: This study presents an explainable multimodal deep learning framework for automatic lung-disease detection using respiratory audio signals.<n>The framework incorporates Grad-CAM, Integrated Gradients, and SHAP, generating interpretable spectral, temporal, and feature-level explanations.<n>The findings demonstrate the framework's potential for telemedicine, point-of-care diagnostics, and real-world respiratory screening.
Score: 0.49581497240446293
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Respiratory diseases remain major global health challenges, and traditional auscultation is often limited by subjectivity, environmental noise, and inter-clinician variability. This study presents an explainable multimodal deep learning framework for automatic lung-disease detection using respiratory audio signals. The proposed system integrates two complementary representations: a spectral-temporal encoder based on a CNN-BiLSTM Attention architecture, and a handcrafted acoustic-feature encoder capturing physiologically meaningful descriptors such as MFCCs, spectral centroid, spectral bandwidth, and zero-crossing rate. These branches are combined through late-stage fusion to leverage both data-driven learning and domain-informed acoustic cues. The model is trained and evaluated on the Asthma Detection Dataset Version 2 using rigorous preprocessing, including resampling, normalization, noise filtering, data augmentation, and patient-level stratified partitioning. The study achieved strong generalization with 91.21% accuracy, 0.899 macro F1-score, and 0.9866 macro ROC-AUC, outperforming all ablated variants. An ablation study confirms the importance of temporal modeling, attention mechanisms, and multimodal fusion. The framework incorporates Grad-CAM, Integrated Gradients, and SHAP, generating interpretable spectral, temporal, and feature-level explanations aligned with known acoustic biomarkers to build clinical transparency. The findings demonstrate the framework's potential for telemedicine, point-of-care diagnostics, and real-world respiratory screening.

Related papers

Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis [14.922065513695294]
Resp-Agent is an autonomous multimodal system orchestrated by a novel Active Adrial Curriculum Agent (Thinker-A$2$CA)<n>To address the representation gap, we introduce a Modality-Weaving Diagnoser that weaves EHR data with audio tokens via Strategic Global Attention.<n>To address the data gap, we design a Flow Matching Generator that adapts a text-only Large Language Model (LLM) via modality injection.
arXiv Detail & Related papers (2026-02-16T14:48:24Z)
A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis [82.01597026329158]
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis.<n>CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy.<n>This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv Detail & Related papers (2025-12-15T10:22:43Z)
FAIM: Frequency-Aware Interactive Mamba for Time Series Classification [87.84511960413715]
Time series classification (TSC) is crucial in numerous real-world applications, such as environmental monitoring, medical diagnosis, and posture recognition.<n>We propose FAIM, a lightweight Frequency-Aware Interactive Mamba model.<n>We show that FAIM consistently outperforms existing state-of-the-art (SOTA) methods, achieving a superior trade-off between accuracy and efficiency.
arXiv Detail & Related papers (2025-11-26T08:36:33Z)
A Fully Open and Generalizable Foundation Model for Ultrasound Clinical Applications [77.3888788549565]
We present EchoCare, a novel ultrasound foundation model for generalist clinical use.<n>We developed EchoCare via self-supervised learning on our curated, publicly available, large-scale dataset EchoCareData.<n>With minimal training, EchoCare outperforms state-of-the-art comparison models across 10 representative ultrasound benchmarks.
arXiv Detail & Related papers (2025-09-15T10:05:31Z)
Structure-Accurate Medical Image Translation via Dynamic Frequency Balance and Knowledge Guidance [60.33892654669606]
Diffusion model is a powerful strategy to synthesize the required medical images.<n>Existing approaches still suffer from the problem of anatomical structure distortion due to the overfitting of high-frequency information.<n>We propose a novel method based on dynamic frequency balance and knowledge guidance.
arXiv Detail & Related papers (2025-04-13T05:48:13Z)
Classification of Heart Sounds Using Multi-Branch Deep Convolutional Network and LSTM-CNN [7.136933021609078]
This study develops and evaluates novel deep learning architectures that offer fast, accurate, and cost-effective methods for automatic diagnosis of cardiac diseases.<n>We propose two innovative methodologies: first, a Multi-Branch Deep Convolutional Neural Network (MBDCN) that emulates human auditory processing by utilizing diverse convolutional filter sizes and power spectrum input for enhanced feature extraction.<n>Second, a Long Short-Term Memory-Convolutional Neural (LSCN) model that integrates LSTM blocks with MBDCN to improve time-domain feature extraction.
arXiv Detail & Related papers (2024-07-15T13:02:54Z)
Respiratory Disease Classification and Biometric Analysis Using Biosignals from Digital Stethoscopes [3.2458203725405976]
This work presents a novel approach leveraging digital stethoscope technology for automatic respiratory disease classification and biometric analysis. By leveraging one of the largest publicly available medical database of respiratory sounds, we train machine learning models to classify various respiratory health conditions. Our approach achieves high accuracy in both binary classification (89% balanced accuracy for healthy vs. diseased) and multi-class classification (72% balanced accuracy for specific diseases like pneumonia and COPD)
arXiv Detail & Related papers (2023-09-12T23:54:00Z)
Brain Imaging-to-Graph Generation using Adversarial Hierarchical Diffusion Models for MCI Causality Analysis [44.45598796591008]
Brain imaging-to-graph generation (BIGG) framework is proposed to map functional magnetic resonance imaging (fMRI) into effective connectivity for mild cognitive impairment analysis. The hierarchical transformers in the generator are designed to estimate the noise at multiple scales. Evaluations of the ADNI dataset demonstrate the feasibility and efficacy of the proposed model.
arXiv Detail & Related papers (2023-05-18T06:54:56Z)
Fuzzy Attention Neural Network to Tackle Discontinuity in Airway Segmentation [67.19443246236048]
Airway segmentation is crucial for the examination, diagnosis, and prognosis of lung diseases. Some small-sized airway branches (e.g., bronchus and terminaloles) significantly aggravate the difficulty of automatic segmentation. This paper presents an efficient method for airway segmentation, comprising a novel fuzzy attention neural network and a comprehensive loss function.
arXiv Detail & Related papers (2022-09-05T16:38:13Z)
Factored Attention and Embedding for Unstructured-view Topic-related Ultrasound Report Generation [70.7778938191405]
We propose a novel factored attention and embedding model (termed FAE-Gen) for the unstructured-view topic-related ultrasound report generation. The proposed FAE-Gen mainly consists of two modules, i.e., view-guided factored attention and topic-oriented factored embedding, which capture the homogeneous and heterogeneous morphological characteristic across different views.
arXiv Detail & Related papers (2022-03-12T15:24:03Z)
CNN-MoE based framework for classification of respiratory anomalies and lung disease detection [33.45087488971683]
This paper presents and explores a robust deep learning framework for auscultation analysis. It aims to classify anomalies in respiratory cycles and detect disease, from respiratory sound recordings.
arXiv Detail & Related papers (2020-04-04T21:45:06Z)
Robust Deep Learning Framework For Predicting Respiratory Anomalies and Diseases [26.786743524562322]
This paper presents a robust deep learning framework developed to detect respiratory diseases from recordings of respiratory sounds. A back-end deep learning model classifies the features into classes of respiratory disease or anomaly. Experiments, conducted over the ICBHI benchmark dataset of respiratory sounds, evaluate the ability of the framework to classify sounds.
arXiv Detail & Related papers (2020-01-21T15:26:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.