Improving Deep Learning-based Respiratory Sound Analysis with Frequency Selection and Attention Mechanism
- URL: http://arxiv.org/abs/2507.20052v1
- Date: Sat, 26 Jul 2025 20:29:25 GMT
- Title: Improving Deep Learning-based Respiratory Sound Analysis with Frequency Selection and Attention Mechanism
- Authors: Nouhaila Fraihi, Ouassim Karrakchou, Mounir Ghogho
- Abstract summary: We propose a compact CNN-Temporal Self-Attention (CNN-TSA) network that integrates lightweight self-attention into an efficient CNN backbone. Central to our approach is a Frequency Band Selection (FBS) module that suppresses noisy and non-informative frequency regions. We also introduce age-specific models to enhance robustness across diverse patient groups.
- Score: 3.1515385358176817
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate classification of respiratory sounds requires deep learning models that effectively capture fine-grained acoustic features and long-range temporal dependencies. Convolutional Neural Networks (CNNs) are well-suited for extracting local time-frequency patterns but are limited in modeling global context. In contrast, transformer-based models can capture long-range dependencies, albeit with higher computational demands. To address these limitations, we propose a compact CNN-Temporal Self-Attention (CNN-TSA) network that integrates lightweight self-attention into an efficient CNN backbone. Central to our approach is a Frequency Band Selection (FBS) module that suppresses noisy and non-informative frequency regions, substantially improving accuracy and reducing FLOPs by up to 50%. We also introduce age-specific models to enhance robustness across diverse patient groups. Evaluated on the SPRSound-2022/2023 and ICBHI-2017 lung sound datasets, CNN-TSA with FBS sets new benchmarks on SPRSound and achieves state-of-the-art performance on ICBHI, all with a significantly smaller computational footprint. Furthermore, integrating FBS into an existing transformer baseline yields a new record on ICBHI, confirming FBS as an effective drop-in enhancement. These results demonstrate that our framework enables reliable, real-time respiratory sound analysis suitable for deployment in resource-constrained settings.
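The two ideas named in the abstract, dropping uninformative frequency bins and attending over time frames, can be illustrated with a minimal NumPy sketch. This is not the authors' FBS module or CNN-TSA network: the fixed 0-2000 Hz cutoff, the bin layout, the random stand-in spectrogram, and the helper names `select_band` and `temporal_self_attention` are all assumptions made for illustration, and the attention step uses no learned projections.

```python
import numpy as np


def select_band(spec, freqs_hz, f_lo, f_hi):
    """Keep the rows of a (freq, time) spectrogram whose bin center lies in [f_lo, f_hi]."""
    mask = (freqs_hz >= f_lo) & (freqs_hz <= f_hi)
    return spec[mask, :], freqs_hz[mask]


def temporal_self_attention(x):
    """Single-head scaled dot-product self-attention over time.

    x has shape (time, features). Queries, keys, and values are x itself
    (no learned projections), which keeps the sketch dependency-free.
    """
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # softmax over time
    return weights @ x, weights


# Hypothetical input: 128 frequency bins spanning 0-4000 Hz, 50 time frames.
rng = np.random.default_rng(0)
freqs_hz = np.linspace(0.0, 4000.0, 128)
spec = rng.standard_normal((128, 50))

# Assumed cutoff: keep only the 0-2000 Hz band.
band, band_freqs = select_band(spec, freqs_hz, 0.0, 2000.0)

# Treat each time frame as a token whose features are the kept bins.
attended, weights = temporal_self_attention(band.T)

print(spec.shape, "->", band.shape)  # (128, 50) -> (64, 50)
print(attended.shape)                # (50, 64)
```

With this assumed cutoff the input keeps half of its frequency bins, so a convolutional layer with unchanged kernels and strides would visit roughly half as many input positions. The "up to 50%" FLOPs reduction quoted in the abstract comes from the paper's own configuration, not from this sketch.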
Related papers
- Neuromorphic Wireless Split Computing with Resonate-and-Fire Neurons [69.73249913506042]
This paper investigates a wireless split computing architecture that employs resonate-and-fire (RF) neurons to process time-domain signals directly. By resonating at tunable frequencies, RF neurons extract time-localized spectral features while maintaining low spiking activity. Experimental results show that the proposed RF-SNN architecture achieves comparable accuracy to conventional LIF-SNNs and ANNs.
arXiv Detail & Related papers (2025-06-24T21:14:59Z) - FADPNet: Frequency-Aware Dual-Path Network for Face Super-Resolution [70.61549422952193]
Face super-resolution (FSR) under limited computational costs remains an open problem. Existing approaches typically treat all facial pixels equally, resulting in suboptimal allocation of computational resources. We propose FADPNet, a Frequency-Aware Dual-Path Network that decomposes facial features into low- and high-frequency components.
arXiv Detail & Related papers (2025-06-17T02:33:42Z) - TS-LIF: A Temporal Segment Spiking Neuron Network for Time Series Forecasting [27.91825785119938]
Spiking Neural Networks (SNNs) offer a promising, biologically inspired approach for processing data for time series forecasting. We introduce the Temporal Leaky Segment Integrate-and-Fire model, featuring a dual-compartment architecture. Experimental results show that TS-LIF outperforms traditional SNNs in time series forecasting.
arXiv Detail & Related papers (2025-03-07T03:06:21Z) - HADL Framework for Noise Resilient Long-Term Time Series Forecasting [0.7810572107832383]
Long-term time series forecasting is critical in domains such as finance, economics, and energy. The impact of temporal noise in extended lookback windows remains underexplored, often degrading model performance and computational efficiency. We propose a novel framework that addresses these challenges by integrating the Discrete Wavelet Transform (DWT) and Discrete Cosine Transform (DCT). Our approach demonstrates competitive robustness to noisy input, significantly reduces computational complexity, and achieves competitive or state-of-the-art forecasting performance across diverse benchmark datasets.
arXiv Detail & Related papers (2025-02-14T21:41:42Z) - Audio-Visual Efficient Conformer for Robust Speech Recognition [91.3755431537592]
We propose to improve the noise robustness of the recently proposed Efficient Conformer Connectionist Temporal Classification architecture by processing both audio and visual modalities.
Our experiments show that using audio and visual modalities improves speech recognition in the presence of environmental noise and significantly accelerates training, reaching lower WER with 4 times fewer training steps.
arXiv Detail & Related papers (2023-01-04T05:36:56Z) - Deep Learning-Based Synchronization for Uplink NB-IoT [72.86843435313048]
We propose a neural network (NN)-based algorithm for device detection and time of arrival (ToA) estimation for the narrowband physical random-access channel (NPRACH) of narrowband internet of things (NB-IoT).
The introduced NN architecture leverages residual convolutional networks as well as knowledge of the preamble structure of the 5G New Radio (5G NR) specifications.
arXiv Detail & Related papers (2022-05-22T12:16:43Z) - Time-Frequency Localization Using Deep Convolutional Maxout Neural Network in Persian Speech Recognition [0.0]
Time-frequency flexibility in the auditory neuron system of some mammals improves recognition performance.
This paper proposes a CNN-based structure for time-frequency localization of audio signal information in the ASR acoustic model.
The average recognition score of TFCMNN models is about 1.6% higher than the average of conventional models.
arXiv Detail & Related papers (2021-08-09T05:46:58Z) - A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification [78.04177357888284]
We propose a novel neural model compression strategy combining data augmentation, knowledge transfer, pruning, and quantization for device-robust acoustic scene classification (ASC).
We report an efficient joint framework for low-complexity multi-device ASC, called Acoustic Lottery.
arXiv Detail & Related papers (2021-07-03T16:25:24Z) - Compute and memory efficient universal sound source separation [23.152611264259225]
We provide a family of efficient neural network architectures for general purpose audio source separation.
The backbone structure of this convolutional network is the SUccessive DOwnsampling and Resampling of Multi-Resolution Features (SuDoRM-RF).
Our experiments show that SuDoRM-RF models perform comparably to, and even surpass, several state-of-the-art benchmarks.
arXiv Detail & Related papers (2021-03-03T19:16:53Z) - Encoding Frequency Constraints in Preventive Unit Commitment Using Deep Learning with Region-of-Interest Active Sampling [8.776029771500689]
This paper presents a generic data-driven framework for frequency-constrained unit commitment (FCUC) under high renewable penetration.
Deep neural networks (DNNs) are trained to predict the frequency response using real data or high-fidelity simulation data.
In the data generation phase, all possible power injections are considered, and a region-of-interest active sampling method is proposed to include power injection samples with frequency nadirs closer to the UFLC threshold.
arXiv Detail & Related papers (2021-02-18T19:04:21Z) - Speaker Representation Learning using Global Context Guided Channel and Time-Frequency Transformations [67.18006078950337]
We use the global context information to enhance important channels and recalibrate salient time-frequency locations.
The proposed modules, together with a popular ResNet-based model, are evaluated on the VoxCeleb1 dataset.
arXiv Detail & Related papers (2020-09-02T01:07:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.