Related papers: Geometry-Aware Optimization for Respiratory Sound Classification: Enhancing Sensitivity with SAM-Optimized Audio Spectrogram Transformers

URL: http://arxiv.org/abs/2512.22564v1
Date: Sat, 27 Dec 2025 11:39:36 GMT
Title: Geometry-Aware Optimization for Respiratory Sound Classification: Enhancing Sensitivity with SAM-Optimized Audio Spectrogram Transformers
Authors: Atakan Işık, Selin Vulga Işık, Ahmet Feridun Işık, Mahşuk Taylan,
Abstract summary: We introduce a framework that enhances the Audio Spectrogram Transformer (AST) using Sharpness-Aware Minimization (SAM). Our method achieves a state-of-the-art score of 68.10% on the ICBHI 2017 dataset, outperforming existing CNN and hybrid baselines. Further analysis using t-SNE and attention maps confirms that the model learns robust, discriminative features rather than memorizing background noise.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Respiratory sound classification is hindered by the limited size, high noise levels, and severe class imbalance of benchmark datasets like ICBHI 2017. While Transformer-based models offer powerful feature extraction capabilities, they are prone to overfitting and often converge to sharp minima in the loss landscape when trained on such constrained medical data. To address this, we introduce a framework that enhances the Audio Spectrogram Transformer (AST) using Sharpness-Aware Minimization (SAM). Instead of merely minimizing the training loss, our approach optimizes the geometry of the loss surface, guiding the model toward flatter minima that generalize better to unseen patients. We also implement a weighted sampling strategy to handle class imbalance effectively. Our method achieves a state-of-the-art score of 68.10% on the ICBHI 2017 dataset, outperforming existing CNN and hybrid baselines. More importantly, it reaches a sensitivity of 68.31%, a crucial improvement for reliable clinical screening. Further analysis using t-SNE and attention maps confirms that the model learns robust, discriminative features rather than memorizing background noise.
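In its standard form (Foret et al., ICLR 2021), SAM minimizes the worst-case loss max over ||ε|| ≤ ρ of L(w + ε) and approximates the inner maximization to first order with ε = ρ · ∇L(w) / ||∇L(w)||. Each update therefore uses two gradients: one at w to find the perturbation ε, and one at w + ε that is then applied to the original weights w. The abstract does not say how the authors configure SAM for the AST (base optimizer, ρ, schedule), so the following is only a framework-free sketch on a flat list of weights. The names `sam_step`, `grad_fn`, `lr` and `rho` are hypothetical, and the toy quadratic loss stands in for the real training loss.

```python
import math
from typing import Callable, List

Vector = List[float]


def sam_step(
    w: Vector,
    grad_fn: Callable[[Vector], Vector],
    lr: float = 0.05,
    rho: float = 0.05,
) -> Vector:
    """One Sharpness-Aware Minimization (SAM) update on a flat weight vector.

    grad_fn(w) must return the gradient of the training loss at w.
    """
    g = grad_fn(w)
    # Small constant avoids division by zero when the gradient vanishes.
    g_norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    # Ascent step: first-order worst-case perturbation inside an L2 ball of radius rho.
    eps = [rho * gi / g_norm for gi in g]
    w_adv = [wi + ei for wi, ei in zip(w, eps)]
    # Descent step: gradient at the perturbed point, applied to the ORIGINAL weights.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


if __name__ == "__main__":
    # Toy ill-conditioned quadratic L(w) = 0.5 * sum(a_i * w_i^2):
    # one flat direction (a = 1) and one sharp direction (a = 10).
    curvature = [1.0, 10.0]

    def loss(w: Vector) -> float:
        return 0.5 * sum(a * wi * wi for a, wi in zip(curvature, w))

    def grad(w: Vector) -> Vector:
        return [a * wi for a, wi in zip(curvature, w)]

    w = [1.0, 1.0]
    for _ in range(100):
        w = sam_step(w, grad, lr=0.05, rho=0.05)
    print(f"loss after 100 SAM steps: {loss(w):.4f}")
```

Setting `rho=0` recovers a plain gradient-descent step. Because every update needs two gradient evaluations, SAM roughly doubles the per-step cost of the base optimizer; this is the overhead that the ESAM entry in the related-papers list sets out to reduce.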

Related papers

Investigation into respiratory sound classification for an imbalanced data set using hybrid LSTM-KAN architectures [0.0]
This study investigates respiratory sound classification with a focus on mitigating pronounced class imbalance. We propose a hybrid deep learning model that combines a Long Short-Term Memory (LSTM) network for sequential feature encoding with a Kolmogorov-Arnold Network (KAN) for classification.
arXiv Detail & Related papers (2026-01-07T05:37:57Z)
Noise-Robust Tiny Object Localization with Flows [63.60972031108944]
We propose a noise-robust localization framework leveraging normalizing flows for flexible error modeling and uncertainty-guided optimization. Our method captures complex, non-Gaussian prediction distributions through flow-based error modeling, enabling robust learning under noisy supervision. An uncertainty-aware gradient modulation mechanism further suppresses learning from high-uncertainty, noise-prone samples, mitigating overfitting while stabilizing training.
arXiv Detail & Related papers (2026-01-02T09:16:55Z)
Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models [45.90037602677841]
This paper introduces a robust Anomalous Sound Detection (ASD) model that leverages audio pre-trained models. We fine-tune these models using machine operation data, employing SpecAug as a data augmentation strategy. Our experiments establish a new benchmark of 77.75% on the evaluation set, with a significant improvement of 6.48% compared with previous state-of-the-art (SOTA) models.
arXiv Detail & Related papers (2024-09-11T05:19:38Z)
Electroencephalogram Emotion Recognition via AUC Maximization [0.0]
Imbalanced datasets pose significant challenges in areas including neuroscience, cognitive science, and medical diagnostics. This study addresses the issue of class imbalance, using the 'Liking' label in the DEAP dataset as an example.
arXiv Detail & Related papers (2024-08-16T19:08:27Z)
STAL: Spike Threshold Adaptive Learning Encoder for Classification of Pain-Related Biosignal Data [2.0738462952016232]
This paper presents the first application of spiking neural networks (SNNs) for the classification of chronic lower back pain (CLBP) using the EmoPain dataset. We introduce Spike Threshold Adaptive Learning (STAL), a trainable encoder that effectively converts continuous biosignals into spike trains. We also propose an ensemble of Spiking Recurrent Neural Network (SRNN) classifiers for the multi-stream processing of sEMG and IMU data.
arXiv Detail & Related papers (2024-07-11T10:15:52Z)
ROPO: Robust Preference Optimization for Large Language Models [59.10763211091664]
We propose an iterative alignment approach that integrates noise-tolerance and filtering of noisy samples without the aid of external models. Experiments on three widely-used datasets with Mistral-7B and Llama-2-7B demonstrate that ROPO significantly outperforms existing preference alignment methods.
arXiv Detail & Related papers (2024-04-05T13:58:51Z)
ARHNet: Adaptive Region Harmonization for Lesion-aware Augmentation to Improve Segmentation Performance [61.04246102067351]
We propose a foreground harmonization framework (ARHNet) to tackle intensity disparities and make synthetic images look more realistic. We demonstrate the efficacy of our method in improving the segmentation performance using real and synthetic images.
arXiv Detail & Related papers (2023-07-02T10:39:29Z)
Efficient Sharpness-aware Minimization for Improved Training of Neural Networks [146.2011175973769]
This paper proposes the Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance. ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection. We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM reduces SAM's overhead from 100% extra computation to 40%, relative to base optimizers.
arXiv Detail & Related papers (2021-10-07T02:20:37Z)
Brain tumor grade classification Using LSTM Neural Networks with Domain Pre-Transforms [0.0]
We propose a weakly supervised image classification method based on a combination of hand-crafted features. In this study, we have experimented with the classification of brain tumor grades and achieved state-of-the-art performance at a resolution of 256 x 256.
arXiv Detail & Related papers (2021-06-21T07:04:52Z)
Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription [73.66530509749305]
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline. We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures. Our best end-to-end model based on RNN-Transducer, together with improved beam search, is only 3.8% absolute WER worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline.
arXiv Detail & Related papers (2020-04-22T19:08:33Z)
Deep Neural Network for Respiratory Sound Classification in Wearable Devices Enabled by Patient Specific Model Tuning [2.8935588665357077]
We propose a deep CNN-RNN model that classifies respiratory sounds based on Mel-spectrograms. We also implement a patient specific model tuning strategy that first screens respiratory patients and then builds patient specific classification models. The proposed hybrid CNN-RNN model achieves a score of 66.31% on four-class classification of breathing cycles for ICBHI'17 scientific challenge respiratory sound database.
arXiv Detail & Related papers (2020-04-16T15:42:58Z)
Rectified Meta-Learning from Noisy Labels for Robust Image-based Plant Disease Diagnosis [64.82680813427054]
Plant diseases are one of the main threats to food security and crop production. One popular approach is to cast this problem as a leaf image classification task, which can be addressed by powerful convolutional neural networks (CNNs). We propose a novel framework that incorporates a rectified meta-learning module into the common CNN paradigm to train a noise-robust deep network without using extra supervision information.
arXiv Detail & Related papers (2020-03-17T09:51:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
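Two ingredients recur across this page: class-imbalance handling (the main paper's weighted sampling, the LSTM-KAN and DEAP entries) and the ICBHI 2017 score reported by both the main paper (68.10% score, 68.31% sensitivity) and the patient-specific CNN-RNN entry (66.31%). The sketch below assumes the commonly used four-class ICBHI evaluation (normal, crackle, wheeze, both; sensitivity over abnormal cycles, specificity over normal cycles, score equal to their mean) and an inverse-class-frequency weighting for the sampler. The main paper's abstract does not say which weighting it uses, and all function names here are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def inverse_frequency_weights(labels: Sequence[str]) -> List[float]:
    """Weight each sample by 1 / (size of its class), so every class carries total weight 1."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def sample_balanced(labels: Sequence[str], k: int, seed: int = 0) -> List[int]:
    """Draw k sample indices with replacement, using the inverse-frequency weights."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=k)


def icbhi_score(
    y_true: Sequence[str], y_pred: Sequence[str], normal: str = "normal"
) -> Dict[str, float]:
    """ICBHI-style metrics: sensitivity over abnormal cycles, specificity over normal ones."""
    pairs = list(zip(y_true, y_pred))
    abnormal = [(t, p) for t, p in pairs if t != normal]
    normals = [(t, p) for t, p in pairs if t == normal]
    # An abnormal cycle counts as correct only if its exact class (e.g. "crackle") is predicted.
    se = sum(t == p for t, p in abnormal) / len(abnormal)
    sp = sum(t == p for t, p in normals) / len(normals)
    return {"sensitivity": se, "specificity": sp, "score": (se + sp) / 2}


if __name__ == "__main__":
    labels = ["normal"] * 6 + ["crackle"] * 2 + ["wheeze", "both"]
    drawn = sample_balanced(labels, k=1000)
    # Each class has total weight 1 of 4, so about 250 draws per class are expected.
    print(Counter(labels[i] for i in drawn))

    y_true = ["normal"] * 4 + ["crackle", "wheeze"]
    y_pred = ["normal"] * 3 + ["crackle", "crackle", "normal"]
    print(icbhi_score(y_true, y_pred))
    # → {'sensitivity': 0.5, 'specificity': 0.75, 'score': 0.625}
```

In a PyTorch training loop, the same per-sample weights could be passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples, replacement=True)` instead of calling `random.choices` directly.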