An Inception-Residual-Based Architecture with Multi-Objective Loss for
Detecting Respiratory Anomalies
- URL: http://arxiv.org/abs/2303.04104v2
- Date: Mon, 19 Jun 2023 21:42:15 GMT
- Title: An Inception-Residual-Based Architecture with Multi-Objective Loss for
Detecting Respiratory Anomalies
- Authors: Dat Ngo, Lam Pham, Huy Phan, Minh Tran, Delaram Jarchi, Sefki Kolozali
- Abstract summary: This paper presents a deep learning system applied for detecting anomalies from respiratory sound recordings.
Our proposed system integrates Inception-residual-based backbone models combined with multi-head attention and multi-objective loss to classify respiratory anomalies.
- Score: 10.29057783664056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a deep learning system applied for detecting anomalies
from respiratory sound recordings. Initially, our system begins with audio
feature extraction using Gammatone and Continuous Wavelet transformation. This
step aims to transform the respiratory sound input into a two-dimensional
spectrogram where both spectral and temporal features are presented. Then, our
proposed system integrates Inception-residual-based backbone models combined
with multi-head attention and multi-objective loss to classify respiratory
anomalies. Instead of simply concatenating the results from the various
spectrograms, we propose a linear combination that regulates the contribution
of each individual spectrogram equally throughout the training process. To
evaluate the performance, we conducted
experiments over the benchmark dataset of SPRSound (The Open-Source SJTU
Paediatric Respiratory Sound) proposed by the IEEE BioCAS 2022 challenge.
Regarding the Score, computed as the average of the average score and the
harmonic score, our proposed system achieved improvements of 9.7%, 15.8%,
17.8%, and 16.1% in Task 1-1, Task 1-2, Task 2-1, and Task 2-2, respectively,
compared to the challenge baseline system. Notably, we achieved the Top-1
performance in Task 2-1 and Task 2-2 with the highest Score of 74.5% and 53.9%,
respectively.
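The feature-extraction step above turns a one-dimensional recording into a two-dimensional time-frequency image. As a rough illustration (not the authors' implementation), a continuous-wavelet scalogram can be sketched with a hand-rolled real Morlet wavelet; the sampling rate, scale range, and test tone below are all hypothetical stand-ins for a respiratory recording:

```python
import numpy as np

def morlet_cwt(x, scales, w0=6.0):
    """Minimal continuous wavelet transform with a real Morlet wavelet.

    Returns a (len(scales), len(x)) scalogram magnitude, i.e. a 2-D
    time-frequency image analogous to the spectrograms described above.
    """
    out = np.empty((len(scales), len(x)))
    for i, s in enumerate(scales):
        t = np.arange(-4 * s, 4 * s + 1)
        # Gaussian envelope times a cosine carrier, stretched by the scale s
        wavelet = np.exp(-0.5 * (t / s) ** 2) * np.cos(w0 * t / s)
        wavelet /= np.sqrt(s)  # per-scale normalisation
        out[i] = np.abs(np.convolve(x, wavelet, mode="same"))
    return out

# A 1 kHz test tone sampled at 8 kHz stands in for a respiratory recording
fs = 8000
sig = np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)
scalogram = morlet_cwt(sig, scales=np.arange(1, 33))
print(scalogram.shape)  # (32, 8000)
```

A production system would use a dedicated Gammatone filterbank and CWT library rather than this toy transform.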
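The linear combination described in the abstract can be sketched as a convex mixture of per-spectrogram predictions whose weights are softmax-normalised; the branch outputs and weight values below are made up for illustration, and a real system would learn the weights jointly with the backbone rather than fixing them:

```python
import numpy as np

def linear_combination(branch_probs, weights):
    """Combine per-branch class probabilities with learnable weights.

    branch_probs: list of arrays, each of shape (n_classes,), one per
    spectrogram branch. weights: raw (unconstrained) weights, one per
    branch; a softmax keeps the result a convex mixture, so each
    spectrogram's contribution can be regulated during training.
    """
    w = np.exp(weights - np.max(weights))  # numerically stable softmax
    w = w / w.sum()
    return sum(wi * p for wi, p in zip(w, branch_probs))

# Hypothetical outputs of a Gammatone branch and a CWT branch
gamma_p = np.array([0.7, 0.2, 0.1])
cwt_p = np.array([0.4, 0.5, 0.1])
combined = linear_combination([gamma_p, cwt_p], weights=np.array([0.0, 0.0]))
print(combined)  # equal weights -> element-wise mean: [0.55 0.35 0.1]
```

With equal raw weights the mixture reduces to a plain average; training would push the weights toward the more informative spectrogram.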
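The Score above is described as the average of an average score and a harmonic score. Assuming the IEEE BioCAS 2022 definitions in terms of sensitivity (SE) and specificity (SP), i.e. AS = (SE + SP) / 2 and HS = 2·SE·SP / (SE + SP), a minimal sketch looks as follows (the exact challenge formulas should be verified against the task description):

```python
def challenge_score(se: float, sp: float) -> float:
    """Mean of the average score (AS) and the harmonic score (HS),
    assuming AS = (SE + SP) / 2 and HS = 2 * SE * SP / (SE + SP)."""
    avg_score = (se + sp) / 2.0
    harmonic_score = 2.0 * se * sp / (se + sp)
    return (avg_score + harmonic_score) / 2.0

# Hypothetical sensitivity/specificity values for illustration
print(round(challenge_score(0.8, 0.7), 4))  # 0.7483
```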
Related papers
- Emotion Classification from Multi-Channel EEG Signals Using HiSTN: A Hierarchical Graph-based Spatial-Temporal Approach [0.0]
This study introduces a parameter-efficient network for emotion classification.
The network incorporates a graph hierarchy constructed from bottom-up at various abstraction levels.
It achieves mean F1 scores of 96.82% (valence) and 95.62% (arousal) in subject-dependent tests.
arXiv Detail & Related papers (2024-08-09T12:32:12Z)
- MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition [62.89464258519723]
We propose a multi-layer cross-attention fusion based AVSR approach that promotes representation of each modality by fusing them at different levels of audio/visual encoders.
Our proposed approach surpasses the first-place system, establishing a new SOTA cpCER of 29.13% on this dataset.
arXiv Detail & Related papers (2024-01-07T08:59:32Z)
- Wav2vec-based Detection and Severity Level Classification of Dysarthria from Speech [15.150153248025543]
The pre-trained wav2vec 2.0 model is studied as a feature extractor to build detection and severity level classification systems.
Experiments were carried out with the popularly used UA-speech database.
arXiv Detail & Related papers (2023-09-25T13:00:33Z)
- Deep Spectro-temporal Artifacts for Detecting Synthesized Speech [57.42110898920759]
This paper provides an overall assessment of track 1 (Low-quality Fake Audio Detection) and track 2 (Partially Fake Audio Detection).
In this paper, spectro-temporal artifacts were detected using raw temporal signals, spectral features, as well as deep embedding features.
We ranked 4th and 5th in track 1 and track 2, respectively.
arXiv Detail & Related papers (2022-10-11T08:31:30Z)
- Low-complexity deep learning frameworks for acoustic scene classification [64.22762153453175]
We present low-complexity deep learning frameworks for acoustic scene classification (ASC).
The proposed frameworks can be separated into four main steps: Front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities.
Our experiments, conducted on the DCASE 2022 Task 1 Development dataset, fulfilled the low-complexity requirement and achieved a best classification accuracy of 60.1%.
arXiv Detail & Related papers (2022-06-13T11:41:39Z)
- Multiple Time Series Fusion Based on LSTM: An Application to CAP A Phase Classification Using EEG [56.155331323304]
Deep-learning-based feature-level fusion of electroencephalogram channels is carried out in this work.
Channel selection, fusion, and classification procedures were optimized by two optimization algorithms.
arXiv Detail & Related papers (2021-12-18T14:17:49Z)
- Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing [72.4445825335561]
We propose a simple method to derive 2D representation from detection scores produced by an arbitrary set of binary classifiers.
Based upon rank correlations, our method facilitates a visual comparison of classifiers with arbitrary scores.
While the approach is fully versatile and can be applied to any detection task, we demonstrate the method using scores produced by automatic speaker verification and voice anti-spoofing systems.
arXiv Detail & Related papers (2021-06-11T13:03:33Z)
- Deep Learning Framework Applied for Predicting Anomaly of Respiratory Sounds [11.375037967010224]
This paper proposes a robust deep learning framework for classifying anomalies of respiratory cycles.
In this work, we conducted experiments on the 2017 International Conference on Biomedical and Health Informatics (ICBHI) benchmark dataset.
arXiv Detail & Related papers (2020-12-26T03:09:36Z)
- Effects of Word-frequency based Pre- and Post-Processings for Audio Captioning [49.41766997393417]
The system we used for Task 6 (Automated Audio Captioning) of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge combines three elements, namely data augmentation, multi-task learning, and post-processing, for audio captioning.
The system received the highest evaluation scores, but which of the individual elements contributed most to its performance has not yet been clarified.
arXiv Detail & Related papers (2020-09-24T01:07:33Z)
- CNN-MoE based framework for classification of respiratory anomalies and lung disease detection [33.45087488971683]
This paper presents and explores a robust deep learning framework for auscultation analysis.
It aims to classify anomalies in respiratory cycles and detect disease, from respiratory sound recordings.
arXiv Detail & Related papers (2020-04-04T21:45:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.