Robust and Interpretable Temporal Convolution Network for Event
Detection in Lung Sound Recordings
- URL: http://arxiv.org/abs/2106.15835v1
- Date: Wed, 30 Jun 2021 06:36:22 GMT
- Title: Robust and Interpretable Temporal Convolution Network for Event
Detection in Lung Sound Recordings
- Authors: Tharindu Fernando, Sridha Sridharan, Simon Denman, Houman
Ghaemmaghami, Clinton Fookes
- Abstract summary: We propose a lightweight, yet robust, and completely interpretable framework for lung sound event detection.
We use a multi-branch TCN architecture and exploit a novel fusion strategy to combine the resultant features from these branches.
Our analysis of different feature fusion strategies shows that the proposed feature concatenation method leads to better suppression of non-informative features.
- Score: 37.0780415938284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a novel framework for lung sound event detection,
segmenting continuous lung sound recordings into discrete events and performing
recognition on each event. Exploiting the lightweight nature of Temporal
Convolution Networks (TCNs) and their superior results compared to their
recurrent counterparts, we propose a lightweight, yet robust, and completely
interpretable framework for lung sound event detection. We propose the use of a
multi-branch TCN architecture and exploit a novel fusion strategy to combine
the resultant features from these branches. This not only allows the network to
retain the most salient information across different temporal granularities and
discard irrelevant information, but also allows it to process recordings of
arbitrary length. Results: The proposed method is evaluated on
multiple public and in-house benchmarks of irregular and noisy recordings of
the respiratory auscultation process for the identification of numerous
auscultation events including inhalation, exhalation, crackles, wheeze,
stridor, and rhonchi. We exceed the state-of-the-art results in all
evaluations. Furthermore, we empirically analyse the effect of the proposed
multi-branch TCN architecture and the feature fusion strategy and provide
quantitative and qualitative evaluations to illustrate their efficiency.
Moreover, we provide an end-to-end model interpretation pipeline that
interprets the operations of all the components of the proposed framework. Our
analysis of different feature fusion strategies shows that the proposed feature
concatenation method leads to better suppression of non-informative features,
which drastically reduces the classifier overhead and results in a robust,
lightweight network. The lightweight nature of our model allows it to be
deployed on end-user devices such as smartphones and to generate predictions
in real time.
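As a concrete reading of the abstract, the sketch below shows what a multi-branch TCN with concatenation-based fusion can look like in PyTorch. All hyperparameters (three branches, kernel sizes 3/5/7, the dilation schedule, a 64-dimensional input feature, and six output classes matching the listed auscultation events) are illustrative assumptions, and the symmetric padding is a simplification over the causal convolutions a TCN may use; this is not the authors' published configuration.

```python
# Minimal sketch of a multi-branch TCN with concatenation-based feature
# fusion. Branch count, kernel sizes, dilations, and channel widths are
# illustrative assumptions, not the authors' published configuration.
import torch
import torch.nn as nn


class TCNBranch(nn.Module):
    """A stack of dilated 1-D convolutions over the time axis.

    Being fully convolutional, the branch accepts inputs of any length.
    """

    def __init__(self, in_ch, out_ch, kernel_size, dilations):
        super().__init__()
        layers, ch = [], in_ch
        for d in dilations:
            layers += [
                nn.Conv1d(ch, out_ch, kernel_size,
                          padding=d * (kernel_size - 1) // 2, dilation=d),
                nn.BatchNorm1d(out_ch),
                nn.ReLU(),
            ]
            ch = out_ch
        self.net = nn.Sequential(*layers)

    def forward(self, x):          # x: (batch, in_ch, time)
        return self.net(x)         # -> (batch, out_ch, time)


class MultiBranchTCN(nn.Module):
    """Parallel TCN branches at different temporal granularities,
    fused by channel-wise concatenation."""

    def __init__(self, in_ch=64, branch_ch=32, n_classes=6):
        super().__init__()
        # Different kernel sizes give each branch a different time scale.
        self.branches = nn.ModuleList([
            TCNBranch(in_ch, branch_ch, kernel_size=3, dilations=(1, 2, 4)),
            TCNBranch(in_ch, branch_ch, kernel_size=5, dilations=(1, 2, 4)),
            TCNBranch(in_ch, branch_ch, kernel_size=7, dilations=(1, 2, 4)),
        ])
        self.classifier = nn.Conv1d(3 * branch_ch, n_classes, kernel_size=1)

    def forward(self, x):                              # (batch, in_ch, time)
        fused = torch.cat([b(x) for b in self.branches], dim=1)
        return self.classifier(fused)                  # per-frame logits


# Recordings of different lengths pass through unchanged, since every
# layer is convolutional along the time axis.
model = MultiBranchTCN()
for frames in (1000, 2500):
    logits = model(torch.randn(1, 64, frames))
    print(logits.shape)  # (1, 6, frames)
```

Because every layer operates along the time axis, the same model handles recordings of arbitrary length, which is the property the abstract highlights.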
Related papers
- Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization [60.899082019130766]
We introduce a frame-level detection network (FDN) and a proposal refinement network (PRN) for audio temporal forgery detection and localization.
The FDN mines inconsistency cues between real and fake frames to obtain discriminative features that roughly indicate forgery regions.
The PRN predicts confidence scores and regression offsets to refine the coarse-grained proposals produced by the FDN; a generic sketch of such a refinement head follows below.
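The sketch below is a hedged illustration of a refinement head in the spirit of the PRN described above: it scores each coarse (start, end) proposal and regresses boundary offsets. The layer sizes, feature dimension, and offset parameterisation are assumptions for illustration, not the published PRN design.

```python
# Hedged sketch of a proposal refinement head: scores coarse (start, end)
# proposals and regresses offsets to refine their boundaries. All sizes
# and the offset parameterisation are illustrative assumptions.
import torch
import torch.nn as nn


class ProposalRefinementHead(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU())
        self.confidence = nn.Linear(64, 1)  # likelihood the proposal is a forgery
        self.offsets = nn.Linear(64, 2)     # shifts for start / end boundaries

    def forward(self, proposal_feats, proposals):
        # proposal_feats: (N, feat_dim) pooled features, one per proposal
        # proposals:      (N, 2) coarse (start, end) times in seconds
        h = self.trunk(proposal_feats)
        conf = torch.sigmoid(self.confidence(h)).squeeze(-1)  # (N,)
        refined = proposals + self.offsets(h)                 # (N, 2)
        return conf, refined


head = ProposalRefinementHead()
conf, refined = head(torch.randn(4, 128), torch.tensor(
    [[0.5, 1.2], [2.0, 2.8], [3.1, 4.0], [5.5, 6.0]]))
print(conf.shape, refined.shape)  # torch.Size([4]) torch.Size([4, 2])
```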
arXiv Detail & Related papers (2024-07-23T15:07:52Z)
- Retrieval-Augmented Audio Deepfake Detection [27.13059118273849]
We propose a retrieval-augmented detection (RAD) framework that augments test samples with similar retrieved samples for enhanced detection.
Experiments show the superior performance of the proposed RAD framework over baseline methods.
arXiv Detail & Related papers (2024-04-22T05:46:40Z)
- AUD-TGN: Advancing Action Unit Detection with Temporal Convolution and GPT-2 in Wild Audiovisual Contexts [8.809586885539002]
We propose a novel approach utilizing audio-visual multimodal data.
This method enhances audio feature extraction by leveraging Mel Frequency Cepstral Coefficients (MFCC) and Log-Mel spectrogram features alongside a pre-trained VGGish network.
Our method notably improves the accuracy of action unit (AU) detection by modelling the temporal and contextual nuances of the data, showing significant gains in intricate, in-the-wild scenarios; a generic sketch of the audio feature extraction follows below.
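Since the summary names MFCC and log-Mel spectrogram features, a generic extraction recipe with librosa is sketched below. The sample rate, FFT/hop sizes, band counts, and the input file name are hypothetical; the AUD-TGN paper may use different settings, and its pre-trained VGGish branch is omitted here.

```python
# Illustrative extraction of the MFCC and log-Mel features named above,
# using librosa. Parameter values (sample rate, n_mels, n_mfcc) and the
# file name are hypothetical, not the paper's published settings.
import librosa
import numpy as np

y, sr = librosa.load("clip.wav", sr=16000)        # hypothetical input file

# 64-band Mel spectrogram, converted to a log (dB) scale.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=256, n_mels=64)
log_mel = librosa.power_to_db(mel)                # (64, frames)

# 13 MFCCs computed with the same framing, so frames stay aligned.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=1024, hop_length=256)

# Frame-aligned concatenation yields one joint feature matrix per clip.
features = np.concatenate([log_mel, mfcc], axis=0)  # (77, frames)
```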
arXiv Detail & Related papers (2024-03-20T15:37:19Z)
- Fuzzy Attention Neural Network to Tackle Discontinuity in Airway Segmentation [67.19443246236048]
Airway segmentation is crucial for the examination, diagnosis, and prognosis of lung diseases.
Some small-sized airway branches (e.g., bronchi and terminal bronchioles) significantly aggravate the difficulty of automatic segmentation.
This paper presents an efficient method for airway segmentation, comprising a novel fuzzy attention neural network and a comprehensive loss function.
arXiv Detail & Related papers (2022-09-05T16:38:13Z)
- ORF-Net: Deep Omni-supervised Rib Fracture Detection from Chest CT Scans [47.7670302148812]
Radiologists need to investigate and annotate rib fractures on a slice-by-slice basis.
We propose a novel omni-supervised object detection network, which can exploit multiple different forms of annotated data.
Our proposed method outperforms other state-of-the-art approaches consistently.
arXiv Detail & Related papers (2022-07-05T07:06:57Z)
- ReDFeat: Recoupling Detection and Description for Multimodal Feature Learning [51.07496081296863]
We recouple the independent constraints on detection and description in multimodal feature learning with a mutual weighting strategy.
We propose a detector that possesses a large receptive field and is equipped with learnable non-maximum suppression layers.
We build a benchmark that contains cross visible, infrared, near-infrared and synthetic aperture radar image pairs for evaluating the performance of features in feature matching and image registration tasks.
arXiv Detail & Related papers (2022-05-16T04:24:22Z)
- Time-domain Speech Enhancement with Generative Adversarial Learning [53.74228907273269]
This paper proposes a new framework called the Time-domain Speech Enhancement Generative Adversarial Network (TSEGAN).
TSEGAN is an extension of the generative adversarial network (GAN) in time-domain with metric evaluation to mitigate the scaling problem.
In addition, we provide a new method based on objective function mapping for the theoretical analysis of the performance of Metric GAN.
arXiv Detail & Related papers (2021-03-30T08:09:49Z)
- Finding Action Tubes with a Sparse-to-Dense Framework [62.60742627484788]
We propose a framework that generates action tube proposals from video streams with a single forward pass in a sparse-to-dense manner.
We evaluate the efficacy of our model on the UCF101-24, JHMDB-21 and UCFSports benchmark datasets.
arXiv Detail & Related papers (2020-08-30T15:38:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.