Interpretable Acoustic Representation Learning on Breathing and Speech
Signals for COVID-19 Detection
- URL: http://arxiv.org/abs/2206.13365v1
- Date: Mon, 27 Jun 2022 15:20:51 GMT
- Title: Interpretable Acoustic Representation Learning on Breathing and Speech
Signals for COVID-19 Detection
- Authors: Debottam Dutta, Debarpan Bhattacharya, Sriram Ganapathy, Amir H.
Poorjam, Deepak Mittal, Maneesh Singh
- Abstract summary: We describe an approach for representation learning of audio signals for the task of COVID-19 detection.
The raw audio samples are processed with a bank of 1-D convolutional filters that are parameterized as cosine modulated Gaussian functions.
The filtered outputs are pooled, log-compressed and used in a self-attention based relevance weighting mechanism.
- Score: 37.01066509527848
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we describe an approach for representation learning of audio
signals for the task of COVID-19 detection. The raw audio samples are processed
with a bank of 1-D convolutional filters that are parameterized as cosine
modulated Gaussian functions. The choice of these kernels allows the
interpretation of the filterbanks as smooth band-pass filters. The filtered
outputs are pooled, log-compressed and used in a self-attention based relevance
weighting mechanism. The relevance weighting emphasizes the key regions of the
time-frequency decomposition that are important for the downstream task. The
subsequent layers of the model consist of a recurrent architecture and the
models are trained for a COVID-19 detection task. In our experiments on the
Coswara data set, we show that the proposed model achieves significant
performance improvements over the baseline system as well as other
representation learning approaches. Further, the approach proposed is shown to
be uniformly applicable for speech and breathing signals and for transfer
learning from a larger data set.
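The front end described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The kernel form cos(2*pi*f*n/fs) * exp(-n^2 / (2*sigma^2)), the function names, the center frequencies, the Gaussian widths, the frame sizes, and the mean-square pooling are all illustrative choices. The fixed random scoring matrix is only a stand-in for the learned self-attention relevance network, and the recurrent classifier is omitted.

```python
import numpy as np


def cosine_gaussian_filterbank(center_hz, sigma_samples, kernel_len, sample_rate):
    """Build (num_filters, kernel_len) cosine-modulated Gaussian kernels.

    Assumed form: g_i[n] = cos(2*pi*f_i*n/fs) * exp(-n^2 / (2*sigma_i^2)).
    In a trainable model f_i (and possibly sigma_i) would be learned.
    """
    n = np.arange(kernel_len) - (kernel_len - 1) / 2.0  # time axis centered on 0
    center = np.asarray(center_hz, dtype=float)[:, None]
    sigma = np.asarray(sigma_samples, dtype=float)[:, None]
    carrier = np.cos(2.0 * np.pi * center * n / sample_rate)
    envelope = np.exp(-0.5 * (n / sigma) ** 2)
    return carrier * envelope


def log_pooled_features(signal, kernels, frame_len, hop):
    """Filter the raw signal, pool energy per frame, then log-compress."""
    filtered = np.stack([np.convolve(signal, k, mode="same") for k in kernels])
    num_frames = 1 + (filtered.shape[1] - frame_len) // hop
    frames = np.stack(
        [filtered[:, t * hop : t * hop + frame_len] for t in range(num_frames)],
        axis=1,
    )  # shape: (num_filters, num_frames, frame_len)
    energy = np.mean(frames ** 2, axis=-1)  # mean-square pooling (assumption)
    return np.log(energy + 1e-8)  # small floor avoids log(0)


def relevance_weighting(features, scoring_matrix):
    """Weight each time-frequency bin by a softmax over filters.

    scoring_matrix is a hypothetical placeholder for a learned network.
    """
    scores = scoring_matrix @ features  # (num_filters, num_frames)
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    exp_scores = np.exp(scores)
    weights = exp_scores / exp_scores.sum(axis=0, keepdims=True)
    return weights * features, weights


# Toy usage: a 1 kHz tone passed through three hand-picked filters.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2.0 * np.pi * 1000.0 * t)

kernels = cosine_gaussian_filterbank(
    [200.0, 1000.0, 3000.0], [40.0, 20.0, 10.0], 401, sample_rate
)
feats = log_pooled_features(tone, kernels, frame_len=400, hop=160)

rng = np.random.default_rng(0)
scoring = rng.standard_normal((3, 3))  # untrained stand-in weights
weighted, weights = relevance_weighting(feats, scoring)
print(kernels.shape, feats.shape, weighted.shape)
```

Because each kernel is a Gaussian envelope multiplied by a cosine, its magnitude response is a Gaussian bump around the center frequency, which is what lets the filterbank be read as a set of smooth band-pass filters.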
Related papers
- Comparative Analysis of the wav2vec 2.0 Feature Extractor [42.18541127866435]
We study the capability to replace the standard feature extraction methods in a connectionist temporal classification (CTC) ASR model.
We show that both are competitive with traditional FEs on the LibriSpeech benchmark and analyze the effect of the individual components.
arXiv Detail & Related papers (2023-08-08T14:29:35Z)
- Content Adaptive Front End For Audio Signal Processing [2.8935588665357077]
We propose a learnable content adaptive front end for audio signal processing.
We pass each audio signal through a bank of convolutional filters, each giving a fixed-dimensional vector.
arXiv Detail & Related papers (2023-03-18T16:09:10Z)
- Learning and controlling the source-filter representation of speech with a variational autoencoder [23.05989605017053]
In speech processing, the source-filter model considers that speech signals are produced from a few independent and physically meaningful continuous latent factors.
We propose a method to accurately and independently control the source-filter speech factors within the latent subspaces.
Without requiring additional information such as text or human-labeled data, this results in a deep generative model of speech spectrograms.
arXiv Detail & Related papers (2022-04-14T16:13:06Z)
- Learnable Multi-level Frequency Decomposition and Hierarchical Attention Mechanism for Generalized Face Presentation Attack Detection [7.324459578044212]
Face presentation attack detection (PAD) is attracting a lot of attention and playing a key role in securing face recognition systems.
We propose a dual-stream convolution neural networks (CNNs) framework to deal with unseen scenarios.
We successfully prove the design of our proposed PAD solution in a step-wise ablation study.
arXiv Detail & Related papers (2021-09-16T13:06:43Z)
- Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing [72.4445825335561]
We propose a simple method to derive 2D representation from detection scores produced by an arbitrary set of binary classifiers.
Based upon rank correlations, our method facilitates a visual comparison of classifiers with arbitrary scores.
While the approach is fully versatile and can be applied to any detection task, we demonstrate the method using scores produced by automatic speaker verification and voice anti-spoofing systems.
arXiv Detail & Related papers (2021-06-11T13:03:33Z)
- Diffusion-Based Representation Learning [65.55681678004038]
We augment the denoising score matching framework to enable representation learning without any supervised signal.
In contrast, the introduced diffusion-based representation learning relies on a new formulation of the denoising score matching objective.
Using the same approach, we propose to learn an infinite-dimensional latent code that achieves improvements of state-of-the-art models on semi-supervised image classification.
arXiv Detail & Related papers (2021-05-29T09:26:02Z)
- Set Based Stochastic Subsampling [85.5331107565578]
We propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network.
We show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks including image classification, image reconstruction, function reconstruction and few-shot classification.
arXiv Detail & Related papers (2020-06-25T07:36:47Z)
- iffDetector: Inference-aware Feature Filtering for Object Detection [70.8678270164057]
We introduce a generic Inference-aware Feature Filtering (IFF) module that can easily be combined with modern detectors.
IFF performs closed-loop optimization by leveraging high-level semantics to enhance the convolutional features.
IFF can be fused with CNN-based object detectors in a plug-and-play manner with negligible computational cost overhead.
arXiv Detail & Related papers (2020-06-23T02:57:29Z)
- Ensemble Wrapper Subsampling for Deep Modulation Classification [70.91089216571035]
Subsampling of received wireless signals is important for relaxing hardware requirements as well as the computational cost of signal processing algorithms.
We propose a subsampling technique to facilitate the use of deep learning for automatic modulation classification in wireless communication systems.
arXiv Detail & Related papers (2020-05-10T06:11:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.