Mask Detection and Breath Monitoring from Speech: on Data Augmentation,
Feature Representation and Modeling
- URL: http://arxiv.org/abs/2008.05175v2
- Date: Fri, 14 Aug 2020 08:44:19 GMT
- Title: Mask Detection and Breath Monitoring from Speech: on Data Augmentation,
Feature Representation and Modeling
- Authors: Haiwei Wu, Lin Zhang, Lin Yang, Xuyang Wang, Junjie Wang, Dong Zhang,
Ming Li
- Abstract summary: This paper introduces our approaches for the Mask and Breathing Sub-Challenge in the Interspeech COMPARE Challenge 2020.
For the mask detection task, we train deep convolutional neural networks with filter-bank energies, gender-aware features, and speaker-aware features.
For the speech breath monitoring task, we investigate different bottleneck features based on the Bi-LSTM structure.
- Score: 22.170442344804904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces our approaches for the Mask and Breathing Sub-Challenge
in the Interspeech COMPARE Challenge 2020. For the mask detection task, we
train deep convolutional neural networks with filter-bank energies,
gender-aware features, and speaker-aware features. Support Vector Machines
follow as the back-end classifiers for binary prediction on the extracted deep
embeddings. Several data augmentation schemes are used to increase the quantity
of training data and improve our models' robustness, including speed
perturbation, SpecAugment, and random erasing. For the speech breath monitoring
task, we investigate different bottleneck features based on the Bi-LSTM
structure. Experimental results show that our proposed methods outperform the
baselines and achieve 0.746 PCC and 78.8% UAR on the Breathing and Mask
evaluation sets, respectively.
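Below is a minimal, illustrative sketch of the two feature-level augmentation schemes named in the abstract (SpecAugment-style masking and random erasing) together with an SVM back-end applied to deep embeddings. The feature shapes, mask widths, kernel choice, and the placeholder `fbank`/`embeddings`/`labels` arrays are assumptions for illustration, not the authors' actual configuration; speed perturbation acts on the raw waveform and is omitted here.

```python
import numpy as np
from sklearn.svm import SVC

def spec_augment(fbank, n_freq_masks=2, max_f=8, n_time_masks=2, max_t=20, rng=None):
    """SpecAugment-style masking on a (n_mels, n_frames) log filter-bank matrix.
    Mask counts and widths are illustrative defaults, not the paper's settings."""
    if rng is None:
        rng = np.random.default_rng()
    x = fbank.copy()
    n_mels, n_frames = x.shape
    for _ in range(n_freq_masks):                 # mask random frequency bands
        f = int(rng.integers(0, max_f + 1))
        f0 = int(rng.integers(0, n_mels - f))
        x[f0:f0 + f, :] = x.mean()
    for _ in range(n_time_masks):                 # mask random time spans
        t = int(rng.integers(0, max_t + 1))
        t0 = int(rng.integers(0, n_frames - t))
        x[:, t0:t0 + t] = x.mean()
    return x

def random_erase(fbank, max_frac=0.2, rng=None):
    """Random erasing: zero out one rectangular patch of the feature map."""
    if rng is None:
        rng = np.random.default_rng()
    x = fbank.copy()
    n_mels, n_frames = x.shape
    h = int(rng.integers(1, int(n_mels * max_frac) + 1))
    w = int(rng.integers(1, int(n_frames * max_frac) + 1))
    r0 = int(rng.integers(0, n_mels - h + 1))
    c0 = int(rng.integers(0, n_frames - w + 1))
    x[r0:r0 + h, c0:c0 + w] = 0.0
    return x

# Placeholder log filter-bank features for one utterance: 64 bands x 300 frames.
fbank = np.random.randn(64, 300)
augmented = random_erase(spec_augment(fbank))

# Back-end: an SVM trained on deep embeddings from the CNN front-end.
# `embeddings` and `labels` are random stand-ins so the snippet runs on its own.
embeddings = np.random.randn(200, 512)
labels = np.random.randint(0, 2, size=200)      # 1 = mask, 0 = no mask
svm = SVC(kernel="linear").fit(embeddings, labels)
predictions = svm.predict(embeddings)
```

In the actual system the embeddings come from deep CNNs trained on the augmented filter-bank, gender-aware, and speaker-aware features; the random arrays above only keep the snippet self-contained.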
Related papers
- Bridge the Points: Graph-based Few-shot Segment Anything Semantically [79.1519244940518]
Recent advancements in pre-training techniques have enhanced the capabilities of vision foundation models.
Recent studies extend SAM to few-shot semantic segmentation (FSS).
We propose a simple yet effective approach based on graph analysis.
arXiv Detail & Related papers (2024-10-09T15:02:28Z)
- MaskCycleGAN-based Whisper to Normal Speech Conversion [0.0]
We present a MaskCycleGAN approach for the conversion of whispered speech to normal speech.
We find that tuning the mask parameters and pre-processing the signal with a voice activity detector provide superior performance.
arXiv Detail & Related papers (2024-08-27T06:07:18Z)
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
- Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained Vision-Language Models [89.07925369856139]
We design a new type of tuning method, termed regularized mask tuning, which masks the network parameters through a learnable selection.
Inspired by neural pathways, we argue that the knowledge required by a downstream task already exists in the pre-trained weights but just gets concealed in the upstream pre-training stage.
It is noteworthy that we deliver an 18.73% performance improvement over zero-shot CLIP by masking an average of only 2.56% of the parameters.
arXiv Detail & Related papers (2023-07-27T17:56:05Z)
- AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders [44.87786478095987]
Masked Autoencoders learn general representations for image, text, audio, video, etc., by reconstructing masked input data from tokens of the visible data.
This paper proposes an adaptive masking strategy for MAEs that is end-to-end trainable.
AdaMAE samples visible tokens based on the semantic context using an auxiliary sampling network.
arXiv Detail & Related papers (2022-11-16T18:59:48Z)
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings; a generic sketch of this idea appears after this list.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
- Surgical Mask Detection with Convolutional Neural Networks and Data Augmentations on Spectrograms [8.747840760772268]
We show the impact of data augmentation on the binary classification task of surgical mask detection in samples of human voice.
Results show that most of the baselines given by ComParE are outperformed.
arXiv Detail & Related papers (2020-08-11T09:02:47Z)
- Face Anti-Spoofing with Human Material Perception [76.4844593082362]
Face anti-spoofing (FAS) plays a vital role in securing the face recognition systems from presentation attacks.
We rephrase face anti-spoofing as a material recognition problem and combine it with classical human material perception.
We propose the Bilateral Convolutional Networks (BCN), which is able to capture intrinsic material-based patterns.
arXiv Detail & Related papers (2020-07-04T18:25:53Z)
- Are you wearing a mask? Improving mask detection from speech using augmentation by cycle-consistent GANs [24.182791316595576]
We propose a novel data augmentation approach for mask detection from speech.
Our approach is based on training Generative Adversarial Networks (GANs) with a cycle-consistency loss to translate unpaired utterances.
We show that our data augmentation approach yields better results than other baseline and state-of-the-art augmentation methods.
arXiv Detail & Related papers (2020-06-17T20:46:50Z)
- CNN-MoE based framework for classification of respiratory anomalies and lung disease detection [33.45087488971683]
This paper presents and explores a robust deep learning framework for auscultation analysis.
It aims to classify anomalies in respiratory cycles and detect disease from respiratory sound recordings.
arXiv Detail & Related papers (2020-04-04T21:45:06Z)
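As referenced in the "Adversarial Feature Augmentation and Normalization" entry above, that line of work perturbs intermediate feature embeddings rather than the input samples themselves. The following is a rough, generic sketch of that idea using a single FGSM-style step in PyTorch; the toy backbone, head, step size, and loss are assumptions for illustration only, not the paper's actual method.

```python
import torch
import torch.nn as nn

# Toy feature extractor and classifier head; stand-ins, not the paper's architecture.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
head = nn.Linear(128, 10)
criterion = nn.CrossEntropyLoss()

def adversarial_feature_augment(x, y, epsilon=0.1):
    """Return clean embeddings plus embeddings perturbed one FGSM-style step
    in the direction that increases the classification loss."""
    feats = backbone(x)
    feats_adv = feats.detach().clone().requires_grad_(True)
    loss = criterion(head(feats_adv), y)
    grad, = torch.autograd.grad(loss, feats_adv)
    return feats, (feats_adv + epsilon * grad.sign()).detach()

# Dummy batch; training on both the clean and perturbed embeddings is one
# simple way to use such feature-level augmentation.
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
clean, perturbed = adversarial_feature_augment(x, y)
loss = criterion(head(clean), y) + criterion(head(perturbed), y)
loss.backward()
```

The perturbation is computed with respect to the embeddings only, so the backbone receives gradients solely through the clean path; this is a simplification for the sketch rather than a claim about the cited paper's training procedure.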