Are you wearing a mask? Improving mask detection from speech using augmentation by cycle-consistent GANs
- URL: http://arxiv.org/abs/2006.10147v2
- Date: Sat, 25 Jul 2020 21:52:21 GMT
- Title: Are you wearing a mask? Improving mask detection from speech using augmentation by cycle-consistent GANs
- Authors: Nicolae-Cătălin Ristea, Radu Tudor Ionescu
- Abstract summary: We propose a novel data augmentation approach for mask detection from speech.
Our approach is based on (i) training Generative Adversarial Networks (GANs) with cycle-consistency loss to translate unpaired utterances between the two classes (with mask and without mask), and (ii) generating new training utterances with these GANs, assigning the opposite label to each translated utterance.
We show that our data augmentation approach yields better results than other baseline and state-of-the-art augmentation methods.
- Score: 24.182791316595576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of detecting whether a person wears a face mask from speech is
useful in modelling speech in forensic investigations, communication between
surgeons or people protecting themselves against infectious diseases such as
COVID-19. In this paper, we propose a novel data augmentation approach for mask
detection from speech. Our approach is based on (i) training Generative
Adversarial Networks (GANs) with cycle-consistency loss to translate unpaired
utterances between two classes (with mask and without mask), and on (ii)
generating new training utterances using the cycle-consistent GANs, assigning
opposite labels to each translated utterance. Original and translated
utterances are converted into spectrograms which are provided as input to a set
of ResNet neural networks with various depths. The networks are combined into
an ensemble through a Support Vector Machines (SVM) classifier. With this
system, we participated in the Mask Sub-Challenge (MSC) of the INTERSPEECH 2020
Computational Paralinguistics Challenge, surpassing the baseline proposed by
the organizers by 2.8%. Our data augmentation technique provided a performance
boost of 0.9% on the private test set. Furthermore, we show that our data
augmentation approach yields better results than other baseline and
state-of-the-art augmentation methods.
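The snippet below is a minimal, hedged sketch of the augmentation idea described in the abstract: a pair of already-trained cycle-consistent generators translates spectrograms between the "with mask" and "without mask" classes, and each translated sample is added to the training set with the opposite label. The tiny generator architecture, the random toy data and all names such as `G_mask2nomask` are illustrative placeholders, not the authors' actual CycleGAN configuration.

```python
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Toy stand-in for a trained cycle-consistent generator acting on spectrograms."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

# Hypothetical generators; in the paper they are CycleGAN generators trained with
# adversarial + cycle-consistency losses on unpaired utterances of the two classes.
G_mask2nomask = TinyGenerator()   # "with mask"    -> "without mask"
G_nomask2mask = TinyGenerator()   # "without mask" -> "with mask"

def augment_with_translations(spectrograms, labels):
    """spectrograms: (N, 1, F, T) tensor; labels: (N,) with 1 = mask, 0 = no mask.
    Returns the original data plus translated copies carrying the opposite label."""
    xs, ys = [spectrograms], [labels]
    with torch.no_grad():
        mask, nomask = labels == 1, labels == 0
        if mask.any():
            xs.append(G_mask2nomask(spectrograms[mask]))
            ys.append(torch.zeros(int(mask.sum())))       # flipped label: no mask
        if nomask.any():
            xs.append(G_nomask2mask(spectrograms[nomask]))
            ys.append(torch.ones(int(nomask.sum())))      # flipped label: mask
    return torch.cat(xs), torch.cat(ys)

# Toy usage with random "spectrograms" (e.g. 64 mel bands x 128 frames).
x = torch.randn(8, 1, 64, 128)
y = torch.tensor([1, 1, 1, 1, 0, 0, 0, 0], dtype=torch.float32)
x_aug, y_aug = augment_with_translations(x, y)
print(x_aug.shape, y_aug.shape)   # twice as many training examples
```

A similarly hedged sketch of the ensemble step: per-utterance class scores from several ResNets of different depths are concatenated and used as meta-features for an SVM classifier. The choice of softmax-style scores as features and of a linear kernel are assumptions made here for illustration, not details taken from the paper.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_utterances = 100

# Stand-ins for the per-utterance class scores of, e.g., ResNets of various depths.
resnet_scores = [rng.random((n_utterances, 2)) for _ in range(3)]
meta_features = np.concatenate(resnet_scores, axis=1)        # shape (N, 6)
labels = rng.integers(0, 2, size=n_utterances)               # toy ground truth

svm = SVC(kernel="linear")    # kernel choice is an assumption, not from the paper
svm.fit(meta_features, labels)
print(svm.predict(meta_features[:5]))
```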
Related papers
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
- DFormer: Diffusion-guided Transformer for Universal Image Segmentation [86.73405604947459]
The proposed DFormer views universal image segmentation task as a denoising process using a diffusion model.
At inference, our DFormer directly predicts the masks and corresponding categories from a set of randomly-generated masks.
Our DFormer outperforms the recent diffusion-based panoptic segmentation method Pix2Seq-D with a gain of 3.6% on MS COCO val 2017 set.
arXiv Detail & Related papers (2023-06-06T06:33:32Z)
- SeCGAN: Parallel Conditional Generative Adversarial Networks for Face Editing via Semantic Consistency [50.04141606856168]
We propose a label-guided cGAN for editing face images utilising semantic information without the need to specify target semantic masks.
SeCGAN has two branches of generators and discriminators operating in parallel, with one trained to translate RGB images and the other for semantic masks.
Our results on CelebA and CelebA-HQ demonstrate that our approach is able to generate facial images with more accurate attributes.
arXiv Detail & Related papers (2021-11-17T18:54:58Z)
- Mask or Non-Mask? Robust Face Mask Detector via Triplet-Consistency Representation Learning [23.062034116854875]
In the absence of vaccines or medicines to stop COVID-19, one of the effective methods to slow the spread of the coronavirus is to wear a face mask.
Enforcing the use of face masks or coverings in public areas requires additional human resources, which is tedious and attention-intensive.
We propose a face mask detection framework that uses a context attention module to enable effective attention in the feed-forward convolutional neural network.
arXiv Detail & Related papers (2021-10-01T16:44:06Z)
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
- Boosting Masked Face Recognition with Multi-Task ArcFace [0.973681576519524]
Given the global health crisis caused by COVID-19, mouth-and-nose-covering masks have become an essential everyday accessory.
This measure has put the state-of-the-art face recognition models on the ropes since they have not been designed to work with masked faces.
A full training pipeline is presented based on the ArcFace work, with several modifications for the backbone and the loss function.
arXiv Detail & Related papers (2021-04-20T10:12:04Z)
- Mask Attention Networks: Rethinking and Strengthen Transformer [70.95528238937861]
Transformer is an attention-based neural network which consists of two sublayers: the Self-Attention Network (SAN) and the Feed-Forward Network (FFN).
arXiv Detail & Related papers (2021-03-25T04:07:44Z)
- BinaryCoP: Binary Neural Network-based COVID-19 Face-Mask Wear and Positioning Predictor on Edge Devices [63.56630165340053]
Face masks offer an effective solution in healthcare for bi-directional protection against air-borne diseases.
CNNs offer an excellent solution for face recognition and classification of correct mask wearing and positioning.
CNNs can be used at entrances to corporate buildings, airports, shopping areas, and other indoor locations, to mitigate the spread of the virus.
arXiv Detail & Related papers (2021-02-06T00:14:06Z)
- They are wearing a mask! Identification of Subjects Wearing a Surgical Mask from their Speech by means of x-vectors and Fisher Vectors [0.0]
The INTERSPEECH 2020 Computational Paralinguistics Challenge offers three different problems.
This challenge involves the classification of speech recorded from subjects while wearing a surgical mask.
In this study, we employ two different types of feature extraction methods to address the above-mentioned problem.
arXiv Detail & Related papers (2020-08-23T11:27:11Z)
- Mask Detection and Breath Monitoring from Speech: on Data Augmentation, Feature Representation and Modeling [22.170442344804904]
This paper introduces our approaches for the Mask and Breathing Sub-Challenge in the Interspeech ComParE Challenge 2020.
For the mask detection task, we train deep convolutional neural networks with filter-bank energies, gender-aware features, and speaker-aware features.
For the speech breath monitoring task, we investigate different bottleneck features based on the Bi-LSTM structure.
arXiv Detail & Related papers (2020-08-12T08:42:50Z)
- Surgical Mask Detection with Convolutional Neural Networks and Data Augmentations on Spectrograms [8.747840760772268]
We show the impact of data augmentation on the binary classification task of surgical mask detection in samples of human voice (a minimal sketch of this style of spectrogram augmentation follows this list).
Results show that most of the baselines given by ComParE are outperformed.
arXiv Detail & Related papers (2020-08-11T09:02:47Z)
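The last two speech-related entries above apply data augmentation directly to spectrograms. The snippet below is a minimal, hedged sketch of one widely used family of such augmentations (random time and frequency masking); the actual augmentation policies, masking widths and spectrogram settings used in those papers may differ and are not taken from them.

```python
import numpy as np

def mask_spectrogram(spec, max_freq_width=8, max_time_width=20, rng=None):
    """Zero out one random frequency band and one random time band.
    spec: 2-D array of shape (n_freq_bins, n_time_frames). Widths are illustrative."""
    rng = rng or np.random.default_rng()
    spec = spec.copy()
    n_freq, n_time = spec.shape

    f_width = int(rng.integers(0, max_freq_width + 1))
    f_start = int(rng.integers(0, max(1, n_freq - f_width)))
    spec[f_start:f_start + f_width, :] = 0.0       # frequency masking

    t_width = int(rng.integers(0, max_time_width + 1))
    t_start = int(rng.integers(0, max(1, n_time - t_width)))
    spec[:, t_start:t_start + t_width] = 0.0       # time masking
    return spec

# Toy usage: augment a random 64-band x 200-frame mel-spectrogram.
augmented = mask_spectrogram(np.random.rand(64, 200))
print(augmented.shape)
```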